Terminology with analogy-based models

Analogy-based models


Attribution

Analogy-based algorithms in practice

  • Recommendation systems

Geometric view of tabular data and dimensions

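import pandas as pd
from sklearn.model_selection import train_test_split

# Load the cities data and hold out 20% of the rows as a test set.
cities_df = pd.read_csv("data/canada_usa_cities.csv")
train_df, test_df = train_test_split(cities_df, test_size=0.2, random_state=123)
train_df.head()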
     longitude  latitude country
160   -76.4813   44.2307  Canada
127   -81.2496   42.9837  Canada
169   -66.0580   45.2788  Canada
188   -73.2533   45.3057  Canada
187   -67.9245   47.1652  Canada
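import altair as alt

# Plot each training example as a point: longitude on x, latitude on y, colour by country.
cities_plot = alt.Chart(train_df).mark_circle(size=20, opacity=0.6).encode(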
    alt.X('longitude:Q', scale=alt.Scale(domain=[-140, -40])),
    alt.Y('latitude:Q', scale=alt.Scale(domain=[20, 60])),
    alt.Color('country:N', scale=alt.Scale(domain=['Canada', 'USA'],
                                           range=['red', 'blue']))
)
cities_plot

Dimensions

grades_df = pd.read_csv("data/quiz2-grade-toy-classification.csv")
grades_df.head()
   ml_experience  class_attendance  lab1  lab2  lab3  lab4  quiz1   quiz2
0              1                 1    92    93    84    91     92      A+
1              1                 0    94    90    80    83     91  not A+
2              0                 0    78    85    83    80     80  not A+
3              0                 1    91    94    92    91     89      A+
4              0                 1    77    83    90    92     85      A+


X = grades_df.drop(columns=['quiz2'])
X.shape[1]
7
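
To make the count above concrete, here is a minimal self-contained sketch. It assumes only the five rows shown in the grades table (typed in by hand rather than read from the CSV), and the names grades_sample and X_sample are illustrative, not part of the original notebook. The number of dimensions is the number of feature columns left after dropping the target.

```python
import pandas as pd

# First five rows of the grades data, copied from the head() output above.
# (Hand-typed sample; the real notebook reads the full CSV instead.)
grades_sample = pd.DataFrame(
    {
        "ml_experience": [1, 1, 0, 0, 0],
        "class_attendance": [1, 0, 0, 1, 1],
        "lab1": [92, 94, 78, 91, 77],
        "lab2": [93, 90, 85, 94, 83],
        "lab3": [84, 80, 83, 92, 90],
        "lab4": [91, 83, 80, 91, 92],
        "quiz1": [92, 91, 80, 89, 85],
        "quiz2": ["A+", "not A+", "not A+", "A+", "A+"],
    }
)

# Drop the target column; what remains are the features.
X_sample = grades_sample.drop(columns=["quiz2"])

# shape is (number of examples, number of features/dimensions).
n_examples, n_dimensions = X_sample.shape
print(n_examples, n_dimensions)
```

Each of the 5 examples is therefore a point in a 7-dimensional space, one axis per feature.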

Dimensions in ML problems

Dimensions:

  • Dimensions ≈ 20: low-dimensional
  • Dimensions ≈ 1,000: medium-dimensional
  • Dimensions ≈ 100,000: high-dimensional

Feature vectors

Feature vector: a vector composed of feature values associated with an example.

train_df.head()
     longitude  latitude country
160   -76.4813   44.2307  Canada
127   -81.2496   42.9837  Canada
169   -66.0580   45.2788  Canada
188   -73.2533   45.3057  Canada
187   -67.9245   47.1652  Canada


An example feature vector from the cities dataset:

train_df.drop(columns=["country"]).iloc[0].round(2).to_numpy()
array([-76.48,  44.23])


An example feature vector from the grading dataset:

grades_df.drop(columns=['quiz2']).iloc[0].round(2).to_numpy()
array([ 1,  1, 92, 93, 84, 91, 92])
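
The same idea can be sketched for all examples at once. The snippet below is an illustration only: it assumes the five cities rows shown earlier (typed in by hand), and cities_sample, X_cities and feature_vectors are hypothetical names. Each row of the resulting array is one feature vector, and its length equals the number of dimensions.

```python
import numpy as np
import pandas as pd

# Five training rows copied from the train_df.head() output above.
# (Hand-typed sample; the real notebook uses the full training split.)
cities_sample = pd.DataFrame(
    {
        "longitude": [-76.4813, -81.2496, -66.0580, -73.2533, -67.9245],
        "latitude": [44.2307, 42.9837, 45.2788, 45.3057, 47.1652],
        "country": ["Canada"] * 5,
    },
    index=[160, 127, 169, 188, 187],
)

# Remove the target so only feature values remain.
X_cities = cities_sample.drop(columns=["country"])

# One row per example; each row is that example's feature vector.
feature_vectors = X_cities.to_numpy()
first_vector = feature_vectors[0].round(2)

print(feature_vectors.shape)  # (5, 2)
print(first_vector)

# A feature vector has one entry per feature column.
assert first_vector.shape[0] == X_cities.shape[1]
```

Geometrically, each of these two-element vectors is a point in the longitude-latitude plane, which is what the scatter plot of the cities data draws.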

Let’s apply what we learned!