Distances

Distance between vectors

Euclidean distance: the straight-line distance between two points in Euclidean space.


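The Euclidean distance between vectors

$$u = \langle u_1, u_2, \dots, u_n \rangle$$

and

$$v = \langle v_1, v_2, \dots, v_n \rangle$$

is defined as:

$$\mathrm{distance}(u, v) = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2}$$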
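As a quick check of the definition, here is a minimal sketch in plain Python. The function name `euclidean_distance` and the argument names `u` and `v` are chosen here for illustration and are not part of any library.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length sequences of numbers."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same number of components")
    # Sum the squared component-wise differences, then take the square root.
    return math.sqrt(sum((u_i - v_i) ** 2 for u_i, v_i in zip(u, v)))

# A 3-4-5 right triangle: the distance from (0, 0) to (3, 4) is 5.
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```

The same function works for vectors of any length, which is why the definition carries over unchanged from two features to many.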

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
cities_df = pd.read_csv("data/canada_usa_cities.csv")
train_df, test_df = train_test_split(cities_df, test_size=0.2, random_state=123)
train_df.head()
longitude latitude country
160 -76.4813 44.2307 Canada
127 -81.2496 42.9837 Canada
169 -66.0580 45.2788 Canada
188 -73.2533 45.3057 Canada
187 -67.9245 47.1652 Canada
import altair as alt
cities_viz = alt.Chart(train_df, width=500, height=300).mark_circle(size=20, opacity=0.6).encode(
    alt.X('longitude:Q', scale=alt.Scale(domain=[-140, -40])),
    alt.Y('latitude:Q', scale=alt.Scale(domain=[20, 60])),
    alt.Color('country:N', scale=alt.Scale(domain=['Canada', 'USA'],
                                           range=['red', 'blue']))
)
cities_viz
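[Figure: scatter plot of the training cities by longitude and latitude, coloured by country (Canada in red, USA in blue)]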
two_cities = cities_df.sample(2, random_state=42).drop(columns=["country"])
two_cities
longitude latitude
30 -66.9843 44.8607
171 -80.2632 43.1408



How do we calculate the distance between the two cities?



Subtract one city's coordinates from the other's:

two_cities.iloc[1] - two_cities.iloc[0]
longitude   -13.2789
latitude     -1.7199
dtype: float64


Square the differences:

(two_cities.iloc[1] - two_cities.iloc[0])**2
longitude    176.329185
latitude       2.958056
dtype: float64

Sum them up:

((two_cities.iloc[1] - two_cities.iloc[0])**2).sum()
np.float64(179.28724121999983)


And then take the square root:

np.sqrt(np.sum((two_cities.iloc[1] - two_cities.iloc[0])**2))
np.float64(13.389818565611703)
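The four steps above (subtract, square, sum, square root) amount to taking the L2 norm of the difference vector, so NumPy's `np.linalg.norm` gives the same number in one call. The sketch below hard-codes the two cities' coordinates from the table above; the names `city_a` and `city_b` are illustrative only.

```python
import numpy as np

# (longitude, latitude) of the two sampled cities shown above.
city_a = np.array([-66.9843, 44.8607])
city_b = np.array([-80.2632, 43.1408])

# The L2 norm of the difference vector is the Euclidean distance.
distance = np.linalg.norm(city_b - city_a)
print(distance)  # approximately 13.39, matching the step-by-step result
```

Note that the result is in degrees of longitude/latitude, the units of the input columns.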


scikit-learn's euclidean_distances function performs the same calculation for every pair of rows:

from sklearn.metrics.pairwise import euclidean_distances

euclidean_distances(two_cities)
array([[ 0.        , 13.38981857],
       [13.38981857,  0.        ]])
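Entry (i, j) of this matrix is the distance between row i and row j, which is why the diagonal is zero and the matrix is symmetric. To make that structure concrete, here is a rough sketch that builds the same matrix with NumPy broadcasting. It is meant only as an illustration of what the matrix contains, not as a description of how scikit-learn computes it internally; the names `X`, `diffs`, and `pairwise` are assumptions made for this example.

```python
import numpy as np

# Rows are the two sampled cities: (longitude, latitude).
X = np.array([[-66.9843, 44.8607],
              [-80.2632, 43.1408]])

# diffs[i, j] holds the difference vector X[i] - X[j], via broadcasting.
diffs = X[:, np.newaxis, :] - X[np.newaxis, :, :]

# Square, sum over the coordinate axis, and take the square root.
pairwise = np.sqrt((diffs ** 2).sum(axis=-1))
print(pairwise)
```

With more than two rows the same code produces the full n-by-n matrix of pairwise distances.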

Let’s apply what we learned!