Lecture 07: Collaborative filtering class demo

Lecture 07: Collaborative filtering class demo#

import os
import random
import sys

import numpy as np
import pandas as pd

sys.path.append(os.path.join(os.path.abspath(".."), "code"))

Example dataset: Jester 1.7M jokes ratings dataset #

We’ll use a sample of Jester 1.7M jokes ratings dataset to demonstrate different recommendation systems.

The dataset comes with two CSVs

A CSV containing ratings (-10.0 to +10.0) of 150 jokes from 59,132 users.
A CSV containing joke IDs and the actual text of jokes.

Some jokes might be offensive. Please do not look too much into the actual text data if you are sensitive to such language.

Recommendation systems are most effective when you have a large amount of data.
But we are only taking a sample here for speed.

filename = "../data/jester_ratings.csv"
ratings_full = pd.read_csv(filename)
ratings = ratings_full[ratings_full["userId"] <= 4000]

ratings.head()

	userId	jokeId	rating
0	1	5	0.219
1	1	7	-9.281
2	1	8	-9.281
3	1	13	-6.781
4	1	15	0.875

user_key = "userId"
item_key = "jokeId"

Dataset stats#

ratings.info()

<class 'pandas.core.frame.DataFrame'>
Index: 141362 entries, 0 to 141361
Data columns (total 3 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   userId  141362 non-null  int64  
 1   jokeId  141362 non-null  int64  
 2   rating  141362 non-null  float64
dtypes: float64(1), int64(2)
memory usage: 4.3 MB

def get_stats(ratings, item_key="jokeId", user_key="userId"):
    print("Number of ratings:", len(ratings))
    print("Average rating:  %0.3f" % (np.mean(ratings["rating"])))
    N = len(np.unique(ratings[user_key]))
    M = len(np.unique(ratings[item_key]))
    print("Number of users (N): %d" % N)
    print("Number of items (M): %d" % M)
    print("Fraction non-nan ratings: %0.3f" % (len(ratings) / (N * M)))
    return N, M


N, M = get_stats(ratings)

Number of ratings: 141362
Average rating:  1.200
Number of users (N): 3635
Number of items (M): 140
Fraction non-nan ratings: 0.278

Let’s construct utility matrix with number of users rows and number of items columns from the ratings data.

Note we are constructing a non-sparse matrix for demonstration purpose here. In real life it’s recommended that you work with sparse matrices.

user_mapper = dict(zip(np.unique(ratings[user_key]), list(range(N))))
item_mapper = dict(zip(np.unique(ratings[item_key]), list(range(M))))
user_inverse_mapper = dict(zip(list(range(N)), np.unique(ratings[user_key])))
item_inverse_mapper = dict(zip(list(range(M)), np.unique(ratings[item_key])))

def create_Y_from_ratings(
    data, N, M, user_mapper, item_mapper, user_key="userId", item_key="jokeId"
):  # Function to create a dense utility matrix
    Y = np.zeros((N, M))
    Y.fill(np.nan)
    for index, val in data.iterrows():
        n = user_mapper[val[user_key]]
        m = item_mapper[val[item_key]]
        Y[n, m] = val["rating"]

    return Y

Utility matrix for Jester jokes ratings data#

Rows represent users.
Columns represent items (jokes in our case).
Each cell gives the rating given by the user to the corresponding joke.
Users are features for jokes and jokes are features for users.
We want to predict the missing entries.

Y_mat = create_Y_from_ratings(ratings, N, M, user_mapper, item_mapper)
Y_mat.shape

(3635, 140)

pd.DataFrame(Y_mat)

	0	1	2	3	4	5	6	7	8	9	...	130	131	132	133	134	135	136	137	138	139
0	0.219	-9.281	-9.281	-6.781	0.875	-9.656	-9.031	-7.469	-8.719	-9.156	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	-9.688	9.938	9.531	9.938	0.406	3.719	9.656	-2.688	-9.562	-9.125	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	-9.844	-9.844	-7.219	-2.031	-9.938	-9.969	-9.875	-9.812	-9.781	-6.844	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	-5.812	-4.500	-4.906	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	6.906	4.750	-5.906	-0.406	-4.031	3.875	6.219	5.656	6.094	5.406	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
3630	NaN	-9.812	-0.062	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3631	NaN	-9.844	7.531	-9.719	-9.344	3.875	9.812	8.938	8.375	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3632	NaN	-1.906	3.969	-2.312	-0.344	-8.844	4.188	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3633	NaN	-8.875	-9.156	-9.156	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3634	NaN	-6.312	1.281	-3.531	2.125	-5.812	5.562	-6.062	0.125	NaN	...	NaN	NaN	4.188	NaN	NaN	NaN	NaN	NaN	NaN	NaN

3635 rows × 140 columns

Data splitting and evaluation#

Recall that our goal is to predict missing entries in the utility matrix.

pd.DataFrame(Y_mat)

	0	1	2	3	4	5	6	7	8	9	...	130	131	132	133	134	135	136	137	138	139
0	0.219	-9.281	-9.281	-6.781	0.875	-9.656	-9.031	-7.469	-8.719	-9.156	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	-9.688	9.938	9.531	9.938	0.406	3.719	9.656	-2.688	-9.562	-9.125	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	-9.844	-9.844	-7.219	-2.031	-9.938	-9.969	-9.875	-9.812	-9.781	-6.844	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	-5.812	-4.500	-4.906	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	6.906	4.750	-5.906	-0.406	-4.031	3.875	6.219	5.656	6.094	5.406	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
3630	NaN	-9.812	-0.062	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3631	NaN	-9.844	7.531	-9.719	-9.344	3.875	9.812	8.938	8.375	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3632	NaN	-1.906	3.969	-2.312	-0.344	-8.844	4.188	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3633	NaN	-8.875	-9.156	-9.156	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3634	NaN	-6.312	1.281	-3.531	2.125	-5.812	5.562	-6.062	0.125	NaN	...	NaN	NaN	4.188	NaN	NaN	NaN	NaN	NaN	NaN	NaN

3635 rows × 140 columns

Data splitting#

We split the ratings into train and validation sets.
It’s easier to split the ratings data instead of splitting the utility matrix.
Don’t worry about y; we’re not really going to use it.

from sklearn.model_selection import train_test_split
X = ratings.copy()
y = ratings[user_key]
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42
)

X_train.shape, X_valid.shape

((113089, 3), (28273, 3))

Now we will create utility matrices for train and validation splits.

train_mat = create_Y_from_ratings(X_train, N, M, user_mapper, item_mapper)
valid_mat = create_Y_from_ratings(X_valid, N, M, user_mapper, item_mapper)

train_mat.shape, valid_mat.shape

((3635, 140), (3635, 140))

(len(X_train) / (N * M)) # Fraction of non-nan entries in the train set

0.22222244055806642

(len(X_valid) / (N * M)) # Fraction of non-nan entries in the valid set

0.055557083906464924

train_mat has only ratings from the train set and valid_mat has only ratings from the valid set.
During training we assume that we do not have access to some of the available ratings. We predict these ratings and evaluate them against ratings in the validation set.

def error(X1, X2):
    """
    Returns the root mean squared error.
    """
    return np.sqrt(np.nanmean((X1 - X2) ** 2))


def evaluate(pred_X, train_X, valid_X, model_name="Global average"):
    print("%s train RMSE: %0.2f" % (model_name, error(pred_X, train_X)))
    print("%s valid RMSE: %0.2f" % (model_name, error(pred_X, valid_X)))

Baseline approaches#

Let’s first try some simple approaches to predict missing entries.

Global average baseline
Per-user average baseline
Per-item average baseline
Average of 2 and 3
- Take an average of per-user and per-item averages.
\(k\)-Nearest Neighbours imputation

I’ll show you 1. and 5. You’ll explore 2., 3., and 4. in the lab.

Global average baseline#

Let’s examine RMSE of the global average baseline.
In this baseline we predict everything as the global average rating.

avg = np.nanmean(train_mat)
pred_g = np.zeros(train_mat.shape) + avg
pd.DataFrame(pred_g).head()

	0	1	2	3	4	5	6	7	8	9	...	130	131	132	133	134	135	136	137	138	139
0	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	...	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741
1	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	...	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741
2	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	...	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741
3	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	...	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741
4	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	...	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741	1.20741

5 rows × 140 columns

evaluate(pred_g, train_mat, valid_mat, model_name="Global average")

Global average train RMSE: 5.75
Global average valid RMSE: 5.77

\(k\)-nearest neighbours imputation #

from sklearn.impute import KNNImputer

imputer = KNNImputer(n_neighbors=10, keep_empty_features=True)
train_mat_imp = imputer.fit_transform(train_mat)

pd.DataFrame(train_mat_imp)

	0	1	2	3	4	5	6	7	8	9	...	130	131	132	133	134	135	136	137	138	139
0	-5.9406	-9.2810	-9.2810	-6.7810	0.8750	-9.6560	-9.0310	-7.4690	-8.7190	-9.1560	...	-4.5311	1.8968	0.6905	-3.1218	1.2843	-2.6063	-0.1812	-1.3937	1.7625	-0.4092
1	2.3405	9.9380	9.5310	9.9380	0.4060	3.7190	9.6560	-2.6880	4.3438	-9.1250	...	2.2437	3.1719	5.0251	5.1812	8.2407	5.9311	5.8375	6.3812	1.1687	6.2532
2	-9.8440	-3.5750	-7.2190	-2.0310	-9.9380	-9.9690	-9.8750	-9.8120	-9.7810	-6.8440	...	-4.4186	-3.1156	-1.5655	-5.6250	0.3720	-4.0439	-6.0500	-5.5563	-5.4125	-5.5874
3	-5.8120	-2.4624	-4.9060	-2.7781	-0.0532	-3.8594	1.7031	-0.3687	1.8469	0.0593	...	-2.0344	2.1469	2.8875	1.6845	1.2437	-0.0156	1.2595	3.8219	3.1971	5.0249
4	1.3157	4.7500	1.8658	-0.4060	1.7937	3.8750	6.2190	1.9220	6.0940	5.4060	...	-0.2844	1.1313	4.0157	3.0344	4.0406	0.5218	4.3594	4.0968	3.9250	3.9657
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
3630	-0.7750	-9.8120	-0.0620	-2.8218	-4.1470	-4.8281	2.2718	-2.8782	-1.0125	0.0688	...	-6.6844	3.0531	2.8687	1.5281	4.5002	-0.1878	2.0031	4.0908	2.3563	5.0406
3631	2.5188	-5.0625	-0.4001	-9.7190	-9.3440	-1.6408	-4.1187	8.9380	8.3750	-0.9314	...	-4.0344	7.9155	3.4282	4.2968	6.7968	7.3999	1.8500	5.8219	5.1812	2.8437
3632	0.1749	-1.9060	3.9690	-1.3844	-0.3440	-8.8440	4.1880	-1.5564	5.0593	0.3343	...	-4.0126	2.8344	2.4499	2.9312	2.3750	-0.4062	1.4375	3.9750	-1.2220	2.8375
3633	-4.5937	-6.4376	-5.9563	-9.1560	-7.1437	-5.5844	2.2531	-0.9688	-2.8530	-0.6406	...	-4.6938	3.4186	5.1656	-0.1626	2.5594	-0.7750	4.6781	1.2658	-1.1718	-0.7157
3634	-0.0812	-6.3120	1.2810	-3.5310	2.1250	-5.8120	5.5620	0.2218	0.1250	-1.1874	...	-3.8156	4.1812	4.1880	3.7280	3.0750	2.1033	2.8156	5.5312	3.8283	4.1219

3635 rows × 140 columns

evaluate(train_mat_imp, train_mat, valid_mat, model_name="KNN imputer")

KNN imputer train RMSE: 0.00
KNN imputer valid RMSE: 4.79

(Optional) Finding nearest neighbors #

We can look at nearest neighbours of a query item.
Here our columns are jokes, and users are features for jokes, and we’ll have to find nearest neighbours of columns vectors.

pd.DataFrame(train_mat_imp).head()

	0	1	2	3	4	5	6	7	8	9	...	130	131	132	133	134	135	136	137	138	139
0	-5.9406	-9.2810	-9.2810	-6.7810	0.8750	-9.6560	-9.0310	-7.4690	-8.7190	-9.1560	...	-4.5311	1.8968	0.6905	-3.1218	1.2843	-2.6063	-0.1812	-1.3937	1.7625	-0.4092
1	2.3405	9.9380	9.5310	9.9380	0.4060	3.7190	9.6560	-2.6880	4.3438	-9.1250	...	2.2437	3.1719	5.0251	5.1812	8.2407	5.9311	5.8375	6.3812	1.1687	6.2532
2	-9.8440	-3.5750	-7.2190	-2.0310	-9.9380	-9.9690	-9.8750	-9.8120	-9.7810	-6.8440	...	-4.4186	-3.1156	-1.5655	-5.6250	0.3720	-4.0439	-6.0500	-5.5563	-5.4125	-5.5874
3	-5.8120	-2.4624	-4.9060	-2.7781	-0.0532	-3.8594	1.7031	-0.3687	1.8469	0.0593	...	-2.0344	2.1469	2.8875	1.6845	1.2437	-0.0156	1.2595	3.8219	3.1971	5.0249
4	1.3157	4.7500	1.8658	-0.4060	1.7937	3.8750	6.2190	1.9220	6.0940	5.4060	...	-0.2844	1.1313	4.0157	3.0344	4.0406	0.5218	4.3594	4.0968	3.9250	3.9657

5 rows × 140 columns

(Optional) \(k\)-nearest neighbours on a query joke#

Let’s transpose the matrix.

item_user_mat = train_mat_imp.T

jokes_df = pd.read_csv("../data/jester_items.csv")
jokes_df.head()

jester_items_df = pd.read_csv("../data/jester_items.csv")
jester_items_df.head()

id_joke_map = dict(zip(jokes_df.jokeId, jokes_df.jokeText))

from sklearn.neighbors import NearestNeighbors


def get_topk_recommendations(X, query_ind=0, metric="cosine", k=5):
    query_idx = item_inverse_mapper[query_ind]
    model = NearestNeighbors(n_neighbors=k, metric="cosine")
    model.fit(X)
    neigh_ind = model.kneighbors([X[query_ind]], k, return_distance=False).flatten()
    neigh_ind = np.delete(neigh_ind, np.where(query_ind == query_ind))
    recs = [id_joke_map[item_inverse_mapper[i]] for i in neigh_ind]
    print("Query joke: ", id_joke_map[query_idx])

    return pd.DataFrame(data=recs, columns=["top recommendations"])


get_topk_recommendations(item_user_mat, query_ind=8, metric="cosine", k=5)

Collaborative filtering using the `surprise` package#

Although matrix factorization is a prominent approach to complete the utility matrix, TruncatedSVD is not appropriate in this context because of a large number of NaN values in this matrix.

We consider only observed ratings and add regularization to avoid overfitting.
Here is the loss function

\[f(Z, W) = \sum_{(i,j) \in R} ((w_j^Tz_i) - y_{ij})^2 + \frac{\lambda_1}{2}\lVert Z \lVert_2^2 + \frac{\lambda_2}{2}\lVert W \lVert_2^2\]

Let’s try it out on our Jester dataset utility matrix.

import surprise
from surprise import SVD, Dataset, Reader, accuracy

reader = Reader()
data = Dataset.load_from_df(ratings, reader)  # Load the data

# I'm being sloppy here. Probably there is a way to create validset from our already split data.
trainset, validset = surprise.model_selection.train_test_split(
    data, test_size=0.2, random_state=42
)  # Split the data

Regularized SVD

k = 10
algo = SVD(n_factors=k, random_state=42)
algo.fit(trainset)
svd_preds = algo.test(validset)
accuracy.rmse(svd_preds, verbose=True)

RMSE: 5.2893

5.28926338380112

No big improvement over the global baseline (RMSE=5.77).
Probably because we are only considering a sample.

Cross-validation for recommender systems

We can also carry out cross-validation and grid search with this package.
Let’s look at an example of cross-validation.

from surprise.model_selection import cross_validate

pd.DataFrame(cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True))

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    5.3022  5.2667  5.2661  5.2970  5.3140  5.2892  0.0194  
MAE (testset)     4.2272  4.1867  4.1792  4.2200  4.2142  4.2055  0.0190  
Fit time          0.21    0.22    0.22    0.22    0.22    0.22    0.00    
Test time         0.07    0.07    0.12    0.07    0.12    0.09    0.03    

	test_rmse	test_mae	fit_time	test_time
0	5.302232	4.227237	0.212285	0.068680
1	5.266715	4.186704	0.222506	0.066116
2	5.266053	4.179180	0.218399	0.119088
3	5.296970	4.219969	0.217175	0.068268
4	5.313987	4.214229	0.217195	0.118838

Jester dataset is available as one of the built-in datasets in this package and you can load it as follows and run cross-validation as follows.

data = Dataset.load_builtin("jester")

pd.DataFrame(cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True))

(Optional) PyTorch implementation#

We can also implement the loss function above using PyTorch.

import pandas as pd

# Load the dataset
ratings = pd.read_csv('../data/jester_ratings.csv')
print(ratings.head())

   userId  jokeId  rating
     1       5   0.219
     1       7  -9.281
     1       8  -9.281
     1      13  -6.781
     1      15   0.875

ratings_df = pd.read_csv('../data/jester_ratings.csv')
ratings = ratings_df[ratings_df["userId"] <= 4000].copy()
# ratings = ratings_full[ratings_full["userId"] <= 4000]

ratings

	userId	jokeId	rating
0	1	5	0.219
1	1	7	-9.281
2	1	8	-9.281
3	1	13	-6.781
4	1	15	0.875
...	...	...	...
141357	4000	18	-6.062
141358	4000	19	0.125
141359	4000	76	5.719
141360	4000	53	5.344
141361	4000	143	4.188

141362 rows × 3 columns

from sklearn.preprocessing import LabelEncoder

user_encoder = LabelEncoder()
ratings['user'] = user_encoder.fit_transform(ratings.userId.values)

item_encoder = LabelEncoder()
ratings['item'] = item_encoder.fit_transform(ratings.jokeId.values)

num_users = ratings['user'].nunique()
num_items = ratings['item'].nunique()

ratings

	userId	jokeId	rating	user	item
0	1	5	0.219	0	0
1	1	7	-9.281	0	1
2	1	8	-9.281	0	2
3	1	13	-6.781	0	3
4	1	15	0.875	0	4
...	...	...	...	...	...
141357	4000	18	-6.062	3634	7
141358	4000	19	0.125	3634	8
141359	4000	76	5.719	3634	65
141360	4000	53	5.344	3634	42
141361	4000	143	4.188	3634	132

141362 rows × 5 columns

X = ratings.copy()
y = ratings[user_key]
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42
)

X_train.shape, X_valid.shape

((113089, 5), (28273, 5))

Let’s create a custom ItemsRatingsDataset class for the ratings data, so that we can use of PyTorch’s DataLoader for batch processing and data shuffling during training.

import torch
from torch.utils.data import Dataset, DataLoader

class ItemsRatingsDataset(Dataset):
    def __init__(self, users, items, ratings):
        self.users = users
        self.items = items
        self.ratings = ratings

    def __len__(self):
        return len(self.ratings)

    def __getitem__(self, idx):
        return {
            'user': torch.tensor(self.users[idx], dtype=torch.long),
            'item': torch.tensor(self.items[idx], dtype=torch.long),
            'rating': torch.tensor(self.ratings[idx], dtype=torch.float)
        }

train_dataset = ItemsRatingsDataset(X_train['user'].values, X_train['item'].values, X_train['rating'].values)
train_dataloader = DataLoader(train_dataset, batch_size=512, shuffle=True)

valid_dataset = ItemsRatingsDataset(X_valid['user'].values, X_valid['item'].values, X_valid['rating'].values)
valid_dataloader = DataLoader(valid_dataset, batch_size=512, shuffle=True)

The CFModel class below defines the architecture and the forward method of collaborative filtering. We are using the embedding layer of torch.nn which simply creates a lookup table that stores embeddings of a fixed dictionary and size.

import torch.nn as nn

class CFModel(nn.Module):
    def __init__(self, num_users, num_items, emb_size=100):
        super(CFModel, self).__init__()
        # Embeddings for users
        self.user_emb = nn.Embedding(num_users, emb_size)

        # Embeddings for items
        self.item_emb = nn.Embedding(num_items, emb_size)

        # Initialize the embeddings 
        self.user_emb.weight.data.uniform_(0, 0.05)
        self.item_emb.weight.data.uniform_(0, 0.05)
        
    def forward(self, user, movie):
        user_embedding = self.user_emb(user)
        item_embedding = self.item_emb(movie)
        # calculate predicted ratings as dot products
        # of corresponding user and item embeddings
        return (user_embedding * item_embedding).sum(1)

model = CFModel(num_users, num_items)

import torch.optim as optim

# Loss function
criterion = nn.MSELoss()

# Regularization coefficient
lambda_reg = 0.001

# Optimizer (without weight decay) 
# We manually add regularization in the loss below
optimizer = optim.Adam(model.parameters(), lr=0.001)


import torch
import numpy as np
import torch.nn.functional as F

def train(model, train_dataloader, valid_dataloader, optimizer, criterion, epochs=10):
    for epoch in range(epochs): 
        model.train()  # Set model to training mode
        total_train_loss = 0
        for batch in train_dataloader:
            optimizer.zero_grad()
            
            # Forward pass
            predictions = model(batch['user'], batch['item'])
            
            # Compute the base loss (MSE)
            loss = criterion(predictions, batch['rating'])
            
            # Compute regularization terms (L2 norm of user and movie embeddings)
            user_reg = lambda_reg * model.user_emb.weight.norm(2)
            item_reg = lambda_reg * model.item_emb.weight.norm(2)
            
            # Total loss is the sum of base loss and regularization terms
            train_loss = loss + user_reg + item_reg
            
            # Backpropagation
            train_loss.backward()
            optimizer.step()
    
            total_train_loss += train_loss.item()
                    
        avg_train_loss = total_train_loss / len(train_dataloader)
    
        model.eval()  # Set model to evaluation mode
        total_valid_loss = 0
        all_predictions = []
        all_ratings = []
        with torch.no_grad():
            for batch in valid_dataloader:
                predictions = model(batch['user'], batch['item'])
                valid_loss = criterion(predictions, batch['rating'])
                total_valid_loss += loss.item()
                all_predictions.extend(predictions.tolist())
                all_ratings.extend(batch['rating'].tolist())
        avg_valid_loss = total_valid_loss / len(valid_dataloader)
        rmse = np.sqrt(np.mean((np.array(all_predictions) - np.array(all_ratings)) ** 2))
        print(f'Epoch {epoch+1}, Train Loss: {avg_train_loss:.4f}, Valid Loss: {avg_valid_loss:.4f}, Valid RMSE: {rmse:.4f}')

train(model, train_dataloader, valid_dataloader, optimizer, criterion, epochs = 20)

Epoch 1, Train Loss: 31.8835, Valid Loss: 26.9124, Valid RMSE: 5.2919
Epoch 2, Train Loss: 25.1327, Valid Loss: 23.5237, Valid RMSE: 4.9331
Epoch 3, Train Loss: 23.2564, Valid Loss: 23.8726, Valid RMSE: 4.8777
Epoch 4, Train Loss: 22.7042, Valid Loss: 20.4802, Valid RMSE: 4.8478
Epoch 5, Train Loss: 22.2066, Valid Loss: 25.0819, Valid RMSE: 4.8148
Epoch 6, Train Loss: 21.5153, Valid Loss: 21.0060, Valid RMSE: 4.7681
Epoch 7, Train Loss: 20.6530, Valid Loss: 21.4653, Valid RMSE: 4.7184
Epoch 8, Train Loss: 19.8259, Valid Loss: 18.8280, Valid RMSE: 4.6790
Epoch 9, Train Loss: 19.1407, Valid Loss: 17.8768, Valid RMSE: 4.6501
Epoch 10, Train Loss: 18.5690, Valid Loss: 18.0142, Valid RMSE: 4.6269
Epoch 11, Train Loss: 18.0456, Valid Loss: 17.7909, Valid RMSE: 4.6082
Epoch 12, Train Loss: 17.5188, Valid Loss: 17.0157, Valid RMSE: 4.5913
Epoch 13, Train Loss: 16.9827, Valid Loss: 15.0803, Valid RMSE: 4.5783
Epoch 14, Train Loss: 16.4400, Valid Loss: 18.2713, Valid RMSE: 4.5671
Epoch 15, Train Loss: 15.8830, Valid Loss: 19.4104, Valid RMSE: 4.5590
Epoch 16, Train Loss: 15.3265, Valid Loss: 15.5913, Valid RMSE: 4.5510
Epoch 17, Train Loss: 14.7745, Valid Loss: 14.3047, Valid RMSE: 4.5445
Epoch 18, Train Loss: 14.2318, Valid Loss: 14.5296, Valid RMSE: 4.5399
Epoch 19, Train Loss: 13.7009, Valid Loss: 13.7255, Valid RMSE: 4.5345
Epoch 20, Train Loss: 13.1774, Valid Loss: 11.8074, Valid RMSE: 4.5313

This is great! With this, we have the flexibility to tailor the loss function in the training loop as needed. For instance, we can integrate both user and item biases into the model and include regularization terms for these biases (challenging lab exercise).

Lecture 07: Collaborative filtering class demo

Contents

Lecture 07: Collaborative filtering class demo#

Example dataset: Jester 1.7M jokes ratings dataset#

Dataset stats#

Utility matrix for Jester jokes ratings data#

Data splitting and evaluation#

Data splitting#

Baseline approaches#

Global average baseline#

\(k\)-nearest neighbours imputation#

(Optional) Finding nearest neighbors#

(Optional) \(k\)-nearest neighbours on a query joke#

Collaborative filtering using the surprise package#

(Optional) PyTorch implementation#

Example dataset: Jester 1.7M jokes ratings dataset #

\(k\)-nearest neighbours imputation #

(Optional) Finding nearest neighbors #

Collaborative filtering using the `surprise` package#