Content Based Recommendation System

Recommender Prototype using Content Based Filtering


Setup

The system is built with LensKit, an open-source toolkit for building recommenders.

Requires the following:

Background

This recommendation system prototype uses content-based filtering. For detailed background, please refer to: http://recommender-systems.org/content-based-filtering/

The algorithm implemented in this prototype is composed of the following steps:

1. Compute item-tag vectors (the model)

The model builder (TFIDFModelBuilder, in its get() method) computes a unit-normalized TF-IDF vector for each movie in the data set. The resulting model maps each item ID to that item's TF-IDF vector.
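The step above can be sketched in plain Java, independent of LensKit. This is a minimal illustration, not the code in TFIDFModelBuilder: the class name TfIdfSketch, the nested-map representation of vectors, and the choice of idf = ln(N / df) are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the prototype or of LensKit.
final class TfIdfSketch {

    /**
     * Builds unit-normalized TF-IDF vectors.
     * Input: item ID -> (tag -> number of times the tag was applied to the item).
     */
    static Map<Long, Map<String, Double>> build(Map<Long, Map<String, Integer>> tagCounts) {
        // Document frequency: the number of items each tag appears on.
        Map<String, Integer> docFreq = new HashMap<>();
        for (Map<String, Integer> counts : tagCounts.values()) {
            for (String tag : counts.keySet()) {
                docFreq.merge(tag, 1, Integer::sum);
            }
        }

        int numItems = tagCounts.size();
        Map<Long, Map<String, Double>> model = new HashMap<>();
        for (Map.Entry<Long, Map<String, Integer>> item : tagCounts.entrySet()) {
            Map<String, Double> vec = new HashMap<>();
            double sumSq = 0.0;
            for (Map.Entry<String, Integer> e : item.getValue().entrySet()) {
                // ASSUMPTION: idf = ln(N / df); other variants exist.
                double idf = Math.log((double) numItems / docFreq.get(e.getKey()));
                double w = e.getValue() * idf;
                vec.put(e.getKey(), w);
                sumSq += w * w;
            }
            // Scale to unit length; an all-zero vector is left unchanged.
            double norm = Math.sqrt(sumSq);
            if (norm > 0) {
                vec.replaceAll((tag, weight) -> weight / norm);
            }
            model.put(item.getKey(), vec);
        }
        return model;
    }
}
```

With this idf variant, a tag that appears on every item gets weight zero, so it contributes nothing to any item vector.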

2. Build user profile for each query user

The makeUserVector(long) method of TFIDFItemScorer takes a user ID and produces a vector representing that user's profile. In the original implementation, the profile was the sum of the item-tag vectors of all items the user rated positively (>= 3.5 stars). This was later replaced with a weighted user profile (the older implementation is left commented out for reference). The weighted profile is a weighted sum of the item vectors of all items the user has rated, with weights based on the user's ratings.
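Both profile variants described above can be sketched as follows. This is an illustration only, not the body of makeUserVector(long): the class and method names are hypothetical, and because the text does not give the exact weighting, the sketch assumes weight = rating minus the user's mean rating.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the prototype or of LensKit.
final class UserProfileSketch {
    static final double LIKE_THRESHOLD = 3.5;

    /** Unweighted profile: sum of the vectors of items rated at or above the threshold. */
    static Map<String, Double> thresholdProfile(Map<Long, Double> ratings,
                                                Map<Long, Map<String, Double>> model) {
        Map<String, Double> profile = new HashMap<>();
        for (Map.Entry<Long, Double> r : ratings.entrySet()) {
            if (r.getValue() >= LIKE_THRESHOLD) {
                addScaled(profile, model.get(r.getKey()), 1.0);
            }
        }
        return profile;
    }

    /**
     * Weighted profile over all rated items.
     * ASSUMPTION: weight = rating - user's mean rating (the source only says
     * the weights are based on the user's rating).
     */
    static Map<String, Double> weightedProfile(Map<Long, Double> ratings,
                                               Map<Long, Map<String, Double>> model) {
        double mean = ratings.values().stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0.0);
        Map<String, Double> profile = new HashMap<>();
        for (Map.Entry<Long, Double> r : ratings.entrySet()) {
            addScaled(profile, model.get(r.getKey()), r.getValue() - mean);
        }
        return profile;
    }

    /** Adds weight * vec into target; items missing from the model are skipped. */
    private static void addScaled(Map<String, Double> target,
                                  Map<String, Double> vec, double weight) {
        if (vec == null) {
            return;
        }
        for (Map.Entry<String, Double> e : vec.entrySet()) {
            target.merge(e.getKey(), weight * e.getValue(), Double::sum);
        }
    }
}
```

Under the mean-centering assumption, items rated below the user's average pull the profile away from their tags, which the thresholded sum cannot express.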

3. Generate item scores for each user

The heart of the recommendation process in many LensKit recommenders is the score method of the item scorer, in this case TFIDFItemScorer. This method scores each item by using cosine similarity: the score for an item is the cosine between that item's tag vector and the user's profile vector.
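For reference, the cosine between two sparse tag vectors can be computed as in the sketch below. It uses plain Java maps rather than LensKit's vector types, and the class name CosineSketch is hypothetical; returning 0 for an empty or all-zero vector is a choice made for the example.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of the prototype or of LensKit.
final class CosineSketch {

    /** Cosine similarity of two sparse vectors stored as tag -> weight maps. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normSqA = 0.0;
        double normSqB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            double va = e.getValue();
            normSqA += va * va;
            // Only tags present in both vectors contribute to the dot product.
            Double vb = b.get(e.getKey());
            if (vb != null) {
                dot += va * vb;
            }
        }
        for (double vb : b.values()) {
            normSqB += vb * vb;
        }
        // ASSUMPTION: a zero vector scores 0 rather than raising an error.
        if (normSqA == 0.0 || normSqB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normSqA * normSqB);
    }
}
```

Because the item vectors in the model are already unit-normalized, only the length of the user profile vector actually affects the denominator; the general form is shown here for clarity.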

Test Drive

A set of test data for movie ratings is provided; it can easily be adapted to other domains.

data/movie-tags.csv

Attributes associated with each movie. This is the basis of the model generated in step 1.

data/movie-titles.csv

Maps Movie IDs to Movie Titles.

data/ratings.csv

Users and their movie ratings. Each line of the CSV file is ordered as: User ID, Movie ID, Rating
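When adapting the data to another domain, a line in this format could be parsed as in the sketch below. The RatingLine class is hypothetical, and the sketch assumes plain comma-separated fields with no header row and no quoting; the prototype itself loads the file through its LensKit configuration, not through this code.

```java
// Hypothetical helper for illustration; not part of the prototype or of LensKit.
final class RatingLine {
    final long userId;
    final long movieId;
    final double rating;

    RatingLine(long userId, long movieId, double rating) {
        this.userId = userId;
        this.movieId = movieId;
        this.rating = rating;
    }

    /** Parses one "User ID, Movie ID, Rating" line. */
    static RatingLine parse(String line) {
        // ASSUMPTION: exactly three comma-separated fields, no quoting.
        String[] fields = line.split(",");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        return new RatingLine(
                Long.parseLong(fields[0].trim()),
                Long.parseLong(fields[1].trim()),
                Double.parseDouble(fields[2].trim()));
    }
}
```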

data/users.csv

Maps User ID to User Name.

The test data is injected into the system in CBFMain.java in the method configureRecommender().

Run the recommender with a command similar to the following, where the arguments are user IDs:

runecbf 4045 144 3855 1637 2919