Saturday, December 12, 2009

Intro to Computer Science Lesson Plan: Machine Learning


As noted I wanted to bring "interesting" CS topics to my freshman class. Here is a relatively easy lab lesson that presents a 'linear algebraic' method for movie recommendation. The main idea is that each user and movie is described by a vector of hidden attributes values. How well a user likes a given movie is just the dot product of vectors. To find these vectors we do a computation based on a simplified steepest descent to SVD. I adapted an idea from Simon Funk's blog. Here is the lab. For our class we found that "Dark Knight" and "Napoleon Dynamite" had the extreme hidden attribute values.

------------------------------------------------------------------

We can determine recommendations by computing hidden attribute values that describe the tastes of each user, and how well a movie can be described by this attribute. Let us work on computing the most significant such value, one value for each user and one value for each movie. We can begin by letting all these values be equal to 1.0 as follows.

numUsers = len(allRatings)
numMovies=30
userVal= [1.0]* numUsers
movieVal = [1.0]*numMovies

We are now ready to try to learn the hidden values by using the actual rating values (submitted in advance by each student). We will try to update them in rounds. In each update round we choose one actual rating at random. To do this choose a user and a movie at random and check that the entry in the allRatings matrix is not 0. We will compute the error by comparing the predictedRating to the actualRating stored in our dataset mldata.py.

predictedRating = movieVal[movie] * userVal[user]
error = actualRating – predictedRating

As it turns out, we can always improve the situation my changing the userVal according to the error and the movieVal as follows:

userVal[user] += error * movieVal[movie] * smallnum

We explained why this is true in lecture (steepest descent). Similarly, we can improve the movieVal according to the error and the userVal as follows:

movieVal[movie] += error * userVal[user] * smallnum

We can put the whole update routine in a function as follows where we’ve added a dampening factor so that the updates change things in a smooth way:

def update(user, movie, actualRating):
predictedRating = movieVal[movie] * userVal[user]
error = actualRating - predictedRating
userVal[user] += error * movieVal[movie] * 0.0001
movieVal[movie] += error * userVal[user] * 0.0001

Now simply call this update function over and over on random users and random movies (being sure that there is an actual rating) and (with luck) the numbers will converge quickly to the hidden values we have been seeking.

No comments:

Post a Comment