Archive for May, 2008

Does the Netflix challenge have it backwards?

What if the Netflix challenge is barking up the wrong tree?  They’re trying to push their algorithm closer and closer to a 10% improvement. But if their algorithm is peaking, maybe they really need to scrap the rating system that feeds the algorithm.

…(Disclaimer: this entry is one of my rambling rants, sorry.  But I think it’s really interesting.  I’ve added bits and pieces over the last week and cut and pasted a bunch of times, so it isn’t very coherent.  But I want to just post it, move on, and start talking to people about it.)…

They’re trying to guess people’s taste based on a rating system of 1-5 stars. Is that a very good database medium? Couldn’t they spend a million dollars to make a better database?  What if thinking of movie taste as “good or bad” is fundamentally flawed?  What if we all have guilty pleasure movies that we’re likely to watch even if we rate them lower on the “good or bad” scale?  Netflix already knows that there are psychological influences (like the anchoring effect) that skew the accuracy of a five-star rating system.  (See this Wired article.)  Programmers do a lot of fancy gymnastics to account for effects like that, but there could be a better approach.   What if Netflix used one of these rating systems instead of a five-star system?

What if they used tag keywords? What if the user could play an online game where they would fight for the movies they liked the best?  What if users bet on which movie they thought would get higher ratings?  What if they were given a random sampling of reviews and were asked to agree or disagree?

The five-star rating system’s early predecessors, like Slashdot, needed a rating system to provide the best content to fit a community of readers. But Netflix users are not voters in a democracy. They are niche choosers.  Netflix isn’t distilling a single set of content; they are tailor-fitting content to users, so it doesn’t make sense to use the same five-star rating system.   Netflix is using the ratings of each movie as building blocks to define what niche a person is in. But it would be better to use a rich language of associative cues to define a rich web of micro-genres without worrying about each person’s preference about each movie.  It is more important to get a rich rendering of a micro-genre web.  THEN you can guess which nodes of that web the user will most likely be attracted to.

At first it seems stupid to gather data by having the user play a loose association game, because the data is very low-resolution: any given “review” (collected by keyword or by playing a game) seems inconclusive and almost arbitrary.  But on the other hand, this data has personality, and when you have a huge sample, even wild fluctuations in arbitrary choices average out to produce meaningful results.  It’s the old trick that a room full of people guessing the number of jellybeans in a jar will mostly be really far off, but their average will be scary accurate.
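
To make the jellybean trick concrete, here is a rough Python sketch. Everything in it is made up for illustration: the jar size, the number of guessers, and especially the assumption that guesses are off by up to 50% in either direction with no systematic bias. If the crowd leans one way, the average inherits that lean.

```python
import random
import statistics


def simulate_crowd(true_count=1000, n_guessers=5000, spread=0.5, seed=0):
    """Simulate noisy, unbiased guesses at the number of jellybeans in a jar.

    All parameters are illustrative assumptions, not real data.
    Each guess is off by a uniform random factor in [-spread, +spread].
    """
    rng = random.Random(seed)
    guesses = [
        true_count * (1 + rng.uniform(-spread, spread))
        for _ in range(n_guessers)
    ]

    # How far off is a typical individual guesser?
    individual_errors = [abs(g - true_count) for g in guesses]
    typical_individual_error = statistics.mean(individual_errors)

    # How far off is the average of all the guesses?
    crowd_average = statistics.mean(guesses)
    crowd_error = abs(crowd_average - true_count)

    return crowd_average, typical_individual_error, crowd_error


if __name__ == "__main__":
    avg, individual, crowd = simulate_crowd()
    print(f"crowd average:            {avg:.1f}")
    print(f"typical individual error: {individual:.1f}")
    print(f"crowd average error:      {crowd:.1f}")
```

With these settings a typical guesser misses by a couple hundred beans, while the average of everyone's guesses lands much closer to the true count.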

The upside of this method is that you gather a much richer database of information.  You won’t have a very good idea how the user feels about any given movie, but you WILL start to home in on which movies belong to which niches.  Then you can judge which niches the user is likely to be attracted to.  Netflix is asking people to guess the jellybeans to the nearest 1/5 of the jar, and THAT is ruining their whole crowd sample.  Rather than focusing on each individual’s unique “rating thumbprint” and comparing those thumbprints, they should be establishing a detailed map of micro-genres and then deciding which micro-genres the user is most attracted to.  In trying to be accurate about each user’s opinion on each movie, Netflix loses the detail in their rendering of a micro-genre web.
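
Here is one way the "map the niches first, then place the user" idea might look in Python. This is only a toy sketch under my own assumptions: the tag counts are invented, `MOVIE_TAGS`, `user_profile`, and `recommend` are hypothetical names, and cosine similarity is just one convenient way to compare tag profiles. It is not how Netflix does anything.

```python
from collections import Counter
from math import sqrt

# Invented crowd tag counts: how many users attached each tag to each movie.
MOVIE_TAGS = {
    "Alien": Counter({"space": 40, "horror": 35, "monster": 20}),
    "The Thing": Counter({"horror": 30, "monster": 25, "paranoia": 15}),
    "Blade Runner": Counter({"noir": 30, "future": 28, "space": 5}),
    "Notting Hill": Counter({"romance": 45, "comedy": 30}),
}


def cosine(a, b):
    """Cosine similarity between two tag-count profiles (0.0 if either is empty)."""
    dot = sum(a[tag] * b[tag] for tag in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def user_profile(watched):
    """Build a user's niche profile by pooling the tags of movies they chose."""
    profile = Counter()
    for title in watched:
        profile.update(MOVIE_TAGS[title])
    return profile


def recommend(watched):
    """Rank unwatched movies by how closely their tags match the user's niche."""
    profile = user_profile(watched)
    candidates = [title for title in MOVIE_TAGS if title not in watched]
    return sorted(
        candidates,
        key=lambda title: cosine(profile, MOVIE_TAGS[title]),
        reverse=True,
    )


if __name__ == "__main__":
    print(recommend(["Alien"]))
```

Notice that no star rating appears anywhere: the user never says how "good" Alien is, only that they picked it, and the crowd's tags do the rest.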

Whew, sorry about the disjointed, rant-style post.  Cheers -