I’m working on a project that uses reviews and ratings, and it got me thinking about scoring systems. The most widely adopted is the five star rating, but I’m not convinced it’s the best system out there; like many other systems, it’s confusing. Is there a system that can give consistent ratings from a diverse group of users while keeping the rating threshold low?
My first bit of research led me to the official YouTube blog, where they state that the vast majority of their votes are five stars, the second most popular vote is one star, and two, three and four stars hardly even show in the statistics. The consensus is that if a user really likes something he’ll vote five stars; if he just likes it, is neutral or slightly dislikes it, he won’t vote at all; and if he really hates it, he’ll vote one star.
In a situation like this (casual consumption) a much better system is the thumbs up and thumbs down (like/dislike) rating system. It provides the lowest possible threshold to vote, which is the most important factor when users aren’t actively looking to rate things. The votes can even be converted to a five star rating by applying a normal distribution curve to the ratio of likes and dislikes.
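As a rough sketch of that conversion (one of several possible approaches, and my own interpretation: fit a normal curve to the like ratios of all items, then bucket each item into 1–5 stars by its percentile), something like this could work:

```python
from statistics import NormalDist, mean, stdev

def ratio(likes: int, dislikes: int) -> float:
    """Fraction of votes that are likes."""
    total = likes + dislikes
    return likes / total if total else 0.5  # no votes: assume neutral

def to_stars(item: tuple[int, int], population: list[tuple[int, int]]) -> int:
    """Map an item's like ratio to 1-5 stars by its percentile under a
    normal curve fitted to the like ratios of the whole population."""
    ratios = [ratio(likes, dislikes) for likes, dislikes in population]
    dist = NormalDist(mean(ratios), stdev(ratios))
    percentile = dist.cdf(ratio(*item))     # where this item sits on the curve
    return min(5, int(percentile * 5) + 1)  # five equal-width buckets -> 1..5

votes = [(90, 10), (50, 50), (10, 90), (70, 30), (30, 70)]
print(to_stars((90, 10), votes))  # top of the curve -> 5 stars
```

An item near the top of the curve lands on five stars, an average one on three, and a heavily disliked one on one, which mirrors how the raw votes actually cluster.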
So with a two-choice system settled as the best choice for casual browsing, what’s the best system for a website that targets active users and whose purpose is rating and reviewing rather than content sharing? Let me start by bringing up a couple of issues with rating systems:
- Rating threshold, as we see in the YouTube case. A complex rating system raises the threshold to submit a rating at all, which sort of defeats the purpose.
- Common Standards? Each step or level needs to have a defined meaning so that all reviewers can rate in a similar and consistent way.
- Are the levels relative or absolute? If a technical product is reviewed, does five stars mean that it’s better than most of its competitors or that it has a specific list of features of a certain defined quality? Is it relative to other similar products at the time of rating?
- Granularity? A system that’s too coarse will fail to provide enough granularity to distinguish products from each other, but a system that’s too finely grained is harder to moderate and define. There’s also the problem of separating two ratings from each other: how many people can accurately say that one wine is 2 percentage points better than another?
- Mid-point bias, systems with a geometrical mid-point offer the reviewer an easy way to rate without really reflecting on whether he likes the product or not.
Popular in employer and course surveys are what I call mid-point centric scoring systems (the technically correct term, it turns out, is a Likert scale): using a mid-point as reference, they ask whether the respondent strongly disagrees, disagrees, is neutral, agrees or strongly agrees. They offer the easy way out, have a relatively high threshold and don’t offer that many levels. The only positive thing I can see is that the levels are clearly defined.
Five star systems are the most common rating scales, but they are often abused and there are many different versions, which leads to inconsistent scoring and analysis. Some have four effective levels (1–4 stars), some five (with the first star crossed out), and others use half-star increments. Many use them as binary like (five stars) and hate (one star) switches, and I don’t think I’ve ever seen a definition of what the stars actually mean. At least most users are familiar with them.
The 100 point wine scoring system made popular by Robert Parker has some merits: the major levels (90, 80, 70 points and so on) are defined and provide a sufficient range between them to really separate close products. Its main problems are the confusion of a scale that starts at 50 points for no obvious reason, and the difficulty of separating a subjective product at such fine intervals.
10 point systems are common in sports but seem to have been forgotten for rating and review purposes. I believe they offer a good compromise between complexity, familiarity and granularity.
A 10 point rating system would use a set of guidelines for the scoring levels, with room for flexibility between the defined levels. The ratings should be what I call Snapshot Relative (I just made that up), i.e. relative to comparable objects at the time of the review. A great product that turns into a classic will keep getting high scores, while a novelty product riding high on technical merit will get high points early in its life, but its scores will even out over its lifetime as it’s replaced by more advanced products.
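A minimal sketch of what snapshot-relative scoring could look like. The catalog and the `benchmark_score` measure of technical merit are made up for illustration; the point is that the rating depends on which peers exist at the review date, so the same product scores lower when reviewed again years later:

```python
from datetime import date

# Hypothetical catalog: (name, release_date, benchmark_score)
CATALOG = [
    ("Alpha",   date(2009, 1, 1), 40),
    ("Bravo",   date(2009, 6, 1), 55),
    ("Classic", date(2010, 3, 1), 70),
    ("Delta",   date(2011, 2, 1), 85),
]

def snapshot_rating(benchmark: float, review_date: date) -> int:
    """Rate 1-10 relative to the products available on the review date."""
    peers = [score for _, released, score in CATALOG if released <= review_date]
    beaten = sum(1 for score in peers if benchmark >= score)
    return max(1, round(10 * beaten / len(peers)))

# Best of its contemporaries in 2010, merely good once Delta ships.
print(snapshot_rating(70, date(2010, 6, 1)))  # 10
print(snapshot_rating(70, date(2012, 1, 1)))  # 8
```

The review itself is a snapshot, so stored ratings are never recomputed; only new reviews reflect the new competitive landscape.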
For my project I actually think I’ll create a hybrid system: likes and dislikes to measure popularity and general preference, and a ten point system for quality rating. This way it’s easier for users to say that they like the general concept of something (a funny video, for example) but remark that it’s executed badly (poor video quality).