This page is part of a global project to create a better online review system. If you want to know more or give your feedback, write to [email protected] and we’ll grab a beer ;)

Thresholds

Comparing average ratings is often the first thing users do during the calibration step (see “Why Do We Look at Online Reviews”). With many available options, people need to narrow down their choices before moving to the compare step. The average rating, visible in the list of options, plays a significant role in this decision-making process.

Google Maps’ listings of restaurants: the average rating, the name, and the location are the three pieces of info available.


At this stage, all listings below a certain threshold may be automatically and almost unconsciously excluded.

Looking at the screenshot from Google Maps, you probably wouldn’t consider options below a 4 rating. This means that 4 is a threshold. What’s more, these filters are not just psychological: platforms pre-select and remove low-rated options to avoid overwhelming users with too many choices. For example, Google only shows the highest-rated spots, and Uber first offers rides to the best-rated drivers.

When it’s not automatic, platforms provide filters to remove low-rated options, and 70% of people use them $^1$.

Google Maps’ rating filter


Airbnb’s new “Guest favorites” label highlights listings rated above 4.9 (along with a few other conditions)


ReviewTrackers study results $^1$


In this context, it’s understandable why businesses might gate reviews to stay above the threshold (see “Review Gating”).

Moreover, thresholds and standards vary across industries, countries, and platforms (e.g., the average rating is 4.3 stars on Google, 4.25 stars on Tripadvisor, and 3.65 stars on Yelp). The same applies to customer support satisfaction and net promoter scores $^2$.

Psychological Tricks

An interesting research study $^3$ shows that people use different strategies when making choices based on customer ratings. These strategies vary with the number of reviews and the average rating range.

A personal example on Airbnb: I tend to select listings above 4.8 and only consider listings between 4.6 and 4.8 if the higher-rated ones are either booked or too expensive. Every time I do this, I feel silly because:

Another psychological bias is that while consumers expect a high average rating, a 5-star average rating can appear suspicious. A study showed that 68% of consumers either “agree” or “somewhat agree” that a high review rating is not trustworthy unless there are a lot of reviews $^4$. This skepticism arises from the abundance of fake reviews (see “Fake Reviews Flood the Web”). Human behavior is indeed complex.

The psychological impact isn’t limited to the average rating. The number of reviews also plays a significant role, and people seem to use a Bayesian approach to compare options. We’ll cover that in “Number of Reviews”.
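One common way to formalize this intuition is a Bayesian (shrunken) average, which pulls an item’s mean rating toward a prior in proportion to how few reviews it has. Here’s a minimal sketch; the `prior_mean` and `prior_weight` values are illustrative assumptions, not figures from any platform:

```python
def bayesian_average(ratings, prior_mean=4.0, prior_weight=10):
    """Shrink the raw mean toward prior_mean.

    With few reviews the result stays close to prior_mean;
    with many reviews it converges to the raw mean.
    prior_weight acts like a count of "phantom" reviews at prior_mean.
    """
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)

# A perfect 5.0 with only 3 reviews ranks below a 4.7 with 200 reviews:
few = bayesian_average([5.0, 5.0, 5.0])   # ≈ 4.23
many = bayesian_average([4.7] * 200)      # ≈ 4.67
```

This is why a 5-star listing with three reviews can reasonably lose to a 4.7 with hundreds: the small sample carries little evidence, so the prior dominates.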

<aside> 💡 Exploration

</aside>