When calculating the number of reviews needed to achieve a stable average rating, we use the following formula:

$$ D=d/(n+1) $$

Here, $n$ is the number of ratings in the old average. $D = X[n+1] - X[n]$ is the change from the old average $X[n]$ to the new average $X[n+1]$ after a new rating $x[n+1]$ is left. $d = x[n+1] - X[n]$ is the difference between that newest rating and the old average. Since ratings range from 1 to 5 stars, $|d| \le 4$.
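The formula follows from how a running average is updated when the $(n+1)$-th rating arrives:

$$
\begin{aligned}
X[n+1] &= \frac{n\,X[n] + x[n+1]}{n+1} \\
D = X[n+1] - X[n] &= \frac{n\,X[n] + x[n+1] - (n+1)\,X[n]}{n+1} = \frac{x[n+1] - X[n]}{n+1} = \frac{d}{n+1}
\end{aligned}
$$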

Solving for $n$:

$$ n = d/D - 1 $$

We consider an average stable when a single new rating changes it by less than 0.1 ($D < 0.1$).

With an average rating of 5/5 stars and a new rating of 1/5 (the worst case, $d=4$), we get:

$$ D = \frac{4}{n+1} < 0.1 \Rightarrow n > 39 $$

In this case, we would need more than 39 existing reviews (at least 40) for the average to be stable enough.

If we choose an average of 4/5 stars (industry average) and a new rating of 1/5 stars ($d=3$):

$$ D = \frac{3}{n+1} < 0.1 \Rightarrow n > 29 $$

In this case, at least 30 reviews would be needed.
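As a quick check of the worst-case bound $n > d/D - 1$, here is a small Python sketch. The function name `min_reviews` is hypothetical. It uses exact fractions so that boundary values such as $4/0.1$ are not distorted by floating-point rounding, and it returns the smallest whole number of existing reviews that satisfies the strict inequality $D <$ threshold.

```python
import math
from fractions import Fraction


def min_reviews(d, threshold):
    """Smallest number of existing reviews n for which a new rating that is
    |d| stars away from the current average moves it by less than `threshold`,
    using D = d / (n + 1)."""
    d = abs(Fraction(str(d)))
    threshold = Fraction(str(threshold))
    # D < threshold  <=>  n > d / threshold - 1; take the next whole number.
    return math.floor(d / threshold - 1) + 1


print(min_reviews(4, 0.1))  # 5-star average, new 1-star rating: 40
print(min_reviews(3, 0.1))  # 4-star average, new 1-star rating: 30
```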

I ran a test in a Google Sheet with random ratings between 1 and 5. In that simulation, the average seemed to stabilize after 21 reviews:
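The spreadsheet itself isn't shown, so the following is only a hypothetical reconstruction of that kind of test in Python. It assumes ratings are drawn uniformly from 1 to 5 stars, and that "stable after k reviews" means no review after the k-th moves the running average by 0.1 or more. The function `stabilization_point` and the seed are illustrative choices. Results vary from one random sequence to the next, so they won't necessarily match the figure found in the sheet.

```python
import random
from fractions import Fraction


def stabilization_point(ratings, threshold=0.1):
    """Number of reviews after which no later rating in `ratings` moves the
    running average by `threshold` or more (0 if no rating ever does)."""
    limit = Fraction(str(threshold))
    average = Fraction(0)
    last_big_move = 0
    for n, rating in enumerate(ratings):  # n = reviews counted so far
        new_average = (n * average + rating) / (n + 1)
        # The first rating only sets the average, so only later ones count.
        if n > 0 and abs(new_average - average) >= limit:
            last_big_move = n + 1  # the (n+1)-th review moved it too much
        average = new_average
    return last_big_move


# Hypothetical re-run of the spreadsheet test: uniform random 1-5 stars.
rng = random.Random(42)  # arbitrary seed; results differ from seed to seed
ratings = [rng.randint(1, 5) for _ in range(100)]
print(stabilization_point(ratings, 0.1), stabilization_point(ratings, 0.05))

# Averaging over many simulated products smooths out the randomness.
runs = [stabilization_point([rng.randint(1, 5) for _ in range(100)]) for _ in range(1000)]
print(sum(runs) / len(runs))
```

A sequence of 100 ratings is long enough here. By $D = d/(n+1)$ with $|d| \le 4$, no review after the 40th can move the average by 0.1 or more, and none after the 80th can move it by 0.05 or more.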

$$ D<0.1 \Rightarrow n>21 $$

If we instead consider the average stable only when it changes by less than 0.05 (because a 4.76 average that drops by 0.05 falls to 4.71, which Google would display as 4.7 instead of 4.8), the test shows stability after 30 reviews:

$$ D<0.05 \Rightarrow n>30 $$
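The rounding example can be checked directly. This sketch assumes Google displays averages rounded half-up to one decimal place (an assumption, not a documented rule), and `displayed_rating` is a hypothetical helper. `Decimal` is used to avoid binary floating-point surprises at values like 4.75.

```python
from decimal import Decimal, ROUND_HALF_UP


def displayed_rating(average):
    """Round an average to one decimal place, half-up (assumed display rule)."""
    return Decimal(str(average)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


before = Decimal("4.76")
after = before - Decimal("0.05")  # 4.71
print(displayed_rating(before), displayed_rating(after))  # 4.8 4.7
```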

Going further: in reality, ratings are not random; they are generally consistent with the average rating. A more accurate result would require probabilistic methods (e.g., Bayesian inference).
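As one possible Bayesian approach (not necessarily the one the author has in mind), the share of each star level can be modeled with a Dirichlet distribution updated with the observed star counts. The posterior gives both an expected average rating and a standard deviation that shrinks as reviews accumulate. One could then treat the average as reliable once that standard deviation falls below a chosen threshold. The uniform prior, the helper `posterior_rating`, and the example counts are all assumptions for illustration.

```python
import math

STARS = (1, 2, 3, 4, 5)


def posterior_rating(counts, prior=(1, 1, 1, 1, 1)):
    """Posterior mean and standard deviation of the expected star rating under
    a Dirichlet prior over the five star levels (uniform prior by default)."""
    alpha = [a + c for a, c in zip(prior, counts)]
    total = sum(alpha)
    mean = sum(s * a for s, a in zip(STARS, alpha)) / total
    second_moment = sum(s * s * a for s, a in zip(STARS, alpha)) / total
    # For p ~ Dirichlet(alpha): Var(sum_k s_k p_k) = (E[s^2] - E[s]^2) / (total + 1)
    variance = (second_moment - mean ** 2) / (total + 1)
    return mean, math.sqrt(variance)


# Hypothetical star counts (1-star ... 5-star) for a product with 20 reviews.
mean, sd = posterior_rating([1, 1, 2, 6, 10])
print(f"{mean:.2f} ± {sd:.2f}")  # 3.92 ± 0.25
```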

The actual number of reviews needed is probably around 20.

Further reading:

How many reviews does it take to achieve a meaningful average rating?