The Problem with Performance Ratings

Douglas Gresham
7 min read · Sep 1, 2022

One of the hallmarks of any performance management system is the performance rating — a single descriptor summing up what the company thought of your last 3, 6 or 12 months of work. Rating scales typically anchor around “Meets Expectations”, with one or two ratings below that representing degrees of underperformance and two or three above it recognising growth beyond the base expectations for your role and level.

Here are some of the problems I’ve run into with ratings, along with some ways of talking about them that I find helpful.

Ratings Are Not Grades

Five stars on a red and blue background
Photo by Towfiqu barbhuiya on Unsplash

It’s tempting to think that a rating is like a grade at school or university — but this is a mistake. At school you study for a fixed length of time, sit an exam and get a result — that grade is the sole outcome, not part of an ongoing conversation about your performance and growth over time.

This means someone used to high grades throughout their academic career might think a ‘Meets Expectations’ rating is bad because there are multiple possible ratings above it. In reality we should have high expectations, such that meeting them is an achievement you should rightly be proud of — and a rating is a single point along your career journey, not the end of it.

For managers, you should be having this conversation early on with people on your team. If you only have it after delivering a rating, it’s going to feel like post hoc rationalisation.

Ratings Are Discrete, Growth Is Continuous

A photo of a mountain road stretching into the distance
Photo by Joshua Woroniecki on Unsplash

It’s neither feasible nor valuable to give everyone a full performance review on a daily, weekly or even monthly basis — which means that ratings will inevitably be only an approximation of how you’re doing, and you should treat them that way.

The mental model I use is this: imagine your performance as a line graph of performance against time. It will fluctuate — you have good days and bad days, after all — but so long as the average gradient of the line is positive, you’re growing. A rating, on the other hand, is more like the area under the curve for a fixed time window.

A line chart representing performance over time, varying over time but trending upwards, overlaid on a bar chart showing assigned performance ratings over that same time period.
An illustrative graph of actual performance (line) vs discrete ratings given for that performance (bars)

One particular weakness of discrete ratings is that they don’t account for growth trajectory. In most implementations, a person who steadily Meets Expectations for a whole 6 months will get the same rating as someone who was Meeting Expectations at the start of the period, grew their impact by taking on more difficult things and getting out of their comfort zone, and was Exceeding Expectations for the last 6 weeks of the cycle. This is because the discrete rating has to represent performance across the entire cycle, and is therefore distinct from trajectory.

A line chart representing performance over time. There are two lines, one with an upwards gradient and one flat; both represent a discrete rating of Meets Expectations, even though the upwards trajectory is predictive of higher ratings in the future.
An illustrative graph where two people would be given the same performance rating for a discrete rating period, but have different trajectories.
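
To make that distinction concrete, here is a rough sketch in Python (my own illustration, using a made-up numeric scale where 3.0 stands in for “Meets Expectations”, not any company’s real data) of how two very different trajectories can collapse into the same windowed rating:

```python
def window_rating(performance, start, end):
    """Approximate a discrete rating as the average performance over a fixed
    review window (the 'area under the curve' idea from above)."""
    window = performance[start:end]
    return sum(window) / len(window)

def gradient(performance):
    """Average month-on-month change: the trajectory the rating hides."""
    deltas = [b - a for a, b in zip(performance, performance[1:])]
    return sum(deltas) / len(deltas)

# Hypothetical monthly performance scores, where 3.0 ~= "Meets Expectations"
steady  = [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]   # flat line
growing = [2.7, 2.8, 2.9, 3.1, 3.2, 3.3]   # upward trajectory, same mean

for name, series in [("steady", steady), ("growing", growing)]:
    print(name, round(window_rating(series, 0, 6), 2), round(gradient(series), 2))
# Both come out as a 3.0 rating for the window, but only one has a positive gradient.
```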

To have the most success in your career, you should optimise for the gradient of the line (i.e. the rate at which you are consistently growing), not for a particular rating at a fixed point in time. That will lead to far better outcomes over time. I’ve spent 15+ years at tech companies that use performance ratings; I’ve never received the top rating at any of them, I have received ratings below ‘Meets Expectations’, and my career has progressed just fine — because I have focused on the gradient of the line and how to push it up.

So Why Even Bother With Ratings?

A photo of a person using a panel next to a door to rate the service they have received at this location
Photo by Celpax on Unsplash

There are a number of reasons why companies implement performance ratings:

  • Companies want to reward high performance and ratings are a way to do that
  • A ratings schedule sets a baseline for how often people get feedback — every 6 months isn’t good enough, but it’s definitely better than never!
  • The time leading up to assigning and delivering ratings creates a period of reflection, which can help managers spot patterns and trends that, once addressed, would help the person progress
  • Ratings and justifications create a written record which can help with manager changes, promotion decisions and so on
  • Ratings help with level-setting amongst managers to ensure fairness and consistency (this is why we calibrate)

All of these things can be better achieved by experienced, well-coached managers engaging with their reports regularly; however, a formal performance review/rating system can ensure that there is a common minimum baseline for these activities that all managers must adhere to. It won’t create great managers, but it can catch or prevent terrible ones, as well as give some basic structure for those starting out on the manager’s path.

For managers, this means you should aim to have the rating conversation be merely a check-in, summarising all your conversations from the period the rating covered. If ratings feel like a huge deal, that’s a sign that you’re not sufficiently engaged with your reports on performance and career growth.

Reward Is Where This Gets Hard

A photo of two copper coins
Photo by Siora Photography on Unsplash

All of this sounds great, but for most people ratings become most real when they’re tied to compensation — to salary adjustments, bonus multipliers and equity refreshes.

This gets particularly wonky when someone was on the cusp of a higher rating or on a high growth trajectory — formulaic compensation treats them the same as someone who only just reached that rating.

I’m not a fan of adding many more ratings — Google used to use a numeric point scale (2.9 and below being underperforming, 3.0 being “Meets Expectations”, 3.4 being “Exceeds Expectations”, and upwards to 4.0 and higher) where we’d debate endlessly over a 0.1 adjustment up or down, and it was the epitome of the fallacy of false precision. I’m even less of a fan of giving higher ratings to achieve the compensation outcomes you want: that sets people’s expectations for ratings wrongly, removes any meaning the ratings might have, and turns calibrations into a fight to get the highest ratings you can for your reports rather than a forum for crafting quality feedback and identifying growth opportunities for people.

The least-worst tool I’ve come across is giving managers discretion to adjust compensation manually. There is absolutely a risk of bias, but you can put guardrails in place (have your formula spit out ranges rather than absolute numbers, for instance), and I’ve realised that bad managers will just find other outlets for their bias — be that manipulating ratings, withholding the best opportunities, giving overly critical feedback, or whatever other tool comes to hand. You’re better off coaching them (or firing them if that doesn’t work) than removing everyone’s ability to correct things that are wrong.
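
As a rough illustration of that guardrail (the rating names, ranges and numbers here are invented for the example, not any real compensation formula):

```python
# Hypothetical bonus-multiplier ranges per rating: the formula sets the band,
# the manager chooses where within it to land.
BONUS_RANGES = {
    "Below Expectations":   (0.0, 0.5),
    "Meets Expectations":   (0.8, 1.2),
    "Exceeds Expectations": (1.2, 1.6),
}

def bonus_multiplier(rating: str, manager_adjustment: float) -> float:
    """Return a bonus multiplier; manager_adjustment picks a point in the band
    (0.0 = bottom, 1.0 = top) and is clamped so discretion stays inside the guardrails."""
    low, high = BONUS_RANGES[rating]
    adjustment = min(max(manager_adjustment, 0.0), 1.0)
    return low + adjustment * (high - low)

# Someone on the cusp of a higher rating can be placed at the top of the band...
print(bonus_multiplier("Meets Expectations", 1.0))   # 1.2
# ...rather than getting the same outcome as someone who only just made it.
print(bonus_multiplier("Meets Expectations", 0.2))   # 0.88
```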

Forced Distributions Are Where This Gets Toxic

A photo of some 3D shapes arranged into a bar chart
Photo by Алекс Арцибашев on Unsplash

A Forced Distribution (aka Stack Ranking, Bell-Curve Ranking) is a technique for assigning ratings where a set number of people get each rating. For example, if you have a team of 10, you might be told that 1 person must be Underperforming, at least 6 should be Meets Expectations, at most 2 can be Exceeds Expectations and at most 1 can be a higher rating than that.

The theory here is that if you continuously, rigorously performance manage your lowest performers, you’ll ensure you hold a high performance bar and motivate your team to perform well so they don’t end up in that bottom 10%. There’s also an undercurrent of not trusting managers — thinking that they’re soft or lazy and don’t want to have hard conversations with people who are struggling, and that a forced distribution will make them do their job.

In practice, this makes people — managers and reports alike — laser-focused on ratings alone. It damages trust, it damages collaboration since you’re in competition with your peers not to be in that bottom 10%, and it creates incentives for managers to behave badly (for example, ‘hire to fire’).

Distributions can be useful as a sense check — if everyone is Exceeding Expectations, that is anomalous and you better have good evidence to back it up — but forcing them is Goodhart’s Law at its worst. If you’re in a place that does this, strongly consider disregarding the rest of my advice as it may be actively harmful to you.
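
As a sketch of the difference (again with made-up numbers): a sense check flags an unusual distribution for scrutiny in calibration, whereas a forced distribution would keep reassigning ratings until the quota is met.

```python
from collections import Counter

def distribution_sense_check(ratings, max_exceeds_share=0.5):
    """Warn if an unusually large share of a team is rated above
    'Meets Expectations'. The warning prompts a look at the evidence;
    it never changes anyone's rating."""
    counts = Counter(ratings)
    share = counts.get("Exceeds Expectations", 0) / len(ratings)
    if share > max_exceeds_share:
        return f"{share:.0%} rated Exceeds Expectations - check the supporting evidence"
    return "Distribution looks unremarkable"

team = ["Meets Expectations"] * 3 + ["Exceeds Expectations"] * 7
print(distribution_sense_check(team))
```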

So How Do I Make This Work?

A photo of someone’s hands, each hand holding a jigsaw piece, and the jigsaw pieces look like they should fit together
Photo by Vardan Papikyan on Unsplash

As a manager, your aim should be to make the rating simply a checkpoint on your report’s personal growth over time. A few things I find help achieve this:

  • Shared authorship of written evaluations, done iteratively over the whole ratings period to ensure you’re talking about performance and growth regularly (and there’s no mad rush to write the evaluation at the end!)
  • Making the difference between rating and trajectory explicitly part of those conversations
  • Not working for companies that implement forced distributions!


Douglas Gresham

He/him. Currently Director of Engineering @ Skyscanner; formerly Google and FB.