Make performance reviews more useful with written evaluations

Douglas Gresham
9 min read · Aug 5, 2022

If you’re a manager at a company larger than a seed-stage startup, you probably have to deal with a regular performance management cycle. We could debate whether we should abolish them entirely or radically overhaul them, but for most of us they’re a fact of life.

The biggest problem I see is that a lot of effort goes into the process without the people being reviewed getting value from it. Performance reviews can feel disconnected from your day-to-day work or career development, they generate feedback that is generic or not timely (or even no feedback at all besides a rating), and from the outside the process seems opaque (and perhaps capricious).

Here are some of the things I use to address those and make performance reviews more useful.

Written Evaluations

Photo by Green Chameleon on Unsplash

There are two benefits of putting evaluations in writing, starting with Calibration.

Calibration is a process in which managers compare proposed ratings to ensure fairness — that everyone is being rated the same way and using the same criteria. Calibrations become dysfunctional for many reasons, and written evaluations can help.

One form of dysfunction is the verbal pitch. Calibration meetings are long, but even then you don’t have a lot of time per person. Say you run a whole 8-hour day-long meeting, and there are 80 people in the calibration set (not atypical at large tech companies) — that nets out at 6 minutes per person. Talking through who this person is and what they worked on can easily use up that entire time; you’re also less likely to dive into Misses unless you’re trying to justify a low rating. If you use a written evaluation instead — ideally available as a pre-read — you can use those 6 minutes for specific questions about the person’s contributions, justifications for their rating, crafting feedback, identifying development opportunities, and other actually valuable activities.

A more severe dysfunction occurs when managers’ incentives (or perceived incentives) are to advocate for the highest ratings possible, often in competition with other managers (see also the dreaded Stack Ranking). Combine this with verbal pitching and you end up in a world where your rating depends more on your manager’s experience and articulation (and whether they like you) than your own contributions. Written evaluations don’t solve this, but they do at least help mitigate some of the subjectivity.

The second major benefit of the written evaluation is it forces you to produce a concrete artefact for your report with feedback, a direct tie to how their rating was decided, and hopefully actions they can take to achieve their next target (be that a higher rating, a promotion or other career/life goals). This is part of increasing transparency, which we will discuss shortly.

The Levelling Guide

Photo by Daniil Silantev on Unsplash

So you’re going to put together written evaluations. Where do you start? Maybe you have a Competency Framework of some sort (if you don’t and want one, GitLab’s is an excellent inspiration), but even then it can be hard to concretely tie your day-to-day work to it.

The Levelling Guide is a document which gives a non-exhaustive list of examples of what is below/at/above expectations for each level (I use -/=/+ as shorthand for each). For example, considering a graduate-level software engineer and looking only at Delivery, you could have:

  • (-) You deliver fewer tasks than expected, usually because they take longer than expected to complete or because you became blocked and did not ask for help
  • (-) You regularly commit to more than you can actually complete, impacting your team’s dependability
  • (-) Your code requires more rounds of peer feedback than expected, or the same problems are coming up repeatedly in each pull request
  • (-) You claim tasks are complete when they are not delivered to the agreed scope
  • (=) You reliably deliver your tasks to the agreed scope and in a timely fashion
  • (=) You do not regularly overcommit; you commit only to what you can reliably deliver in a sprint
  • (=) You are learning your craft, taking action on feedback such as in code reviews and starting to understand the systems you are working on
  • (+) You are taking on larger pieces of delivery than tasks; you may have started owning Epics successfully or have started breaking down larger tasks into manageable chunks
  • (+) You are helping your wider team improve delivery by contributing to retrospectives or other continuous improvement processes, or by helping out with things that are not your core tasks

This will vary depending on exactly how your competencies are laid out, but hopefully you get the idea. You will need to be explicit that this list is non-exhaustive, that in case of conflict the competency framework wins, and that as a manager it’s still your evaluation which counts. This is simply a tool to have better conversations between manager and report about how they’re doing, not a stick to beat managers with!

I recommend using bullet points in the written evaluation too, as it helps keep things more concise and data-driven (nobody wants to write or read long paragraphs of this stuff). Links to direct evidence help too. As an example:

  • (=) Delivered on tasks (link 1, link 2, link 3) in Epic A (link) in a timely fashion and to scope
  • (=) Showing great improvement on decomposition and testing through code review feedback (link 1, link 2, link 3)
  • (+) Suggested change to sprint ceremonies which the team adopted and is saving them 30 minutes a week on meetings (link to retro notes)

You need to be careful not to allow this to become “if I get 5 (+) I’ll get an Exceeds Expectations rating”, but the flip side is that you can show someone with, say, an “Achieves Expectations” rating that you saw the things they did that were above and beyond, even if they didn’t add up to meeting the “Exceeds Expectations” rating bar for the whole cycle.


Misses

Photo by Brett Jordan on Unsplash

A Miss is an action or behaviour someone undertakes with a negative outcome. Everyone has them, and failing to talk about them candidly with our reports means we’re denying them feedback and learning opportunities.

A Miss doesn’t have to mean a rating drop. I find that the highest performers typically have the most Misses in their evaluations because they’re stretching themselves and because they’re the most effective at self-identifying Misses. That means if we’re going to determine how Misses impact ratings we need a framework for talking about them. I use three criteria:

  • Impact: how bad was the result of the Miss? Think about things like production outages, broken relationships, or your manager having to step in and course-correct.
  • Foreseeability: how reasonable is it to expect a person at that level to have anticipated the Miss? If you’re a graduate-level software developer it’s unreasonable to expect a deep understanding of availability, but if you cause an outage by ignoring the guidance of a senior engineer who does have that experience it’s more serious.
  • Response: how did the person react to feedback? Repeating Misses or not taking feedback seriously is strongly negative; fixing the Miss is OK; understanding why it happened and taking steps to ensure it can’t happen to you or anyone else in the future is what we should be aiming for. It’s also excellent if the Miss was self-identified and self-corrected.

You can go as far as to expect every manager to have at least one Miss for everyone on their team. If they can’t identify one, it’s likely they’re either not paying enough attention or not finding stretch opportunities for the person in question. This has the added bonus of normalising talking about Misses, which many will avoid by default because they fear the repercussions of admitting mistakes.

When adding Misses to a written evaluation, I use a ! to denote a Miss, and the same -/=/+ for where this sits against expectations. For example:

  • (!-) Caused an outage by merging a PR without testing. Impact: the service was down for 20 minutes. Foreseeability: high, testing is an expectation for all levels and lack of tests was called out in code review. Response: found and fixed the issue themselves, but is still resistant to writing tests.
  • (!=) Got into a conflict with a senior engineer on another team which became unhealthy. Impact: temporarily damaged the relationship between teams. Foreseeability: somewhat, but the senior engineer should be more accountable for ensuring healthy conflict. Response: self-identified the issue immediately, asked for help, and went out of their way to apply that help to repair the relationship.
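
If you keep evaluations somewhere more structured than a document, the notation above maps onto a very small data model. The following is only a minimal Python sketch of that idea: the names `Entry`, `MissDetail` and `render` are hypothetical and made up for illustration, not part of any tool mentioned here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names throughout; only the -/=/+ and ! notation comes from the article.
LEVELS = {
    "-": "below expectations",
    "=": "at expectations",
    "+": "above expectations",
}


@dataclass
class MissDetail:
    """The three criteria used to discuss a Miss."""
    impact: str
    foreseeability: str
    response: str


@dataclass
class Entry:
    """One bullet point in a written evaluation."""
    level: str                         # one of "-", "=", "+"
    summary: str
    miss: Optional[MissDetail] = None  # set only when the entry is a Miss

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level {self.level!r}")

    def marker(self) -> str:
        # A Miss is flagged with "!" in front of the usual -/=/+ marker.
        return f"(!{self.level})" if self.miss else f"({self.level})"

    def render(self) -> str:
        line = f"{self.marker()} {self.summary}"
        if self.miss:
            line += (
                f" Impact: {self.miss.impact}"
                f" Foreseeability: {self.miss.foreseeability}"
                f" Response: {self.miss.response}"
            )
        return line


if __name__ == "__main__":
    entries = [
        Entry("=", "Delivered on tasks in Epic A in a timely fashion and to scope"),
        Entry(
            "-",
            "Caused an outage by merging a PR without testing.",
            MissDetail(
                impact="the service was down for 20 minutes.",
                foreseeability="high, testing is an expectation for all levels.",
                response="found and fixed the issue themselves.",
            ),
        ),
    ]
    for entry in entries:
        print(entry.render())
```

Requiring all three criteria whenever an entry is a Miss is a deliberate choice in this sketch: it makes it hard to record a Miss without also recording how it was handled.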


Transparency

Photo by Marc Schulte on Unsplash

At one large company where we had a levelling guide, the guide was only available to managers. Someone from HR was worried that people would use it to challenge their ratings if we shared it. My reaction was, “Isn’t that a good thing?”

In my experience, people are happier when they have clarity about their performance than when they get the rating or promotion they want. The latter is at best a short-term fix; if they don’t get something better next time (or worse, if their rating drops) and they’re still not clear on ‘why’, you’re in trouble. If you have a written evaluation plus the documentation to explain how it relates to the levelling and competencies of your company — ideally with motivating examples of things you could do to improve your rating next time out — you have a much better chance of keeping them engaged with the process. If that means you have to explain why a piece of work your report thinks meets a certain bar doesn’t in your estimation, good — that’s exactly the kind of thing you should be talking about, rather than letting it fester into resentment.

In larger companies, it can take a long time for ratings and promotion decisions to get finalised. My approach is to share everything except the small slice you’re not allowed to — in this case, the full written evaluation with only the rating decision removed. Feedback is more powerful the fresher it is, and it’s grossly unfair to have someone come up with their goals for the next 6 months when two of those months have already passed.

Finally, any changes you make should be at the start of a cycle. Making changes in the run-up to performance evaluations — even if they’re purely about optimising the process — risks feeling like you’ve moved the goalposts on people.

Collaborative, Rolling Feedback

Photo by Adi Goldstein on Unsplash

If you’re a manager you may well be thinking “wow, that’s a ton of extra work”. The good news is that if we’ve shared all of the above with our team, we can get them to write their evaluation together with us. This has the added bonus that they’ll often highlight things we might have missed, especially if they’re a high performer getting involved in lots of different things.

Now if you’re an Individual Contributor you’re almost certainly thinking “wow, that’s a ton of extra work”, and if you’ve worked somewhere that makes you write your own reviews or promotion cases you’re probably about ready to throw things at me.

Thing is, it shouldn’t be extra work. If performance reviews are the only time you’re giving or receiving feedback, something is badly wrong (this is why I doubt performance reviews, 360s and such will go away: lack of feedback is an unfortunately common problem, and they’re a blunt tool to ensure that people at least get feedback sometimes). The problem comes when your regular day-to-day feedback is divorced from the performance review process.

The best solution to this is to have your reports regularly record what they have done and the feedback they’ve received in the same format as the written evaluation (which you can also supplement with your input as a manager, especially for folks who struggle with self-promotion). This encourages regular reflection, it gives you an anchor for good 1:1 conversations, it lets you track progress against career goals or a development plan, and it means the written evaluation is a copy-paste and some tidying up. It can get awkward if they’re struggling, but that should be awkward, and you’re doing everyone a disservice if you wait for a performance review to let someone know they’re falling short and lean in to support them.

If you get this right as a manager you’re empowering people to take ownership of the process, aligning it with your management style and their career goals — rather than it being a black-box process that happens to them every 6 or 12 months.



Douglas Gresham

He/him. Currently Director of Engineering @ Skyscanner; formerly Google and FB.