Are recruiters better than a coin flip at judging resumes? Here's the data.

By Aline Lerner and Peter Bergman

This post is a very exciting first for interviewing.io because it’s about a proper experiment run by a real, live academic research lab. If you’ve been reading my work for the past decade, you know that I’ve always been something of an armchair researcher. I ran some experiments before starting interviewing.io, and since then, my team and I have kept it up.

One of the experiments I ran before I founded interviewing.io was an attempt to figure out how good recruiters were at judging candidate quality based on resumes. I ran it 10 years ago and discovered that not only was everyone bad at judging resumes (about as accurate as flipping a coin), they all disagreed with each other about what a good candidate looked like.

Even though these results were shocking at the time, the study had some serious limitations. First, I had no objective measures for which candidates were actually good. I was working as a recruiter at the time, so I knew whom I had been able to place, but that’s obviously not the be-all and end-all of engineering ability. Second, I had a non-representative sample of software engineers. Due to my brand, I had managed to attract a lot of excellent, non-traditional candidates — engineers who were actually very good but didn’t look good on paper. These types of resumes are the hardest for recruiters to judge, and the data was full of them. Finally, my sample size wasn’t that big: I ended up with 716 data points in total, only about half of which came from recruiters (the rest came from engineers and hiring managers — my original hypothesis was that they might be better at the task, but I was wrong… everyone was bad at judging resumes).

So, now that I’m CEO of interviewing.io, with access to a lot more data, resources, and a team of excellent academics at Learning Collider, we decided to run this study again, but with a more rigorous treatment and better conditions, to see if we could replicate the results. This time, we focused just on recruiters, given that they’re most often the gatekeepers who decide which candidates get an interview.

Below are all the details, but here’s the TL;DR: we reproduced my results from 10 years ago! Our new study showed that recruiters were only a bit better than a coin flip at making value judgments, and they still all disagreed with each other about what a good candidate looks like.

In this piece, we also talk about:

  • How far off recruiters were in their predictions and how much they disagreed with each other
  • What recruiters say they look for vs. what the data shows they actually look for
  • Why recruiters would make better judgments if they spent more time parsing resumes (median parse time is just 31 seconds)
  • Whether AI can do a better job at judging resumes (spoiler: yes, it can)

The rest of this piece is co-authored by Peter Bergman, Tushar Kundu, and Kadeem Noray of Learning Collider.


The setup

In the real world, resumes (or LinkedIn profiles) are evaluated by recruiters in minutes — even seconds — and these evaluations are THE thing that determines who gets an interview.

But what do these word walls tell recruiters? How predictive are their evaluations of actual interview success? Ultimately, how good are recruiters at judging resumes?

To answer these questions, we designed a study approximating technical recruiters’ decisions in the real world. We asked[1] 76 technical recruiters (both agency and in-house) to review and make judgments about 30 engineers’ resumes each, just as they would in their current roles.

They answered two questions per resume:

  • Would you interview this candidate?[2] (Yes or No)
  • What is the likelihood this candidate will pass the technical interview (as a percentage)?

We ended up with nearly 2,200 evaluations of over 1,000 resumes.

The resumes in this study belonged to interviewing.io users (with their consent) — actual engineers currently on the job market.

Collaborating on this study with interviewing.io is an ideal scenario, precisely because outcome data were available for comparison purposes. Each engineer in this study has completed multiple mock interviews on the platform. Performance in these interviews is quite predictive of performance in real interviews: top performers (roughly the top 5% of users) on interviewing.io are 3X more likely to pass technical interviews at top-tier companies than candidates from other sources. Even passing a single interview on interviewing.io is a strong predictor of outcomes; it's associated with a 32% increase in the chance of working at a FAANG company post-interview.

Once we had recruiters’ evaluations of the resumes, we compared them to how those engineers actually performed on interviewing.io: skills scores, feedback from interviewers, and ultimately, whether they passed or failed their mock interviews.

Recruiters’ resume judgments are just slightly better than a coin flip

Question #1: Would you interview this candidate?

In aggregate, recruiters in the study recommended 62% of candidates for an interview. But how did recruiter evaluations stack up against candidates’ performance on the platform?

We calculated recruiter accuracy by treating each candidate’s first interview (pass/fail) as the truth, and recruiters’ decision to interview as a prediction. It turns out that recruiters chose correctly 55% of the time, which is just slightly better than a coin flip.
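
For the curious, here’s roughly what that calculation looks like in code. This is a minimal sketch, not our actual analysis pipeline, and the dataframe and column names (would_interview, passed_first_interview) are made up for illustration:

```python
import pandas as pd

# Hypothetical data: one row per (recruiter, resume) evaluation.
# would_interview        -> the recruiter's yes/no decision (the "prediction")
# passed_first_interview -> the candidate's first mock interview result (the "truth")
evals = pd.DataFrame({
    "would_interview":        [True, True, False, True, False],
    "passed_first_interview": [True, False, False, True, True],
})

# An evaluation counts as correct when the recruiter's decision matches the outcome.
accuracy = (evals["would_interview"] == evals["passed_first_interview"]).mean()
print(f"Recruiter accuracy: {accuracy:.0%}")  # came out to ~55% in our study
```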

Question #2: What is the likelihood this candidate will pass the technical interview?

Recruiters predicted the likelihood that each candidate would pass the technical interview. In most hiring processes, the technical interview follows the recruiter call and determines whether candidates proceed to the onsite. Being able to accurately predict which candidates will succeed at this stage is important and should inform the decision about whether to interview the candidate or not.

What we found most surprising is how far their predictions were from the truth:

  • When recruiters predicted the lowest probability of passing (0-5%), those candidates actually passed the technical interview with a 47% probability.
  • When recruiters predicted the highest probability of passing (95-100%), those candidates actually passed with a 64% probability.

Below is a graph that shows recruiter predictions vs. actual performance. The x-axis is the bucketed recruiter rating. In other words, the first point is all the candidates that recruiters assigned a 0-5% likelihood of passing. The y-axis is the average interviewing.io pass rate for those candidates. The red dotted line represents 100% accuracy – in an ideal world, the higher a recruiter's ranking of a candidate, the higher their actual performance would be. The orange line represents reality – as you can see, there isn’t much correspondence between how recruiters predicted candidates would perform and their actual performance.

recruiter predictions vs interviewing.io performance

When recruiters’ predictions are below 40%, they underestimate those candidates by an average of 23 percentage points. Above 60%, they overestimate by an average of 20 percentage points. If this were predicting student performance, recruiters would be off by two full letter grades.
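
If you’d like to build a calibration chart like the one above from your own screening data, here’s a rough sketch. It’s not the code behind our graph, and the column names are hypothetical; the study used 5-percentage-point buckets, while this toy example uses 20-point buckets so every bucket has data:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: the recruiter's predicted pass probability (0-1) and
# whether the candidate actually passed their first interview.
evals = pd.DataFrame({
    "predicted_pass_prob": [0.03, 0.20, 0.55, 0.70, 0.97, 0.40, 0.85, 0.10],
    "passed":              [1, 0, 1, 0, 1, 1, 0, 0],
})

# Bucket predictions into 20%-wide bins and compute the actual pass rate per bin.
evals["bucket"] = (evals["predicted_pass_prob"] * 5).astype(int).clip(upper=4)
calibration = evals.groupby("bucket")["passed"].mean()
midpoints = calibration.index * 0.2 + 0.1  # center of each bucket

plt.plot(midpoints, calibration, marker="o", color="orange", label="Recruiter predictions vs. reality")
plt.plot([0, 1], [0, 1], "r--", label="Perfect calibration")
plt.xlabel("Recruiter-predicted probability of passing")
plt.ylabel("Actual pass rate on interviewing.io")
plt.legend()
plt.show()
```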

Recruiters can’t agree on what a good candidate looks like

Clearly, there is lots of noise in resume evaluations. Were recruiters’ noisy judgments at least consistent when reviewing the same resumes?

Nearly 500 resumes were evaluated by more than one recruiter. Based on a random selection of two evaluations per resume, the likelihood that two recruiters would make the same interview/no-interview call on a given candidate was 64%.

Since recruiters also guess the probability a candidate will pass the technical interview, we can compare how different these guesses are for a given candidate. The average differential between two randomly selected recruiters’ evaluations of the same resume was 41 percentage points. So, let’s say one recruiter predicts a 30% probability the candidate would pass; another recruiter evaluating the same resume would predict, on average, a 71% probability of passing.

To further understand just how prevalent the disagreement is, we looked at the standard deviation of evaluations across different candidates and of evaluations of the same candidate:

  • 0.34 across different candidates
  • 0.32 across the same candidates

So, when two recruiters are asked to judge the same candidate, their level of disagreement is nearly the same as if they evaluated two completely different candidates.
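
Here’s a minimal sketch of how you could measure this kind of inter-rater noise yourself. Again, these aren’t our actual column names or data; it just assumes one row per evaluation with a resume_id and the recruiter’s predicted pass probability:

```python
import pandas as pd

# Hypothetical data: several recruiters' predicted pass probabilities per resume.
evals = pd.DataFrame({
    "resume_id": [1, 1, 2, 2, 2, 3, 3],
    "pred_prob": [0.30, 0.71, 0.10, 0.55, 0.90, 0.80, 0.85],
})

# Keep only resumes evaluated by 2+ recruiters, randomly pick two evaluations
# per resume, and measure how far apart they are.
multi = evals.groupby("resume_id").filter(lambda g: len(g) >= 2)

def random_pair_gap(probs: pd.Series) -> float:
    pair = probs.sample(2, random_state=0)
    return abs(pair.iloc[0] - pair.iloc[1])

gaps = multi.groupby("resume_id")["pred_prob"].apply(random_pair_gap)
print(f"Average same-resume gap: {gaps.mean():.0%}")  # ~41 points in our study

# Compare the spread of predictions across all candidates with the spread
# among evaluations of the same candidate.
print(f"Std dev across candidates:     {evals['pred_prob'].std():.2f}")
print(f"Std dev within same candidate: {multi.groupby('resume_id')['pred_prob'].std().mean():.2f}")
```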

The most sought-after resume attributes

Despite the noise and variability in the study’s resume evaluations, there were some characteristics that recruiters consistently favored: experience at a top-tier tech[3] company (FAANG or FAANG-adjacent) and URM (underrepresented minority) status (in tech, this means being Black or Hispanic).

Most predictive for Question #1 (whether a recruiter would want to interview that candidate) was experience at a top company — these candidates were 35% more likely to be picked. Black or Hispanic candidates were also 21% more likely to be recommended for an interview.[4]

With Question #2 (how likely the candidate was to pass a technical interview), having a top company on your resume is associated with a 21% increase in the likelihood that recruiters believe the candidate will pass the interview. Compared to the actual pass rates, recruiters’ predictions of FAANG candidates are generally accurate (average 4 percentage point overestimate).[5] Unlike the presence of a top company, URM status didn't appear to influence recruiter decisions here.

How do recruiters’ stated reasons for rejecting candidates line up with actual rejection reasons?

So, we know what recruiters tend to favor, whether they’d admit to it or not: 1) FAANG/FAANG-adjacent experience and 2) URM status. But what’s even more interesting than why a recruiter would say yes is why they would say no.

When we asked recruiters to judge a resume, we also asked them WHY they made that decision.[6] Below are recruiters’ stated reasons for rejecting candidates. As you can see, “missing skill” is the main reason by far, with “no top firm” a distant third.

Bar chart of recruiters’ stated reasons for rejection

So, then, we wondered… How do recruiters’ stated reasons for rejecting candidates line up with reality? To figure that out, we analyzed the resumes that ended up in the rejected pile and looked at common traits.

Below is a graph of actual rejection reasons, based on our analysis. The main rejection reason isn’t “missing skill” — it’s “no top firm.” This is followed, somewhat surprisingly, but much less reliably (note the huge error bars), by having an MBA. “No top school” and having a Master’s degree come in at third and fourth. Note that these top four rejection reasons are all based on a candidate’s background, NOT their skill set.

Predictors of recruiter rejections

The y-axis is the coefficient from regressing rejection on that variable. So, a coefficient of Y for a given trait means that trait is associated with a (Y × 100) percentage point increase in the likelihood of being rejected.
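
In other words, it’s a linear probability model: regress a 0/1 “rejected” indicator on the resume traits and read each coefficient as a percentage-point shift. A minimal sketch, with made-up columns rather than our actual dataset:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: one row per evaluation, a 0/1 rejection outcome,
# and 0/1 indicators for resume traits.
df = pd.DataFrame({
    "rejected":      [1, 0, 1, 1, 0, 0, 1, 0],
    "no_top_firm":   [1, 0, 1, 1, 0, 0, 1, 1],
    "has_mba":       [0, 0, 1, 0, 0, 1, 1, 0],
    "no_top_school": [1, 0, 1, 0, 1, 0, 1, 0],
})

X = sm.add_constant(df[["no_top_firm", "has_mba", "no_top_school"]])
model = sm.OLS(df["rejected"], X).fit()

# A coefficient of 0.25 on no_top_firm would read as "lacking a top firm on the
# resume is associated with a 25 percentage point higher chance of rejection."
print(model.params)
```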

Slowing down is associated with better decisions

Another key piece of this study is time. In hiring settings, recruiters make decisions quickly. Moving stacks of candidates through the funnel gives little room to second-guess or even wait before determining whether or not to give a candidate the opportunity to interview.

In our study, the median time spent on resume evaluations was just 31 seconds. Broken down further by Question #1 — whether or not the recruiter would interview them — the median time spent was:

  • 25 seconds for those advanced to a technical interview
  • 44 seconds for those placed in the reject pile

Distribution of time taken to evaluate candidates

Given the weight placed on single variables (e.g., experience at a top firm), how quickly recruiters make judgments isn’t surprising. But might they be more accurate if they slowed down? It turns out that spending more time on resume evaluations, notably >45 seconds, is associated with more accurate predictions — just spending 15 more seconds appears to increase accuracy by 34%.[7] Encouraging recruiters to slow down might therefore result in more accurate resume screening.

Recruiter accuracy vs time taken
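
If you want to check whether the same pattern holds in your own screening data, the time-bucket analysis is straightforward. A sketch, with hypothetical columns:

```python
import pandas as pd

# Hypothetical data: seconds spent on each evaluation, and whether the
# recruiter's decision matched the candidate's actual interview outcome.
evals = pd.DataFrame({
    "seconds_spent": [12, 25, 31, 40, 48, 55, 70, 95],
    "correct":       [0, 1, 0, 1, 1, 0, 1, 1],
})

# Bucket evaluation time and compare accuracy across buckets.
buckets = pd.cut(evals["seconds_spent"], bins=[0, 30, 45, 60, 600],
                 labels=["<30s", "30-45s", "45-60s", ">60s"])
print(evals.groupby(buckets, observed=True)["correct"].mean())
```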

Can AI do better?

As a gaggle of technologists and data geeks, we tested whether algorithms could quiet the noise and inconsistencies in recruiters’ predictions.

We trained two local, off-the-shelf machine-learning models.[8]

Just like human recruiters, the models were trained to predict which candidates would pass technical interviews. The training dataset was drawn from interviewing.io and included anonymized resume data (years of experience, whether they had worked at a top firm, and whether they had attended a top 10 school for either grad or undergrad), candidates’ race and gender, and interview outcomes.[9]
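
To give a flavor of what training these models looks like, here’s a schematic sketch using scikit-learn and the xgboost package. The feature names and toy data are placeholders, not the actual training set or pipeline:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Hypothetical training data: a few resume-derived features and whether the
# candidate passed their technical interview.
df = pd.DataFrame({
    "years_experience": [2, 5, 8, 1, 10, 4, 6, 3, 12, 7],
    "top_firm":         [0, 1, 1, 0, 1, 0, 1, 0, 1, 0],
    "top_school":       [1, 0, 1, 0, 1, 0, 0, 1, 1, 0],
    "passed":           [0, 1, 1, 0, 1, 0, 1, 0, 1, 1],
})

X, y = df.drop(columns="passed"), df["passed"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit both models and compare out-of-sample accuracy on held-out candidates.
for name, model in [("Random Forest", RandomForestClassifier(random_state=0)),
                    ("XGBoost", XGBClassifier(random_state=0))]:
    model.fit(X_train, y_train)
    print(name, "out-of-sample accuracy:", accuracy_score(y_test, model.predict(X_test)))
```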

Despite the very limited set of features we fed into both models, both made more accurate predictions than human recruiters when presented with out-of-sample candidate profiles.

The Random Forest model was somewhat more accurate than recruiters when predicting lower-performing candidates. XGBoost, however, was more accurate across the board than both the Random Forest model AND recruiters.

predictions vs interviewing.io performance

Where does this leave us?

In this section, when we say “we,” we are speaking as interviewing.io, not as the researchers involved in this study. Just FYI.

Advice for candidates

At interviewing.io, we routinely get requests from our users to add resume review to our list of offerings. So far, we have declined to build it. Why? Because we suspected that recruiters, regardless of what they say publicly, primarily hunt for name brands on your resume. Therefore, highlighting your skills or acquiring new skills is unlikely to make a big difference in your outcomes.

We are sad to see the numbers back up our intuition that it’s mostly about brands.[10] As such, here’s an actionable piece of advice: maintain a healthy skepticism when recruiters advise you to grow your skill set. Acquiring new skills will very likely make you a better engineer. But it will very likely NOT increase your marketability.

If enhancing your skill set won’t help, what can you do to get in front of companies? We’re in the midst of a brutal market, the likes of which we haven’t seen since the dot-com crash in 2000. According to anecdotes shared in our Discord community, even engineering managers from FAANGs are getting something like a 10% response rate when they apply to companies online. If that’s true, what chance do the rest of us have?

We strongly encourage anyone looking for work in this market, especially if you come from a non-traditional background, to stop spending energy on applying online, full stop. Instead, reach out to hiring managers. The numbers will be on your side there, as relatively few candidates are targeting hiring managers directly. We plan to write a full blog post on how to do this kind of outreach well, but this CliffsNotes version will get you started:

  • Get a LinkedIn Sales Navigator account
  • Make a target list of hiring managers at the companies you’re interested in
  • Figure out their emails (you can use a tool like RocketReach), and send them something short and personalized. Do not use LinkedIn. Just as you don’t live on LinkedIn, eng managers don’t either. Talk about the most impressive thing you’ve built. Ask them about their work, if you can find a blog post they’ve written or a project they’ve worked on publicly. Tie those two things together, and you’ll see a much higher response rate. Writing these personalized emails takes time, of course, but in this market, it’s what you need to do to stand out.

Advice for recruiters

We know that recruiting is a tough job, especially in the current climate, where there are more applicants than ever and fewer recruiters to parse through them. So, it rationally makes sense to us that a recruiter would spend no more than 30 seconds per resume and focus primarily on looking for top brands.

We hope, though, that this piece has given you a measure of pause about your approach, and we’d like to leave you with two actionable pieces of advice. First, if you do nothing else, please slow down. As you saw above, taking just 15 extra seconds to read a resume could improve your accuracy by 34%.[11]

Our second piece of advice is this. Freada Kapor Klein from Kapor Capital coined the term “distance traveled” more than two decades ago. It refers to what someone accomplished, in the context of where they started. For instance, Kapor Klein recommends that, in their admissions processes, universities should consider not just the number of AP tests a candidate has passed but the number of AP tests divided by the total number offered at their high school. For example, if an applicant took 5 AP tests and their school offered 27, that paints a very different picture from another applicant who also took 5 AP tests when that’s the total number offered at their school. Kapor Capital uses distance traveled as one of their metrics for determining which entrepreneurs to fund. One can easily apply this concept to hiring as well.

Take a look at the resume below. "John" (name has been changed; scrubbed resume shared with permission) studied chemical engineering and worked his way into software engineering by starting as a service engineer focused on pen testing. In the meantime, he completed a bootcamp, attended the Bradfield School of Computer Science (a school dedicated to teaching computer science at a depth beyond what many university programs, and certainly most bootcamps, offer), and ended up with a senior title in just three years.

John was consistently rated poorly by recruiters but is one of the top performers on interviewing.io.

anonymized resume

It takes just a bit more time, so please spend a little longer reading resumes, and evaluate candidates’ achievements in the context of where they came from. Think about the denominator. But don’t think for a moment that we recommend that you lower the bar — absolutely not. On interviewing.io, we regularly see candidates like John objectively outperforming their FAANG counterparts.

What this means for our industry

The last time I did this research, I wrote about how being bad at judging resumes isn’t anything to be ashamed of, because it comes down to the resume itself being a low-signal and not-very-useful document.

I held that same opinion for the last decade (and even wrote a recent post about how AI can’t do recruiting)… right up until we ran this study and successfully built two ML models that outperformed recruiters.

So, I stand corrected.

As you saw above, both models were limited – they were looking at the same types of features that recruiters do when they quickly scan a resume, certainly fewer features than recruiters have access to. But, despite that, the AI models still outperformed humans. What happens, then, if you can build a model that behaves like a recruiter who really slows down and reads everything? These results make me believe that resumes do carry some signal, and you can uncover it if you carefully read what people write about their jobs and themselves and also analyze how they write it. Unfortunately, this takes more time and effort than most human recruiters are able to devote. And, in retrospect, that’s a good task for AI. Though we haven’t built a model like that for this post, I’m optimistic that we may be able to do it in the future.

As I said in the AI piece I linked above, in order for AI to do useful recruiting work, rather than just perpetuating the biases that human recruiters hold, it needs a data set that contains some objective measure of performance. Most recruiting AI models today do one of three things: glorified keyword matching, training on what recruiters prefer (the outcome is whether a recruiter would want to talk to the candidate, NOT whether the candidate is good), or living on top of existing tools like ChatGPT (which we recently showed doesn’t perform very well and is biased against non-traditional candidates). These three approaches just result in the wrong thing being done, faster.

I hope that, in the not too distant future, we can use AI to make less-biased decisions, using meaningful performance data. And I hope that this type of AI solution can get adoption among the recruiting community.

Footnotes

  1. Participating technical recruiters were paid a base rate and then received additional compensation for each accurate prediction.

  2. Different roles have different requirements. To correct for that, we asked each candidate to specify which eng role they were applying for: Software Engineer (back-end or full-stack), Mobile Engineer, Front-end Engineer, ML Engineer, Data Engineer, or Engineering Manager. Then we prompted recruiters to evaluate them specifically for that role. If no role was specified by the candidate, the default role to evaluate for was Software Engineer (back-end or full-stack).

  3. Top firms = Airbnb, Amazon, Anthropic, AWS, Apple, Asana, Atlassian, Bloomberg LP, Checkr, Coinbase, Coursera, Cruise, Dropbox, Etsy, Facebook, Flexport, GitHub, Google, Gusto, HashiCorp, Instacart, Instagram, Jane Street, Jump Trading, Khan Academy, LinkedIn, Lyft, Medium, Microsoft, Mozilla, Netflix, Oculus, OpenAI, Palantir, Peloton, Pinterest, Postmates, Quora, Reddit, Robinhood, Roblox, Salesforce, Segment, Slack, Snap, Snowflake, SpaceX, Spotify, Square, Stripe, Tesla, Thumbtack, TikTok, Twilio, Twitch, Twitter, Two Sigma, Uber, Udemy, Waymo, Whatsapp, Yelp, and Zoom.

  4. We corrected for FAANG & FAANG-adjacent experience (and all of our other variables) before making this statement, i.e., the effect existed for engineers from underrepresented backgrounds who did not have FAANG/FAANG-adjacent companies on their resumes. We expect that recruiters favor underrepresented minority candidates because of guidelines from their employers to focus on sourcing these types of candidates as part of DEI initiatives. Discussion about the magnitude of this effect and its implications is beyond the scope of this piece.

  5. Interestingly, recruiters might penalize, for example, alternative education. Candidates with only alternative education pathways post-high school — coding bootcamps or digital certifications — appeared to be penalized by recruiters in this study. However, with limited observations (n=11), it’s inconclusive without further study.

  6. That field was optional, so most of the reasons recruiters provided were in cases when they said no — presumably because the reasons for saying yes may have seemed self-evident.

  7. It’s not that recruiters who generally take their time make more accurate judgments; rather, any recruiter who slows down might get better at judging resumes!

  8. It’s important to stress that neither algorithm was custom-built. The models, one using a Random Forest algorithm and the other an XGBoost algorithm, are distinct but interrelated approaches akin to Decision Tree algorithms. Decision trees sort data into groups based on features. Random forest algorithms combine multiple decision trees to improve predictions. XGBoost builds multiple decision trees one after another, with each new tree focusing on prediction errors from the previous trees.

  9. The training data excluded the data from this study. We take user privacy very seriously, and we want to stress that all models were local and anonymized and that no data in this study was shared with cloud LLMs.

  10. To see a particularly egregious example of recruiters favoring brands over substance, take a close look at this fake resume that got a bunch of recruiter responses. And this one too.

  11. We haven’t proven causality here, but when we just scoped our analysis to the same person, it appeared that taking more time did help (in other words, it’s not just that recruiters who spend more time usually are more accurate; it’s the added time). Still, this is something that merits more work, and we'll try to investigate it causally in the future.
