The Flaws of AI-Powered Resume Scoring i…

Remember when the first AI-powered resume screeners were sold as a way to remove human bias from the hiring process?

The report from danunparsed.com highlights a hilarious inconsistency: a single resume scoring 90, then 74, then 88. This isn’t a bug; it’s the nature of the beast. When you wrap an LLM in a prompt and tell it to act as a recruiter, you aren’t getting a deterministic calculation. You are getting a vibe check performed by a probabilistic text generator. The variance suggests that the “scoring” is essentially a coin flip that depends on how the model happened to weight a specific bullet point on a given run.

It is like trying to judge a chef by the weight of their salt shaker. You might get a number, but that number has almost zero correlation with whether the food actually tastes good. If the score for a static PDF can swing by 16 points in a few iterations, the number itself is a hallucination (and probably a confident one at that). It implies that the internal logic is far more fragile than the marketing suggests. One slight shift in the tokenization of a job title or a weirdly formatted date might be enough to tank a score.
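To put a number on that swing, here is a minimal sketch that summarizes the three scores quoted in the report. The function name `score_spread` is made up for illustration, and three runs is far too small a sample to say anything rigorous; it only restates the arithmetic.

```python
import statistics

# Scores reported for the same resume across three runs (90, 74, 88).
scores = [90, 74, 88]

def score_spread(values):
    """Return (mean, range, sample standard deviation) for repeated scores."""
    mean = statistics.mean(values)
    spread = max(values) - min(values)   # highest score minus lowest score
    stdev = statistics.stdev(values)     # sample standard deviation (n - 1)
    return mean, spread, stdev

mean, spread, stdev = score_spread(scores)
print(f"mean={mean:.1f} range={spread} stdev={stdev:.1f}")
# prints: mean=84.0 range=16 stdev=8.7
```

A spread of 16 points around a mean of 84 is the whole story here: the input never changed, only the run did.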

Absolutely. That is the only reason to open source this in the first place. Once the weights and the prompts are public, the arms race moves from “writing a good resume” to “optimizing for the prompt.” We’ve seen this with SEO for twenty years; now we’re just applying it to the job application. Candidates will start embedding invisible keywords or structuring their experience in the exact semantic patterns the ATS rewards. We’ve already seen the “white text” trick where people hide keywords in white font to fool basic scanners; this just upgrades the trick to a semantic level.

Who actually believes a number from 1 to 100 describes a human’s ability to write a concurrent system in Rust? The moment the “secret sauce” is public, the sauce becomes useless. By Q3, the market will be flooded with “perfect score” resume generators that make this entire scoring system a dead letter. You’ll have a thousand candidates all scoring 98/100, and the recruiter will be right back where they started—staring at a pile of PDFs and wondering who actually knows how to code.

Hardly. The problem with ATS scoring isn’t that the algorithms are secret—it’s that recruiters actually trust them. Most hiring managers don’t look at the resume until the machine has already filtered out 90% of the pool. If the machine is fluctuating between 74 and 90, the candidate is either “in” or “out” based on a roll of the dice. Open sourcing the code doesn’t fix the fundamental laziness of the hiring pipeline. It just documents the failure.
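A rough sketch of why that matters once a cutoff is involved. The threshold of 80 is purely an assumption for illustration (the report does not state one), and `decisions` is a hypothetical helper, not anything from a real ATS.

```python
# Hypothetical cutoff: an assumption for illustration, not a value from the report.
CUTOFF = 80

def decisions(scores, cutoff):
    """Label each run 'in' if the score meets the cutoff, otherwise 'out'."""
    return ["in" if s >= cutoff else "out" for s in scores]

# The same resume, scored three times.
print(decisions([90, 74, 88], CUTOFF))
# prints: ['in', 'out', 'in']
```

Same candidate, same PDF, and one run out of three lands on the wrong side of the line.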

It shifts the friction. Instead of wondering why they didn’t get an interview, candidates can now spend their weekends tweaking their “Skills” section to move from an 82 to an 89. It turns the job search into a game of Tetris where the blocks are your life experiences and the goal is to satisfy a prompt written by a product manager. Do we really want a world where the primary skill for getting a job is knowing how to please a specific version of a prompt?

It is a vanity project masquerading as transparency.

It depends on what you mean by “open.” Releasing the logic for a scoring system is a nice gesture, but it doesn’t change the power dynamic. The companies using the tool still hold the keys to the kingdom. The developer doesn’t get to decide how they are scored; they just get to see the broken mirror that’s reflecting their professional worth back at them.

If HackerRank wanted to actually help developers, they would have built a system that ignores the resume entirely and focuses on verifiable proofs of work. Instead, they’ve just given us a peek behind the curtain to see that the wizard is actually just a prompt that occasionally forgets how to count. Or maybe the variance is an intentional feature to keep candidates guessing—I’m not convinced it’s an accident. Either way, it’s a reminder that we are trusting our careers to a glorified autocomplete.
