This started as a practical question. I was thinking through the shape of senior engineering roles and the interview loops that usually come with them, and I kept getting stuck on the same mismatch. A lot of companies say they want people who can make good technical decisions under ambiguity, improve system shape over time, work effectively with product partners, and lead execution without losing the plot. Then they evaluate those candidates with a format that mostly asks: how fast can you write code alone, in an artificial environment, with your normal tools taken away?
I understand why that format became common. It is standardized. It feels objective. It reduces the chance that someone talks their way through a loop with no implementation ability underneath. But I think the pressure from AI tools is exposing a deeper issue: for many senior roles, that interview proxy was already a weak fit for the job. Tool-assisted coding just makes the mismatch harder to ignore.
The question is not whether coding still matters
I don't think the right response is, "Great, now nobody needs to know how software works." That would be absurd. Senior engineers still need to understand systems, data flow, interfaces, failure modes, performance tradeoffs, and what good implementation smells like. They still need to reason precisely enough to notice when an LLM-generated answer is subtly wrong, over-scoped, unsafe, or based on a false assumption.
So this is not an argument against technical depth. It is an argument about signal quality.
If a company is hiring a senior engineer, staff engineer, principal engineer, or product-platform leader, what is it really trying to learn? Usually some mix of these:
- Can this person figure out what problem actually matters?
- Can they reason about a messy system instead of a toy one?
- Can they choose a good slice, not just a complete one?
- Can they use modern tools without outsourcing accountability to them?
- Can they explain tradeoffs clearly enough that other people can make decisions with them?
- Can they improve the quality of motion around a team, not just produce local output?
None of those questions are well answered by pretending the candidate is locked in a blank room with a whiteboard and an arbitrary timer.
Tool use changes the failure mode
What AI changes is not simply speed. It changes where judgment has to show up.
A mediocre engineer with an LLM can produce a lot of plausible-looking motion very quickly. A strong engineer with an LLM can compress busywork, test options faster, inspect more surface area, and stay focused on the real constraint. The difference is not "who used a tool" versus "who did not." The difference is who framed the problem better, who reviewed the output harder, and who stayed responsible for whether the answer actually fit reality.
That is why I think a senior interview that bans all tools can accidentally test the wrong thing. It may reward the candidate who is most practiced at a now-artificial performance format, not the candidate who is best at using modern leverage without getting sloppy, credulous, or detached from the system.
In other words: if AI is in the workflow now, then one legitimate hiring signal is whether someone can delegate to a tool without surrendering rigor. That is not a soft skill. It is an emerging form of engineering hygiene.
The old interview proxy and the better one are not the same
If I were redesigning senior interviews, I would make the target signal explicit and then choose an exercise that actually exposes it.
| What the company wants to learn | Weak proxy | Better exercise |
|---|---|---|
| Can this person reason about a real system? | Timed algorithm puzzle with no context. | Give them an unfamiliar but realistic code path, bug report, or architecture sketch and ask how they would diagnose it. |
| Can they make good tradeoffs under constraints? | Ask for a full implementation of a narrowed toy problem. | Ask them to choose scope, identify risks, and explain what they would cut, sequence, or instrument first. |
| Can they use AI tools without getting careless? | Ban tools completely and assume that proves rigor. | Allow normal tools, including AI assistance, and ask the candidate to narrate what they accepted, rejected, verified, or rewrote. |
| Can they improve team-level decision quality? | Judge them only on solo implementation speed. | Run a design review, incident review, or roadmap tradeoff discussion and see whether the system gets clearer around them. |
That table is the heart of the issue for me. The question is not whether coding matters. It is whether the interview exercise matches the kind of value the role is actually supposed to produce.
If I were building the loop from scratch
I would probably use a four-part structure for many senior technical roles.
First, a compact screen on experience and problem shape. Not a résumé recital. A conversation about what kinds of systems the candidate has actually influenced, where they changed the model instead of merely following it, and what kinds of constraints they seem to understand from lived experience.
Second, a diagnosis round. Show them a concrete system artifact: a bug report, partial code path, architecture diagram, failed migration plan, or product-platform tension. Ask how they would inspect it, where they would be suspicious first, and what they would want to learn before acting.
Third, a build-or-debug round that uses reality-shaped conditions. Let them use the tools they would normally use. That can include documentation, a local editor, tests, and even AI assistance. The catch is that the candidate has to narrate their reasoning, show what they trust or distrust, and demonstrate verification discipline. The point is not that they got code from a model. The point is whether they can turn tool output into a reliable engineering result.
Fourth, a judgment and communication round. Give them an ambiguous product or platform tradeoff and see whether they can make the system more understandable for other people. A lot of senior engineering value lives there. Good leaders and staff-plus engineers create clarity. The interview should be able to notice that.
I would still keep a coding bar
I would not remove the coding bar entirely, because that creates a different failure mode. Some people really can sound strategic while staying conveniently abstract. A good senior loop should still expose whether the candidate can reason precisely enough to change a real system safely.
But I would make the bar more honest.
If the role is mostly backend product work, test backend product work. If the role is platform-heavy, test the kind of diagnosis, interface reasoning, and boundary decisions that platform work actually requires. If the role requires strong implementation fluency, then include implementation. Just stop pretending a stripped-down puzzle is always the highest-fidelity signal.
I would also distinguish between role levels more carefully. A junior engineer might reasonably be tested more directly on implementation fundamentals because the role itself is closer to raw execution. A principal candidate is usually being hired for a broader surface: framing, tradeoffs, risk, system shape, and the ability to move a messy environment toward coherence. The loop should shift with the job.
The hidden question is whether the company knows what it values
This is the part I keep coming back to. Interview design is not only a candidate filter. It is also a company self-description.
If a company says it wants people who can operate with judgment in modern engineering environments, but its loop still optimizes for isolated performance under tool bans, that may indicate one of two things. Either the company has not updated its model of the work yet, or it does not fully trust itself to evaluate the higher-order signals it claims to care about. Neither is fatal. Both are worth noticing.
There is a real cost to this confusion. Companies can end up filtering out the people who are best at shaping systems under current conditions and over-selecting for people who are best at a rehearsed interview genre. Candidates feel it too. A lot of experienced engineers are not actually resisting rigor. They are resisting an evaluation format that no longer resembles the work closely enough to feel intellectually honest.
What I think the interview should reward now
If I compress all of this down, I think senior technical interviews should increasingly reward five things: framing, diagnosis, taste, verification, and communication. Implementation still matters, but it belongs inside that larger picture now.
- Can the candidate identify the real problem instead of the nearest one?
- Can they interrogate tool output instead of admiring it?
- Can they decide what "good enough" means in context?
- Can they reduce ambiguity for other people without pretending certainty they do not have?
- Can they leave the system clearer than they found it?
I don't think this fully settles the question. Some teams will still need stronger direct implementation evidence than others. Some candidates really do need a sharper coding screen. And there is always a risk that a more realistic loop becomes a more subjective one if the company is sloppy about scorecards.
Still, my current read is that AI-assisted coding has made one thing much clearer: the real senior-level differentiator is not whether someone can produce keystrokes in isolation. It is whether they can use tools, judgment, and technical depth together without losing accountability for the outcome.