Built a Digital Twin of my Manager
There's a specific kind of meeting most professionals dread: the one where you present a piece of work to your manager, watch them read it in silence, and then field a list of observations that, in retrospect, were entirely predictable. The work wasn't wrong. It just wasn't calibrated to how they think.
I decided to solve that problem with a digital twin.
What I Built
Over several weeks, I collected a corpus of real communications - emails, decisions, reviews, and responses - from a senior leader. From that corpus, I distilled two things: an identity file capturing how they communicate, what they prioritise, and how they make decisions; and a set of domain knowledge files covering every major topic area relevant to our work.
Each domain file was structured around what I call CALIBRATION ANCHORS - the three or four positions the person has stated most explicitly and most consequentially on a given topic. Not paraphrases. Near-verbatim quotes, with context, so the model understands not just what they said but why they said it and what it means for adjacent decisions.
I then loaded these files into an AI assistant configured to simulate my manager's perspective - in their voice, applying their heuristics, flagging what they would push back on.
The system has four output modes: a simulated response, a review of a draft, a likely decision on a question, or a senior-level brief. Every output carries a confidence label - distinguishing between positions drawn directly from the corpus, positions inferred from observable patterns, and positions derived from the regulatory or policy frameworks they operate within. The difference matters.
How I Tested It
Before deploying the tool operationally, I designed 20 validation test cases - two per domain area - each targeting a specific signal I knew should surface if the calibration was working. Not broad themes. Specific: a named person who should draft a particular response, a regulatory provision that should be cited verbatim, a distinction between two perimeters that would otherwise be conflated.
I ran the tests. Several failed. I diagnosed why - the key signals were buried in long flat lists, treated as equal-weight by the model. I restructured the files, promoting the highest-stakes positions to the top. I ran the tests again. The signals surfaced.
What It Can Do Now
Here is what actually happened in one test case. I asked the tool to advise on how to respond to a regulatory question about a process gap. Without the domain files loaded, it produced a fluent, plausible, three-bullet explanation - confident, coherent, and wrong. The real answer was simpler: a specific error, a specific person who should phrase the response, and a correction already in progress. With the domain files loaded and the anchors active, the tool produced exactly that - routing, root cause, and posture - without invention.
That is the iteration I did not have to take to my manager.
The tool does not replace judgment. It sharpens the work before judgment is applied. I arrive at the conversation with fewer predictable gaps, and my manager spends less time re-explaining positions they have already stated.
That, I think, is what calibration is for.