A few months ago I started what I thought was a simple exercise — I wanted to understand, honestly, what it actually costs to build software with AI assistance versus the traditional route.
Not the vendor marketing version. The real version. The one that accounts for my time, the API bills, and a genuine like-for-like comparison with what it would cost to hire someone to do the same work.
What I didn’t expect was for the experiment to turn into a production tool that solves a real financial problem my organisation has been living with for years.
The problem worth solving
Some context helps here, because it matters for the numbers later.
I work in IT for a complex, multinational technology group. IT operates as a central shared service — which means a significant budget, the majority of which gets recharged back to subsidiary entities across multiple markets. Think software licences, cloud consumption, managed services: all of it has to be allocated fairly and accurately to the businesses that use it.
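To make the allocation idea concrete, here is a minimal sketch of a headcount-based recharge. It is illustrative only: the function name `allocate_by_headcount`, the entity codes, and the figures are invented for this example and are not taken from the real tool, which also weights by actual usage.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def allocate_by_headcount(total_cost, headcount):
    """Split total_cost across entities in proportion to headcount.

    Hypothetical helper for illustration. Shares are rounded to two decimal
    places; any rounding remainder is added to the largest entity so the
    shares always sum back to the total.
    """
    total = Decimal(str(total_cost)).quantize(CENT, rounding=ROUND_HALF_UP)
    total_heads = sum(headcount.values())
    if total_heads <= 0:
        raise ValueError("headcount must contain at least one person")

    shares = {
        entity: (total * heads / total_heads).quantize(CENT, rounding=ROUND_HALF_UP)
        for entity, heads in headcount.items()
    }

    # Push any rounding difference onto the largest entity.
    remainder = total - sum(shares.values())
    largest = max(headcount, key=headcount.get)
    shares[largest] += remainder
    return shares


# Made-up example: one monthly licence bill split across three subsidiaries.
example = allocate_by_headcount("10000.00", {"UK": 120, "DE": 45, "SG": 35})
for entity, amount in example.items():
    print(entity, amount)
```

The design point worth noting is the remainder handling: a recharge that does not sum exactly to the invoice is the kind of thing that ends up as a manual spreadsheet adjustment.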
The budget involved is material. The recharge process, however, was entirely manual — monthly spreadsheet assembly, data pulled by hand from multiple systems that don’t naturally talk to each other, cross-referenced against headcount figures that may or may not be current, checked and rechecked by a small team who are the only people in the organisation who fully understand how it works. Attempts to hand the process over to other team members had failed. The knowledge had become dangerously concentrated.
The problem wasn’t that nobody knew it was broken. It’s that building something better always felt like a project — something that needed a budget sign-off, a development team, a timeline. Too big to just do. So it stayed a spreadsheet.
This is the category of problem I think AI-assisted development is most interesting for. Not greenfield products. Not innovation theatre. The operational tools that exist in almost every organisation, that everyone knows should be better, and that never quite make it to the top of the priority list.
The thought experiment
The setup was straightforward. I used Claude Code (Anthropic’s AI coding assistant) to build a tool to automate the recharge process — ingesting consumption data automatically, allocating costs against real usage and current headcount, and producing defensible, auditable outputs. I kept a rough log of my time. When the tool reached a point where I could genuinely call it production-ready, I asked Claude itself to estimate how long a full-time senior developer would have taken to build the same thing.
The answer: 8–12 weeks full-time.
My actual time investment: approximately 0.6 person-weeks — roughly 20% of my working time across about a month.
That’s roughly a 13× effort magnification on the conservative end (8 weeks against 0.6), and about 20× on the upper estimate (12 weeks against 0.6).
Now the cost comparison. Using an £80K UK full-stack developer salary as the benchmark (roughly $100K USD), 2–3 months of that person’s time costs somewhere between $15K and $25K. My total spend was around $2,500: roughly $1,500 in API tokens plus the honest cost of my own time at the same salary benchmark.
That puts the cost reduction at roughly 8–10× versus a contractor or new hire. Not the inflated headline number you see in AI marketing material — the real one, with my time counted honestly.
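For anyone who wants to rerun the arithmetic with their own numbers, here is a small sketch using the figures quoted above. The assumptions are mine and deliberately simple: a 52-week year, bare salary with no employer overheads or contractor margin, and a hypothetical function name `compare`.

```python
SALARY_USD = 100_000        # benchmark salary quoted above
WEEKS_PER_YEAR = 52         # assumption: no adjustment for leave or overheads
WEEKLY_RATE = SALARY_USD / WEEKS_PER_YEAR


def compare(dev_weeks, my_weeks=0.6, api_spend=1_500):
    """Return (effort_ratio, cost_ratio) for a given developer estimate.

    effort_ratio: estimated developer weeks divided by my weeks.
    cost_ratio:   developer salary cost divided by (API spend + my time),
                  with my time priced at the same weekly rate.
    """
    dev_cost = dev_weeks * WEEKLY_RATE
    my_cost = api_spend + my_weeks * WEEKLY_RATE
    return dev_weeks / my_weeks, dev_cost / my_cost


for weeks in (8, 12):
    effort, cost = compare(weeks)
    print(f"{weeks} dev-weeks: effort x{effort:.1f}, cost x{cost:.1f}")
```

On bare salary alone the cost ratio comes out somewhat below the headline range. Employer overheads, recruitment, or contractor day rates are not modelled here and would push it higher, so treat the output as a floor rather than a verdict.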
The moment the art of the possible opened up
Before a single line of code was written, I spent time in conversation with Claude just describing the problem — roughly, informally, the way you’d explain it to a smart colleague over coffee. I wasn’t precise. I didn’t have a requirements document. I had a vague sense of what I needed and a lot of domain knowledge I hadn’t yet figured out how to articulate.
What came back surprised me. Claude didn’t just reflect my brief back at me. It started suggesting capabilities I hadn’t asked for — edge cases I hadn’t considered, architectural approaches I wouldn’t have known to reach for, and features that, once named, were obviously right. The kind of things an experienced developer or business analyst might surface after weeks of discovery workshops. Surfaced in an afternoon, through what I can only describe as genuinely exploratory conversation.
I realised fairly quickly that I wasn’t just prompting a tool. I was doing something closer to collaborative scoping with a domain-aware thinking partner — one that happened to then go away and build what we’d designed together. That shift in mental model changed how I used it for the rest of the project, and I think it’s the part of the AI development story that gets least airtime. Everyone talks about speed. Not enough people talk about what happens to the quality of your thinking when you have something that pushes back, extends your ideas, and fills in the gaps you didn’t know you had.
Where it got interesting
Here’s what I didn’t anticipate. Once the tool was running against real data, it started surfacing things that had been invisible — not because anyone was hiding them, but because nobody had ever had a system that could see across all the relevant data sources simultaneously.
Two findings stood out immediately:
Around 750 user accounts with no cost centre mapping. In a recharge model, unmapped users are effectively cost that goes nowhere — either absorbed silently or corrected manually each month in a spreadsheet adjustment that never makes it back to the source system. In our case, this represented potentially ~$45K per month of exposure. The figure is almost certainly being managed in some form already. But “managed manually, invisibly, by one person who knows where to look” is not the same as “fixed.”
Around 190 duplicate licence seats — the same user provisioned twice, licences retained after role changes, inherited accounts from system migrations. ~$50K per year in direct unnecessary spend. No recharge model changes required to recover it. Just remediation in the source systems.
Neither of these was a surprise in the abstract. In any large, complex organisation with multiple subsidiaries and years of accumulated system history, you’d expect to find something like this. What was striking was how quickly it became visible the moment a tool existed that could actually look.
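Both checks are simple once the data sits in one place. The sketch below shows the general shape, not the tool's actual code: the record layout (`user`, `product`), the function names, and the sample data are all hypothetical, and it only catches exact duplicates. Seats left behind after role changes or inherited through migrations would need fuzzier matching.

```python
from collections import Counter


def find_unmapped_users(accounts, cost_centre_map):
    """Return the user IDs that have no (or an empty) cost centre mapping."""
    return sorted(user for user in accounts if not cost_centre_map.get(user))


def find_duplicate_seats(seats):
    """Return {(user, product): extra_seats} for users holding the same
    product more than once. Only exact (user, product) repeats are counted."""
    counts = Counter((seat["user"], seat["product"]) for seat in seats)
    return {key: n - 1 for key, n in counts.items() if n > 1}


# Made-up sample data standing in for exports from two separate systems.
accounts = ["alice", "bob", "carol", "dave"]
cost_centre_map = {"alice": "CC-100", "bob": "", "carol": "CC-200"}
seats = [
    {"user": "alice", "product": "office-suite"},
    {"user": "alice", "product": "office-suite"},
    {"user": "carol", "product": "office-suite"},
    {"user": "carol", "product": "design-tool"},
]

print(find_unmapped_users(accounts, cost_centre_map))
print(find_duplicate_seats(seats))
```

Neither function is clever. The value comes from running them against data that previously lived in separate systems.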
This is, I think, the underappreciated story about AI-assisted development. It’s not just that you can build things faster. It’s that you can now afford to build the right things — the operational tools that have always been “too small to commission properly” but sit on top of surprisingly large inefficiencies.
What I actually learned
A few things stood out that I haven’t seen discussed much:
The economics are real but not magic. 8–10× cost reduction is genuinely significant, but it requires someone with enough domain knowledge to direct the work. I wasn’t a passive observer — I was the product manager, tester, and occasionally the debugger. The AI did the heavy lifting on code, but the thinking was mine. That matters for how you frame it internally.
The discovery process is underrated. The planning conversations — before any code exists — turned out to be as valuable as the build itself. My initial prompting was, by any expert measure, pretty rough. What came back was a scope that was materially better than what I’d walked in with. That kind of low-friction, high-quality scoping conversation is something most organisations pay consultants significant money for.
The handover problem is real. One of the reasons I built this tool at all was that the manual process it replaces had become almost impossible to hand over. Two team members had tried and stepped back. That kind of organisational single-point-of-failure is expensive and fragile, and it accumulates quietly until someone leaves — or until something goes wrong with the month-end close.
Disconnected data hides costs. The $45K/month figure wasn’t sitting in a report somewhere. It required joining up data sources that had never spoken to each other. The moment you do that, you realise how much routine financial hygiene simply doesn’t get done, not through negligence, but because the tooling to do it properly was always just slightly out of reach.
The build vs. buy question has shifted. The classic objection to internal tooling is that it takes too long and costs too much, usually resolved in favour of buying something off the shelf. That calculation is starting to look different. Not for everything — but for the category of “we know this problem exists, we roughly know what the solution looks like, we just never had the bandwidth to build it” — that category is much more accessible now.
I’m still thinking through what this means for how technology teams should be structuring their work. But the experiment taught me something I didn’t fully expect: the biggest unlock isn’t raw speed. It’s that AI assistance moves the economic threshold for what’s worth building.
That’s a meaningful shift. And the $45K/month sitting quietly in unmapped user accounts is probably the clearest illustration I have of why.
Interested in how others are thinking about this — particularly around the internal tooling question and how you’re measuring real vs. perceived AI development costs. I’d love to hear from people who’ve run similar experiments.