22 May 2026

Pilot to production: what UK accountancy can learn from regulated insurance

On Thursday I watched a webinar called "Pilot to Production: AI Leadership in Regulated Financial Services," produced by Computing and sponsored by EPAM and Microsoft. The panel was Mik Quinlan from EPAM, Stephen Holdstock, formerly CTO of Lloyd's of London and now at EPAM, and Naveen Dhar from Microsoft. Penny Horwood moderated.

The discussion was aimed at regulated financial services, mostly insurance, but the same questions are about to land in UK accountancy. Most practices are not ready for them yet, and I do not think most of the vendors selling into those practices are ready either.

A few of the lines that came up during the hour stayed with me, and I want to set out what they mean for UK accountancy practices buying AI tools right now.

What the panel actually said

Stephen Holdstock made a point near the start that I noted down twice. "Demo is fundamentally different to production." A vendor can build something that runs cleanly on five test cases, and that is genuinely impressive in the room. Putting the same tool in front of a regulated decision maker, at scale, with regulators watching, is a completely different engineering problem. Many firms across regulated financial services are still working at the demo end of that spectrum while talking publicly as if they were operating at the production end.

Naveen Dhar from Microsoft pushed back on the impulse to overbuild. "One workflow in production beats six months of building." The temptation in any AI conversation is to design the perfect strategy across every workflow at once and then announce it. The panel's advice was instead to get one regulated workflow live, observed, and governed, and only then move to the next one, having proved that the first one works.

Naveen made another point that has stayed with me longer than the others. "AI explores the gaps, rather than hiding them." A good AI tool surfaces the data quality problems that a firm has been politely ignoring for years, sometimes a decade. A bad AI tool quietly hides those problems and tells the user that everything is fine.

Mik Quinlan was more direct on the engineering side. "Throw the vibe code away when moving to global scale." Code written quickly to prove a concept is not the code that survives in production, and the shift in discipline between the two is where most projects quietly die. He was talking about insurance platforms, but the same is true of any AI tool that is sold as a finished product when it is really a clever prototype.

Naveen closed his remarks on the outlook with what he called the Three Ps that decide who scales: Performance, People, and Prevention. The biggest gain from putting AI into regulated work tends to be the loss it prevents, not the speed it adds. That is harder to measure, but more important to count.

The bridge to UK accountancy practices

The webinar was framed for insurance, but the lesson is identical for UK accountancy.

The question every accountancy practice should be asking the AI vendor sitting across the table from them right now is not "what can your tool do?" It is "can you show me what happens when the tool gets it wrong?" That is the difference between a demo and a production system, and it is the question that most vendor pitches do not really survive.

A demo proves that the tool sometimes works. A production system proves that the tool fails safely when it does not work. Vendors that stopped at the demo cannot answer the "what happens when it gets it wrong" question with a straight face. Vendors building for production can, and they will usually want to.

If the tool a practice is being sold is built on a generic AI chat interface, with no audit trail, no citations, and no owner attribution on the outputs, then what is being sold is a demo dressed up as a product. If the tool has citations on every output, audit trails on every workflow, owner attribution on every decision, and a clear path for the practice to override the result when needed, then it is starting to look like a production system that a regulated practice can actually use.
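For readers who want to see what that second description could mean inside a tool, here is a minimal Python sketch. It is my own illustration under stated assumptions, not something shown in the webinar or taken from any vendor's product: the names `AIOutput`, `AuditEvent`, and `missing_controls` are hypothetical, and a real system would persist the audit trail rather than hold it in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditEvent:
    """One entry in the audit trail (hypothetical shape)."""
    actor: str
    action: str
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class AIOutput:
    """An AI result carrying the four controls named in the text."""
    text: str
    citations: list[str]   # sources the output relies on
    owner: str             # named person accountable for the decision
    audit_trail: list[AuditEvent] = field(default_factory=list)
    overridden_by: Optional[str] = None

    def record(self, actor: str, action: str) -> None:
        self.audit_trail.append(AuditEvent(actor, action))

    def override(self, reviewer: str, new_text: str) -> None:
        """Replace the AI result with a human decision, keeping the history."""
        self.record(reviewer, f"override: {self.text!r} -> {new_text!r}")
        self.text = new_text
        self.overridden_by = reviewer

    def missing_controls(self) -> list[str]:
        """List which controls are absent; an empty list means none are."""
        gaps = []
        if not self.citations:
            gaps.append("citations")
        if not self.owner:
            gaps.append("owner attribution")
        if not self.audit_trail:
            gaps.append("audit trail")
        return gaps
```

An output that arrives with an empty `citations` list, no `owner`, and no recorded events reports all three gaps, which is roughly what the "demo dressed up" case looks like in data terms.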

The deciding factor over the next two years is not really going to be the brand of AI bought, or the price per seat. It will be whether the practice has the governance discipline in place to actually use the tool well, and whether the vendor in front of them has built something that supports that discipline rather than gets in its way.

What this looks like in practice

A practice that has the governance discipline in place can answer four questions without hesitation:

Which workflows is AI allowed to touch, and which is it not?
Who reviews each AI output before it leaves the practice?
What citation or evidence does every AI output carry?
When the AI is wrong, how is that recorded, escalated, and used to improve the next decision?
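The four questions can be read as checks a practice runs before anything leaves the building. The Python sketch below is one way that might look, and it is an assumption on my part rather than a description of any real product: the workflow names, the `Draft` type, and the `release_blockers` and `record_error` functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Question 1: workflows the practice has approved for AI (example values only).
ALLOWED_WORKFLOWS = {"vat_return_review", "expense_categorisation"}


@dataclass
class Draft:
    workflow: str
    citations: list[str]
    reviewer: Optional[str]   # named human who signs off before release


def release_blockers(draft: Draft) -> list[str]:
    """Return the reasons a draft may not leave the practice yet."""
    blockers = []
    if draft.workflow not in ALLOWED_WORKFLOWS:
        blockers.append("workflow not approved for AI")   # question 1
    if not draft.reviewer:
        blockers.append("no named reviewer")              # question 2
    if not draft.citations:
        blockers.append("no citation or evidence")        # question 3
    return blockers


# Question 4: a simple in-memory error log; a real one would be persisted.
error_log: list[dict] = []


def record_error(draft: Draft, description: str, escalated_to: str) -> dict:
    """Record what went wrong and who it was escalated to."""
    entry = {
        "workflow": draft.workflow,
        "description": description,
        "escalated_to": escalated_to,
    }
    error_log.append(entry)
    return entry
```

A draft is released only when `release_blockers` returns an empty list. The point of keeping the checks this plain is that a practice partner can read them and confirm they match the policy.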

If a practice cannot answer those four questions, an AI tool will end up as "expensive theatre" regardless of how good the demo looked at the start. If a practice can answer them, even partly, the AI tool starts compounding value every week it runs and every workflow it touches, which is what the vendors selling into UK accountancy should really be helping their clients build toward.

Most of the vendors selling into UK accountancy today cannot help a practice answer those four questions, and that is the part worth watching over the next twelve months or so. The ones that can are doing something genuinely different, and they are the ones I would be paying attention to.
