---
title: "Agents work. Coding shows how."
description: "How we can learn from coding agents when building AI systems for other parts of businesses."
ogImage: "/blog/newsletter-week8.jpg"
date: "2026-04-09"
tags: ["AI", "General"]
status: "Featured"
---

A lot of conversations about agents are actually still theoretical.

Which leaves people wondering - do they work in real life, outside demos? Can you trust them with messy data and valuable work? Or are they just a great way to go viral by telling people to comment "Claude" to get your killer agent setup (🤢)?

Agents in software development have already answered those questions.

Coding agents like Claude Code and Codex (OpenAI) are the best example we have of agentic systems doing real work in production. Not "look, it made a cool landing page that only exists on my laptop." More like: read this codebase, understand the goal, make changes without breaking everything, show your working, then let the human step in before anything ships. Anthropic developers are building Claude Code and Claude Cowork almost exclusively [using Claude Code itself](https://x.com/slow_developer/status/2011015481394950146), moving at super high speed.

That is why a lot of my agent-building work takes inspiration from how tools like Claude Code work. They're ahead of the curve and we can take those lessons into other domains.

## A concrete example

At Christmas I used Claude Code to build, in about an hour, a custom version of Wordle for my family to play.

Almost everyone knows how Wordle works, but to build a new version myself I needed user logins (to save results and guesses), an admin profile for adding new target words, and some extra features, like a bonus point for whoever answered correctly first.
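That bonus rule is a nice example of a requirement that's quick to say out loud but easy for an agent to get subtly wrong. A minimal sketch of how it might be scored (the data model and point values here are my illustrative assumptions, not the actual game code):

```python
from dataclasses import dataclass

@dataclass
class Result:
    player: str
    guesses: int
    solved: bool

def daily_scores(results, bonus=1):
    """Score one day's results.

    `results` is assumed to be ordered by submission time, so the
    first solved entry earns the bonus point. The point scheme
    (fewer guesses -> more points) is illustrative only.
    """
    scores = {}
    bonus_given = False
    for r in results:
        points = (7 - r.guesses) if r.solved else 0
        if r.solved and not bonus_given:
            points += bonus       # first correct answer of the day
            bonus_given = True
        scores[r.player] = points
    return scores

# Alice solves first in 3 guesses, Bob also takes 3 but submits later.
scores = daily_scores([
    Result("Alice", 3, True),
    Result("Bob", 3, True),
    Result("Carol", 6, False),
])
```

The only subtlety is that "first" depends on submission order, which is exactly the kind of detail a human reviewer catches faster than a prompt tweak does.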

If I'd built that manually, writing and testing every line myself (as I would have done only two or three years ago), it would have taken a couple of days. So I saved a huge amount of time (and the £5 it would have cost to create a custom version, without my custom colours and extra features, on the NY Times site). But that's not the point.

The interesting bit was how I worked with Claude Code to create the game.

Because it was just a fun project, I didn't sit there writing a totally rigid plan and specification doc with every step mapped out in advance (as I do for my "work" work). I told Claude Code what I wanted and gave it access to the relevant files and documentation (some info on the database I wanted to use). Then I let it go crazy, giving feedback at regular points whenever the code or the design wasn't working exactly as I wanted.

The first version was pretty good. The game worked and I could log in as the admin to set the target word for a day.

But it also missed several things that I wanted to include, like the custom look and feel, bonus scoring and some analysis tools so the admin could prepare a funny summary for the family when the game finished. I also added some more security features to lock down the game from any cheaters or people outside the family.

Those ideas and features didn't come from Claude Opus (the large language model actually powering Claude Code). They didn't come from commenting on a viral post to unlock "better prompting". They came from 10+ years' experience of coding and working with Google Cloud databases.

And that's a really good illustration of how agents work in reality today.

The coding agent (Claude Code) was fast and good enough to produce a great version 1 quickly. I, as the human, was useful because I knew where the security gaps were, where some of the code looked a little funky, and what "good" actually was for my game.

Every time I added one of those corrections, the game got better (and we had a great couple of weeks before Christmas playing). Not because the model had a sudden breakthrough working independently, but because it had more and more context and a human in the loop who could spot the potential issues before they became part of the final output.

## Why coding agents show the patterns to follow

Coding agents are basically a live example of all the points I've been writing about.

### 1. They are agentic

When I use a coding agent, I'm not telling it *exactly* how to write every line of code (or I might as well do it).

I am saying something closer to:

"Complete this small, specific task, with this goal, as part of this wider approach."

The agent then works out the exact steps. Which files to inspect. Which docs to read. What code to write first. What to test. When to flag to me that more info is needed or that there's a potential issue.

That is much closer to handing a person a brief than wiring up a rigid Zapier chain.
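The brief-versus-script distinction can be sketched in a few lines. This is purely conceptual; every function name below is a hypothetical stand-in, not a real agent API:

```python
def run_agent(brief, plan, execute):
    """Toy sketch of the agentic pattern: the human supplies the goal
    (the brief); the agent works out its own steps and executes them.
    `plan` and `execute` stand in for real agent capabilities
    (reading files, fetching docs, writing code, running tests).
    """
    log = []
    for step in plan(brief):  # the agent, not the human, decides the steps
        log.append((step, execute(step)))
    return log

# Stand-in behaviour, for illustration only.
def plan(brief):
    return [f"inspect relevant files for: {brief}",
            f"write code for: {brief}",
            f"run tests for: {brief}"]

def execute(step):
    return "ok"

log = run_agent("add bonus scoring", plan, execute)
```

The point of the sketch: the human's input is one line (the brief), and everything inside the loop is decided by the agent. A Zapier-style chain would hard-code the list of steps instead.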

### 2. Context is what makes the output useful

Coding agents work well because they are not operating in a vacuum.

They can read the existing codebase. They can fetch the latest documentation. They can follow established patterns in the project (having really solid foundations written by me has been the biggest difference maker I've found when building [Praxis](https://www.praxis-agents.ai) with coding agents). They can look at comments and tests and naming conventions and previous decisions.

That is already a much richer context than most business tools have.

Then you can add domain context on top. In my case, that means everything I've learned about working with APIs for platforms like Google Ads & Meta over the last 10 years - how the data behaves in the real world, which quirks matter, which edge cases are worth handling, what kinds of recommendations are actually useful rather than just technically defensible.

Someone without that experience, working with the same agent and the same underlying large language model, would get a very different output.

### 3. Human review is built into the workflow

This is the bit other industries should really be paying attention to.

The best coding-agent workflows do not hide the process and then reveal a final answer. They show the process while the work is happening. You can see the files being read. You can see the changes being made. You can see the tests being run.

And crucially, you can hit Esc and stop the agent to give new instructions while it's working.

If the agent starts going down the wrong path or getting stuck somewhere, you can stop it early. You don't have to wait until the whole thing is "finished" and then try to reverse-engineer exactly where and when it went wrong (unlike working through a super complex n8n chain).

That is a far better model for trust and accurate results than the black-box setup most non-coding AI tools still push people towards.

### 4. Expertise is the multiplier

Similar to the point about context: multiplying human expertise with agents is *far* more effective than trying to replace humans with AI models.

Give the same coding agent to a senior developer who knows the codebase, the domain, the trade-offs and the likely failure modes, and they will get dramatically more value from it than someone who is new to the problem, or someone trying to "vibe code" a new platform with limited knowledge of the technology they're using.

That is not because they're better at prompting (or got a prompt from some supposed LinkedIn guru). It is because they already know what to ask for, what context matters, and what to reject.

I think the exact same thing will be true in every other domain that adopts agents seriously - enabling and scaling human expertise rather than replacing it.

## Moving from coding to other businesses

If you change the wording of some of the points above, the structure is basically identical (taking advertising as an example):

| Coding world | Advertising world |
| :---: | :---: |
| Codebase | Client accounts and historical data |
| API docs and technical docs | Campaign briefs, platform guides, internal best practice training |
| Code review before merge | Campaign QA review before go-live |
| Senior developer | Experienced account manager or strategist |
| Launch to production | Launch to client account |

Coding is the first domain where there is enough agent usage to identify these patterns. Most of the advances in AI and agent tech have come from the West Coast of the US, where so much work is focused on coding, so that's where agents have been pointed first. But we can take the insights from how coding agents work and apply them elsewhere.

The thing you should be learning from software is **not** "everyone must become a programmer" or "everyone should learn Claude Code and use it in advertising". It is that agentic process plus rich context plus visible human review is a very effective operating model.

Software just got there first. I want to bring that approach to all businesses.

## Speed vs Value

When people watch someone using Claude Code or Codex, they often focus on the wrong thing - the speed.

Watching an agent write a useful chunk of code in minutes is impressive. But the speed is only useful if the output is actually valuable.

That's driven by the user. If they know what outcome they want, what quality looks like, when a neat-looking answer is actually a bad one, and which constraints matter, they can add a huge amount of context and value to the process.

Without any of that, agents can still produce a lot of output very quickly. But it's probably wrong and not useful in the slightest. I know I've received project briefs in the past that are obviously AI-written and don't really make sense or apply to what I've discussed with people previously - I'm sure everyone else has seen examples like this too.

That's also why I think the "which model/platform should we use?" conversation is far from the most important one to have when thinking about building an agent. Of course model choice does matter - some are better at coding, some are better at analysis, some are cheaper.

But the bigger lever is everything wrapped around the model:

- the context it can access  
- the review workflow around it  
- the judgement of the person steering it

That is where the real value sits today.

## Why this matters for the rest of us

The reason I've spent all this time talking about coding agents is not that I want every business owner or employee to start living in a terminal window (where you typically use Claude Code) - quite the **opposite**.

It's because they prove my wider point - we now have one domain where:

- agents are given a goal, not a rigid script  
- context is loaded from real working materials  
- humans stay involved while the work is happening  
- expertise determines how much value you get out of the system

That is the model I think we should all follow.

It works. Not all the time - coding agents still make bad assumptions, still get things wrong, still need checking. But that's why the regular human reviews add value. That's part of why they are such a good blueprint. They show what the setup looks like when the model is powerful and useful, but not perfect - which large language models never will be, because of how they work.

A lot of what I am building in Praxis follows that same shape, just applied outside software. Better context, clearer review points, more visible process, more room for human judgement where it actually matters.

And I think that is the direction a lot of business tooling will go over the next few years.

Software didn't get there first because it's special in any way. It does have advantages, like agents being able to run tests and "lint" code (check its structure against a set of rules) and act on that feedback. But software also happened to be the first place where people had both the incentive and the means to work through the rough edges.
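That test-and-lint advantage is worth seeing concretely: the signal an agent acts on is nothing more exotic than exit codes and output from the project's own check commands. A minimal sketch (the commands here are illustrative stand-ins, not any specific project's setup):

```python
import subprocess
import sys

def run_checks(commands):
    """Run each check command and collect the failures.

    This mirrors the feedback loop a coding agent uses: run the
    project's tests and linters, read exit codes and output, fix,
    repeat. Real projects would pass e.g. their pytest/ruff commands.
    """
    failures = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((cmd, result.stdout + result.stderr))
    return failures

# Portable stand-ins: one passing "test suite", one failing "lint".
failures = run_checks([
    [sys.executable, "-c", "print('tests pass')"],
    [sys.executable, "-c", "raise SystemExit(1)"],
])
```

Most business domains lack this kind of cheap, machine-readable pass/fail signal, which is exactly why visible human review has to carry more of the weight there.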

So if you want to understand where agents are heading in agencies, operations, reporting, client service, or anywhere else that mixes judgement with execution, I would spend less time looking at flashy online demo videos and more time watching how skilled people use coding agents.

That's what the next wave of business agents will look like.