McKinsey's 2025 State of AI report says 88% of companies are now using AI in at least one business function. In pay-per-click (PPC) advertising specifically, three-quarters of practitioners say they're using generative AI tools at least some of the time, and AI tool adoption overall has more than doubled in the last year.
But only 6% of organisations are seeing any real impact. Two-thirds are still stuck in "pilot mode", according to McKinsey. Half of the advertising industry doesn't have an AI roadmap yet.
Those numbers reflect a reality we're all feeling - there are lots of flashy demos and people talking a big game on LinkedIn, but not much real impact on our day-to-day work yet. I'm working on closing that gap - and this post will explain why I think it exists.
What two years of building agents actually looks like
I built my first Google Ads "AI Agent" in November 2023, the week OpenAI released the Assistants API. That was the first time you could give a Large Language Model (LLM) access to "tools" and let it decide what to do with them. That ability was what really kicked off the agent race and a lot of the excitement and progress we've seen since then.
Since then, I've built a lot of systems, some with AI involved and some without. Now that we have computers that "understand" language, that includes agents that review search term reports and draft negative keyword lists for Google Search ads; tools that fetch data from across business systems, flag anything that needs urgent attention, and explain why or even what to do about it; and systems that can generate full ad campaign structures from basic customer briefs.
Some of them worked well almost immediately; others needed a lot of iteration to get right. All of them are still improving, as we learn more about how to build this type of system and as the underlying technology itself advances rapidly.
Take that first agent I talked about. It made a great demo, but for any agency or advertiser to use it in real life, it would have needed a lot more hard work and customisation to make sure the campaigns it created actually followed best-practice rules for targeting the right users with the right ads.
I've been doing that hard work for a lot of different businesses over the last two years, and I've learned what it actually takes to get effective agents and AI projects working.
The pattern I keep seeing
What separates the projects that work from the ones that don't is rarely the AI model chosen. It's almost always the same external problems, and they're problems that simply upgrading to the latest OpenAI, Claude or Gemini models won't magically fix.
The most common: the model doesn't have enough valuable context to work with. Over half of PPC professionals say unreliable or inconsistent output is the biggest limitation of AI tools (Search Engine Journal, PPC Trends 2026). "Unreliable" sounds to me like people simply don't trust the results, and stories of models hallucinating (making up) data are widespread. The nature of LLMs means we can't stop hallucinations completely, but what causes them? Often, the model has no source of data to rely on: people ask generic questions, the model isn't plugged into the right sources or platforms, and it makes something up in an attempt to answer.
That's like asking a colleague for last year's sales numbers without letting them open any spreadsheets or finance reports - they'll take a guess, but there's a good chance it won't be quite right (and rounded up or down).
Whenever I build reporting or data-based tools, I always include the raw data alongside the AI agent's response, so the user can double-check anything they doubt. This builds users' trust in the agents they're working with, and prevents incorrect data being passed around the business.
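As a minimal sketch of that idea (the `AgentReport` structure and field names here are my own illustration, not any particular library's API), the agent's narrative and the untouched source rows travel together, so every claim can be checked against the numbers underneath it:

```python
from dataclasses import dataclass, field

@dataclass
class AgentReport:
    """Bundles the model's narrative with the raw rows it was given."""
    summary: str                                         # the LLM's written answer
    raw_rows: list[dict] = field(default_factory=list)   # untouched source data

    def render(self) -> str:
        # Show the AI summary first, then the verifiable numbers beneath it.
        lines = [self.summary, "", "Raw data used:"]
        for row in self.raw_rows:
            lines.append(" | ".join(f"{k}: {v}" for k, v in row.items()))
        return "\n".join(lines)

# Illustrative data only - in practice raw_rows would come straight
# from the ad platform's reporting API, not from the model.
report = AgentReport(
    summary="Spend rose 12% week on week, driven by Campaign A.",
    raw_rows=[
        {"campaign": "A", "spend_this_week": 1120, "spend_last_week": 1000},
        {"campaign": "B", "spend_this_week": 430, "spend_last_week": 445},
    ],
)
print(report.render())
```

The key design choice is that the raw rows bypass the model entirely: even if the summary hallucinates, the data the user sees is exactly what came out of the platform.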
The second problem: automations aren't making the best use of LLMs. I see a lot of businesses that have built automation workflows - using n8n, Zapier, or scripts with API calls to OpenAI or Claude built in. For some use cases (especially data pipelines and processing), that's all that's required. The downside is that they work for exactly the scenario they were designed for, and nothing more.
What if different clients or campaigns have different goals? Add a big ugly Excel-style IF function. A key platform updates its API or moves something? Rewrite the integration. An unexpected scenario you didn't anticipate? It either breaks or quietly does the wrong thing.
Build enough of these workflows for tasks they're not fit for, and you'll end up spending more time maintaining them than you'd spend doing the work manually - or people will simply stop using the tools altogether.
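The brittleness is easy to see in miniature. A sketch of that hardcoded branching pattern (goal names and rules here are hypothetical):

```python
def rigid_step(campaign: dict) -> str:
    """A rigid workflow step: one branch per goal it was designed for."""
    goal = campaign.get("goal")
    if goal == "leads":
        return "optimise for conversions"
    elif goal == "awareness":
        return "optimise for impressions"
    # Anything the designer didn't anticipate falls through to here.
    raise ValueError(f"No rule for goal: {goal!r}")

rigid_step({"goal": "leads"})          # the designed-for case: works
# rigid_step({"goal": "app installs"}) # anything new: raises ValueError
```

Every new client goal means another branch, and every branch is more code to maintain - which is exactly the trap described above.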
The third pattern: finding the right level of AI permissions. People often assume there's a hard choice between "the AI suggests something, I still do it manually" (which saves far less time than you'd think, and reduces humans to button-pushers for the AI!) and "the AI does whatever it wants" (which, in reality, no reasonably sized business would allow). There's a middle ground that depends on the context you're working in - and finding that sweet spot, plus being able to configure it for different cases, is what really unlocks value.
How to solve these problems
Give the agent access to the right data sources - this is the most important piece, and the one I spend the most time thinking about how to apply and improve. Your processes, your knowledge of your ideal customer, your campaign history, the lessons your team has learned: that's what moves agent output from generic to genuinely useful. There are distinct types of context that matter, and getting the right context to the agent at the right time is the hardest and most valuable problem to solve right now.
Build "agentic" workflows - yes, it's a buzzword, but to me it means handing more control over the exact flow of a task to the agent itself. There's a real and growing difference between a rigid chain of steps and an agent that is given a goal and figures out how to achieve it. They look similar from the outside but behave very differently when anything unexpected happens. And there's a compounding effect: as the models improve and, more importantly, as the agent gains more context, a well-built agent improves its own processes quickly.
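The structural difference can be sketched in a few lines. Here the agent gets a goal and a toolbox, and chooses its own next step in a loop; `decide()` is a stub standing in for a real LLM call, and the tool names and logic are purely illustrative:

```python
# An agent is given a goal and a toolbox, and picks its own next step.
TOOLS = {
    "fetch_search_terms": lambda s: s | {"terms": ["cheap shoes", "shoe repair"]},
    "draft_negatives":    lambda s: s | {"negatives": [t for t in s["terms"] if "repair" in t]},
    "finish":             lambda s: s | {"done": True},
}

def decide(state: dict) -> str:
    """Stub policy: choose the next tool based on what's still missing.
    A real agent would ask the model to choose, given the goal and state."""
    if "terms" not in state:
        return "fetch_search_terms"
    if "negatives" not in state:
        return "draft_negatives"
    return "finish"

def run_agent(goal: str, max_steps: int = 10) -> dict:
    state = {"goal": goal}
    for _ in range(max_steps):       # cap steps so the loop can't run away
        tool = decide(state)
        state = TOOLS[tool](state)
        if state.get("done"):
            break
    return state

result = run_agent("reduce wasted spend on irrelevant search terms")
```

Unlike a fixed chain, nothing here hardcodes the order of steps: the loop re-decides after every tool call, so a smarter `decide()` (a better model, or richer context) improves the whole flow without rewriting the pipeline.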
How humans stay involved - not as a temporary safety net until the AI gets good enough, but as part of the actual architecture. That means thinking deliberately about the real aim of any given process, what the agent can and can't do and at what level, and building trust over time. Every time a human reviews and corrects the output, that feedback becomes context that makes future output better (which connects straight back to the previous point).
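One way to make that permission spectrum concrete is a per-action oversight setting: riskier actions need more human involvement. This is a hypothetical sketch - the action names, modes, and defaults are my own illustration:

```python
from enum import Enum

class Mode(Enum):
    SUGGEST = "suggest"        # agent drafts, a human does the work
    APPROVE = "approve"        # agent acts only after human sign-off
    AUTONOMOUS = "autonomous"  # agent acts, logging for later review

# Hypothetical per-action configuration: the riskier the action,
# the more oversight it gets. Tune per client or per campaign.
PERMISSIONS = {
    "add_negative_keyword": Mode.AUTONOMOUS,
    "pause_campaign":       Mode.APPROVE,
    "change_budget":        Mode.SUGGEST,
}

def execute(action: str, apply_fn, human_approved: bool = False) -> str:
    mode = PERMISSIONS.get(action, Mode.SUGGEST)   # default to the safest mode
    if mode is Mode.SUGGEST:
        return "drafted for a human to action"
    if mode is Mode.APPROVE and not human_approved:
        return "queued for approval"
    apply_fn()                                     # the real platform call
    return "applied"
```

The table, not the code, becomes the thing you tune as trust builds: promoting an action from SUGGEST to APPROVE to AUTONOMOUS is a one-line config change, not a rewrite.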
What's coming next
I'm going to write about all of this and more over the next few weeks:
- How the spectrum from rigid automation to dynamic flows actually works, and why "agentic" isn't just a buzzword.
- What context really means in practice - it's much more nuanced and interesting than just "copy/paste more text into ChatGPT".
- Why human oversight is a key architectural decision, not just a safety check.
- How coding agents (like Claude Code and OpenAI Codex) are already using advanced patterns that we can apply in other domains.
Sources:
- McKinsey State of AI 2025 - 88% adoption, 6% high performers, ~65% in pilot mode
- SEJ PPC Trends 2026 - 75% of PPC pros using GenAI, 50%+ cite unreliable output
- IAB State of Data 2025 - 30% full integration, half lack strategic roadmap