More information isn't always better information - Avoiding context rot in agents

Why cramming too much information into a prompt for an AI agent will give you worse results.


Last week I wrote about the six types of information that separate a useful AI Agent from a generic one, and where each type actually lives right now.

When people think about that information - or "context" - for agents, the instinct is usually to gather as much as possible and put it all into the prompt. That's not the answer.

This week I want to talk about what happens after you've done the hard work of gathering all that context - you've captured your processes, your domain knowledge, your testing history - and now need to find the best way to actually give it to the agent.

Because more context doesn't automatically mean better output, and that’s where a lot of agent usage goes wrong.

The textbook problem

Think of it like this - you ask someone to answer a very specific question about how to solve a complex maths problem. But instead of giving them a concise formula sheet (like we used to get in exams), you hand them the whole 1,000-page textbook covering a huge number of similar formulas and methods.

The answer, or how to get to it, is in there somewhere. But they've got to wade through 999 pages of stuff that isn't quite right to find the one page that is. And the more textbooks you give them, the harder it gets to find the bit that actually matters.

That's what happens when you dump all your context into a single prompt or agent session. The information is there, but the model can't find the right bit easily. It doesn't know what's important right now vs. what's just background noise or distractions.

This is actually a common challenge everyone using LLMs/agents faces - sometimes called "context rot" or the "lost in the middle" problem. Research shows that LLMs struggle to use information buried in the middle of very long inputs. They actually pay most attention to the beginning and the end. Anything in the middle - which might be the most important bit - gets lost or devalued. And the bigger the model “context window” (the amount of data an LLM can process in one go) the more pronounced this becomes.

Context windows are getting bigger. That doesn't solve this.

Every few months, a new model launches with a bigger context window - Gemini and Claude can now handle a million tokens. The answer to agents lacking context seems obvious: just throw everything in and the model will figure it out.

But that’s not what happens right now. A bigger context window doesn't mean the model uses everything in it equally well. It's the same as the textbook problem. You've just given them a longer book - they're still going to struggle to find and apply the important bit in the middle.

The practical limit on how much information a model can reliably use is different from the technical limit on how much it can accept. The amount of context a model can meaningfully work with is almost always smaller than the official context window size.

How to actually fix the context problem

So the answer isn't "give the agent everything." It's knowing what to load, and when.

I think about this in three tiers - based on how stable the context is and how often it's actually needed:

Foundation Context. Procedural context (how we do things) and general domain context (what we know about the client, their market, their business). Things that don’t normally change between runs. This is the base the agent needs every single time - think of it as the brief you'd give anyone in your team before they start on a new task.

Task Specific Context. Specific domain context (current ad campaign goals, this month's budget), external context (recent ad platform changes, seasonal factors), and preference context (how this client likes things reported). This changes more often and isn't always relevant to every task. So we only load the relevant pieces at the right time.

Historic Context. Episodic context - the history of what's been tried, what worked, what the human corrected last time the agent ran. This is the compounding type of context I wrote about last week. But it also grows the fastest, which means it can become the biggest source of noise if you dump it all in. The answer is the same as for task specific context - the agent needs to query past experience when it hits a relevant situation, rather than carrying the entire history in every session.
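The three tiers above can be sketched as a simple context loader. This is a minimal illustration of the idea, not the Praxis implementation - every name, field, and data value here is hypothetical.

```python
# Minimal sketch of tiered context loading. All names and data are
# illustrative, not from any real framework.

FOUNDATION = {  # tier 1: stable, loaded on every run
    "procedures": "How we run and report ad campaigns...",
    "domain": "Client overview, market, business model...",
}

TASK_SPECIFIC = {  # tier 2: loaded selectively, by task
    "campaign_goals": "Q3 lead-gen targets...",
    "budget": "This month's budget: ...",
    "reporting_prefs": "Client prefers weekly summary tables...",
}

HISTORY = [  # tier 3: queried by situation, never loaded wholesale
    {"tags": {"budget", "overspend"},
     "note": "June: agent over-allocated; human cut display spend 20%."},
    {"tags": {"reporting"},
     "note": "Client rejected jargon-heavy summaries."},
]

def build_context(task_keys, situation_tags):
    """Assemble prompt context: always include the foundation, include
    only the requested task-specific pieces, and query history for
    episodes whose tags overlap the current situation."""
    parts = list(FOUNDATION.values())
    parts += [TASK_SPECIFIC[k] for k in task_keys]
    parts += [e["note"] for e in HISTORY if e["tags"] & situation_tags]
    return "\n\n".join(parts)

prompt_context = build_context(
    task_keys=["budget", "reporting_prefs"],
    situation_tags={"budget"},
)
```

Note what this buys you: the reporting-history episode and the campaign-goals block never reach the prompt, because this task didn't ask for them - the foundation travels every time, but everything else is loaded on demand.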

This is the context architecture I use when building agents, and the framework that powers custom agents in my Praxis platform.

This all seems less glamorous than choosing the right model or writing the perfect prompt. But it's where so much more of the value actually is. Two agents with the same model, the same tools, and the same task will produce very different output if one is handed all its context at once and the other loads the right things only when they're needed.

Why this matters more than the model

Everyone has access to the same state-of-the-art base models: GPT, Claude and Gemini. You can't differentiate on which one you use, because your competitors are using the same ones.

What you can differentiate on is two things: what context you have (your expertise, your processes, your history) and how well you deliver it. The first is about information capture and documentation. The second is about architecture - and it's what separates a good agent from a good model in a bad harness.

The question this raises

So if you’ve gathered all your information and are feeding it into an agent in the right way - how do you know it's actually using what you gave it?

The honest answer is: you often can't tell just from looking at the output.

LLMs don't just get things wrong - they get things wrong confidently. An agent can pull the right data, ignore it, hallucinate a number instead, and build a perfectly plausible recommendation on top of that made-up figure. The output reads well. The logic sounds reasonable. If you don't interrogate the output, you'll approve the recommendation - a trap I see many people fall into.

Real “human-in-the-loop” means seeing the working - what data the agent actually pulled, what it based its reasoning on, whether you can trace the logic from input to output. Not just "is this answer right?" but "how did it get there? What did it actually use?"

You decide which context is always relevant and which is situational. You review the working, not just the output. You catch the moment an agent starts treating a hallucinated number as fact - but only if the process is visible enough to spot it.
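Making the process visible enough to spot that moment can be as simple as recording every data fetch the agent makes. A minimal sketch of the idea, with entirely hypothetical names and a fake data source standing in for a real ad-platform API:

```python
# Illustrative sketch of "seeing the working": wrap each data-fetching
# tool so every call and result lands in a trace a reviewer can inspect.
# All names and figures here are made up for illustration.

trace = []

def traced(tool_name, tool_fn):
    """Wrap a tool function so each call is logged to the trace."""
    def wrapper(*args, **kwargs):
        result = tool_fn(*args, **kwargs)
        trace.append({"tool": tool_name, "args": args, "result": result})
        return result
    return wrapper

# Fake data source standing in for a real platform API.
get_spend = traced("get_spend",
                   lambda campaign: {"campaign": campaign, "spend": 4200})

figure = get_spend("summer-sale")["spend"]

# At review time you check the trace, not just the final report: any
# number in the output that never appears in the trace was hallucinated
# rather than pulled from data.
cited_numbers = {entry["result"]["spend"] for entry in trace}
```

The point isn't this particular wrapper - it's that the check "does every figure in the output appear in the trace?" is only possible if the trace exists at all.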

That's what I'll get into next week: why human-in-the-loop means visibility throughout the process, not just a final approval gate. Because "we check everything before it goes out" isn't the reassurance most people think it is - not if you can't show how the answer was reached.