<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>Daz</title>
    <subtitle>Software developer, creative coder, electronic musician</subtitle>
    <link rel="self" type="application/atom+xml" href="https://daz.is/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://daz.is"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2026-04-22T00:00:00+00:00</updated>
    <id>https://daz.is/atom.xml</id>
    <entry xml:lang="en">
        <title>Code I&#x27;ll Never Read</title>
        <published>2026-04-22T00:00:00+00:00</published>
        <updated>2026-04-22T00:00:00+00:00</updated>
        
        <author>
          <name>Unknown</name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/code-i-will-never-read/"/>
        <id>https://daz.is/blog/code-i-will-never-read/</id>
        
        <content type="html" xml:base="https://daz.is/blog/code-i-will-never-read/">&lt;p&gt;Six months ago I&#x27;d have laughed if you told me AI was writing all my code. I&#x27;ve written about this shift before, from
&lt;a href=&quot;&#x2F;blog&#x2F;rethinking-ai&#x2F;&quot;&gt;rethinking my position on AI&lt;&#x2F;a&gt; to building a
&lt;a href=&quot;&#x2F;blog&#x2F;how-i-work-with-ai-coding-agents&#x2F;&quot;&gt;process&lt;&#x2F;a&gt; around plans, deviation logs, and targeted review. But there&#x27;s a
further step I wasn&#x27;t expecting to come so soon.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;ve started to question whether I&#x27;m even qualified to judge the code any more, because the code isn&#x27;t written for me.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-moment-it-clicked&quot;&gt;The moment it clicked&lt;&#x2F;h2&gt;
&lt;p&gt;I have a side project I&#x27;ve worked on for years. No deadlines, no clients. Just code written for the pleasure of writing
it. Some of the best code I&#x27;ve ever produced, by my own standards. I&#x27;ve always had a few side projects on the go as
somewhere to develop the craft without compromise. The quality of this code mattered to me, and I thought it was good.&lt;&#x2F;p&gt;
&lt;p&gt;I showed it to Claude Code and asked &quot;how easy would this codebase be for you to work with?&quot; I told it not to hold back.
Review it purely from an agentic coding perspective, ignore human aesthetics entirely.&lt;&#x2F;p&gt;
&lt;p&gt;The feedback wasn&#x27;t great.&lt;&#x2F;p&gt;
&lt;p&gt;One thing it flagged: I&#x27;d replaced the router with a custom macro that let me define routes inline with their handlers.
Elegant. Everything in one place. Open one file, see the route and its logic together. For me this was better than what
it replaced, a nested router definition scattered across different files.&lt;&#x2F;p&gt;
&lt;p&gt;Claude&#x27;s take: that pattern removed the router as a navigable index. An agent uses the router to get an overview of
available endpoints and to target changes precisely. My clever colocation made the codebase harder for an agent to
reason about.&lt;&#x2F;p&gt;
&lt;p&gt;You could argue a conventional router offers the same navigable overview to any human unfamiliar with the project, but
for me, as the main maintainer, the benefits outweighed the cost. It was certainly an improvement over what I had before:
the macro removed a lot of boilerplate, and the router was only one part of what it simplified.&lt;&#x2F;p&gt;
&lt;p&gt;I realised that in making the code better for me, I&#x27;d removed something that the agent expected to be there. And the
clever metaprogramming that made things better for a human reader had made it more difficult for the AI agent.&lt;&#x2F;p&gt;
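&lt;p&gt;The project&#x27;s actual macro isn&#x27;t shown here, but the trade-off can be sketched in a few lines of Python. Both styles register the same endpoints; the difference is whether a single, boring table of routes exists for an agent to read.&lt;&#x2F;p&gt;

```python
# Illustrative sketch only: not the project's actual macro or framework.

# Colocated style: each handler registers its own route where it is
# defined. Pleasant for the maintainer reading one file, but the set of
# endpoints is assembled at import time, scattered across modules, with
# no single source file an agent can open to see them all.
routes = {}

def route(path):
    def register(handler):
        routes[path] = handler
        return handler
    return register

@route("/users")
def list_users():
    return ["alice", "bob"]

# Central-table style: repetitive boilerplate, but the table itself is a
# navigable index. An agent (or an unfamiliar human) reads one literal
# and knows every endpoint in the application.
def get_orders():
    return []

ROUTING_TABLE = {
    "/users": list_users,
    "/orders": get_orders,
}
```

&lt;p&gt;Both produce the same mapping at runtime; only the second leaves a static index behind for a reader to navigate.&lt;&#x2F;p&gt;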
&lt;p&gt;That&#x27;s a small thing. But it opened a bigger question.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;down-the-rabbit-hole&quot;&gt;Down the rabbit hole&lt;&#x2F;h2&gt;
&lt;p&gt;I found a paper from April 2026,
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;arxiv.org&#x2F;html&#x2F;2604.07502&quot;&gt;&quot;Beyond Human-Readable: Rethinking Software Engineering Conventions for the Agentic Development Era&quot;&lt;&#x2F;a&gt;,
that was asking the same question. Their core argument: many practices we treat as anti-patterns may actually be virtues
when agents are the primary consumers of the code. They even proposed a &quot;program skeleton&quot;, a navigable high-level index
of the codebase, which is essentially what my custom macro had removed.&lt;&#x2F;p&gt;
&lt;p&gt;That prompted me to run the same experiment on other codebases I work with. The router wasn&#x27;t a one-off: similar
patterns kept showing up across different projects.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;who-is-the-code-for&quot;&gt;Who is the code for?&lt;&#x2F;h2&gt;
&lt;p&gt;Developers are starting to talk about what happens when we stop reading code. I&#x27;ve
&lt;a href=&quot;&#x2F;blog&#x2F;stop-reading-the-code&#x2F;&quot;&gt;written about this&lt;&#x2F;a&gt; myself: the scaling problem, the cognitive limits (400 lines an hour,
60 minutes before quality drops off a cliff), the increase in code volume that AI agents produce.&lt;&#x2F;p&gt;
&lt;p&gt;Most of that conversation focuses on how we maintain quality if we can&#x27;t read everything.&lt;&#x2F;p&gt;
&lt;p&gt;And there&#x27;s a compounding problem. Even when humans do review agent code, the quality of that review degrades. AI output
follows similar patterns, and reviewers start to skim rather than properly
analyse. &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;asyncsquadlabs.com&#x2F;blog&#x2F;code-review-bottleneck-ai-era&#x2F;&quot;&gt;Template blindness&lt;&#x2F;a&gt;
sets in: everything looks plausible, so subtle bugs slip through. Human review isn&#x27;t just failing to scale, it&#x27;s
getting less reliable on the code it does cover.&lt;&#x2F;p&gt;
&lt;p&gt;What if the things we&#x27;ve always valued in code aren&#x27;t what matter any more?&lt;&#x2F;p&gt;
&lt;p&gt;We have decades of received wisdom about what makes code good. Clean abstractions. DRY. Colocation of related concerns.
Patterns that make the codebase a pleasure to navigate, if you&#x27;re a human holding the whole thing in your head.&lt;&#x2F;p&gt;
&lt;p&gt;But the code isn&#x27;t primarily for humans any more. If agents are writing it and agents are working with it, then &quot;good
code&quot; means something different.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;where-human-taste-and-agent-needs-diverge&quot;&gt;Where human taste and agent needs diverge&lt;&#x2F;h2&gt;
&lt;p&gt;I found a few common patterns from my research where things I&#x27;d instinctively do as a human developer worked against how
agents navigate and reason about code.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Boilerplate as signal.&lt;&#x2F;strong&gt; Humans like removing boilerplate. But that boilerplate is often the structural signal an
agent relies on to orient itself. A standard router definition is repetitive to read, but it&#x27;s instantly parseable. My
custom macro removed that repetition and, with it, the navigability.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Metaprogramming vs. common patterns.&lt;&#x2F;strong&gt; A custom DSL or macro is a delight once you learn it. But agents are trained on
millions of examples of conventional code; a bespoke abstraction they&#x27;ve never seen costs them context every time they touch it.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Implicit conventions vs. explicit structure.&lt;&#x2F;strong&gt; Relying on things &quot;you just know&quot; doesn&#x27;t work for an agent that has to
rebuild its context every turn.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;DRY vs. tolerable duplication.&lt;&#x2F;strong&gt; Humans instinctively factor out repetition. But for agents, a repeated function
self-contained in each file is easier to reason about than a shared abstraction they have to trace across the codebase.
The indirection costs more than the duplication.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Global elegance vs. local self-explanation.&lt;&#x2F;strong&gt; Agents don&#x27;t hold the whole codebase in their head. They reward code
that makes sense locally, file by file.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-are-humans-actually-reviewing-for&quot;&gt;What are humans actually reviewing for?&lt;&#x2F;h2&gt;
&lt;p&gt;I still review, and sometimes it pays off.&lt;&#x2F;p&gt;
&lt;p&gt;I had the agent working on a Rust project recently. &lt;code&gt;sqlx&lt;&#x2F;code&gt; gives you compile-time SQL checking: the compiler validates
your queries against the actual database schema. The agent hit a build error because the database hadn&#x27;t had migrations
applied. Rather than fix the migration issue, it quietly switched from the compile-time macros to the runtime query
functions. Build passed. But I&#x27;d lost a compile-time guarantee, replaced with a runtime check that would only fail in
production.&lt;&#x2F;p&gt;
&lt;p&gt;I always instruct my agent to keep a deviation log, and luckily, that caught it. I didn&#x27;t have to review every line
because the agent flagged where it diverged from the plan and why. That&#x27;s the kind of thing humans should be catching.
That&#x27;s a different job from &quot;does this code look clean to me.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;Looking back, that was a context failure. The agent&#x27;s priority was to finish the task, and it did, by trading a
compile-time guarantee for a runtime one. The fix here isn&#x27;t more human review. It&#x27;s making the constraints explicit in
the context.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-uncomfortable-conclusion&quot;&gt;The uncomfortable conclusion&lt;&#x2F;h2&gt;
&lt;p&gt;A year ago I wouldn&#x27;t have believed this. But, as I&#x27;m becoming more comfortable with the idea that I won&#x27;t be reading
all the code, I&#x27;m now making the uncomfortable assertion that perhaps humans aren&#x27;t qualified to review agent code
anyway.&lt;&#x2F;p&gt;
&lt;p&gt;If agents are the primary consumers of the codebase, then optimising for human readability might actively make things
worse. Some of what we&#x27;d flag in code review (repetition instead of abstraction, explicit over implicit)
might be exactly what an agent needs. And some of what we&#x27;d praise (elegant metaprogramming, terse abstractions,
patterns that feel clever) might be actively hostile to agent workflows.&lt;&#x2F;p&gt;
&lt;p&gt;Quality hasn&#x27;t gone away; it means something different now. Quality is whether the code is correct,
verifiable, and productive for whoever is working with it. Right now, that&#x27;s increasingly not us.&lt;&#x2F;p&gt;
&lt;aside class=&quot;aside-callout&quot;&gt;
  &lt;span class=&quot;aside-callout__label&quot;&gt;Aside&lt;&#x2F;span&gt;
  &lt;p&gt;There&#x27;s a gap here I don&#x27;t have a clean answer for. Agents have local coherence but not global coherence. They make
changes that work within the files they can see, but they don&#x27;t hold the shape of the whole system across sessions. In
theory, that&#x27;s still a human job. But if I&#x27;m arguing the code isn&#x27;t for humans, I can&#x27;t also argue humans should be
reading it for structural consistency. And if the instincts we&#x27;d bring to that review push us toward patterns that work
against the agent, we might be making things worse. My best guess at the moment is that the things that actually
matter (broken integrations, contradictory assumptions across modules) should be caught by deterministic checks, not
eyeballs.&lt;&#x2F;p&gt;

&lt;&#x2F;aside&gt;
&lt;h2 id=&quot;what-next&quot;&gt;What next?&lt;&#x2F;h2&gt;
&lt;p&gt;I&#x27;m still working this out. I don&#x27;t yet feel comfortable not reading the code, so I still review it, though I know
I&#x27;ll do so less and less as models and agents improve. When I do review, I try to catch myself before applying
outdated human opinions about what good code looks like.&lt;&#x2F;p&gt;
&lt;p&gt;We need to let go of what &quot;good code&quot; used to mean to us.&lt;&#x2F;p&gt;
&lt;p&gt;What should we be doing?&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;better specifications: making constraints explicit in the context so the agent doesn&#x27;t have to guess your priorities&lt;&#x2F;li&gt;
&lt;li&gt;stronger verification: type systems, compile-time checks, integration tests. If you&#x27;re not reading every line, you
need deterministic checks&lt;&#x2F;li&gt;
&lt;li&gt;production-like testing: staging environments that stress the system under realistic conditions. Deterministic checks
prove correctness in isolation, but you need to know the whole thing holds together before it ships&lt;&#x2F;li&gt;
&lt;li&gt;observability: instrumentation to catch unexpected behaviour&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;We don&#x27;t actually know yet what makes code optimally readable for agents. Anything we do spot will be a property of
today&#x27;s models and tooling, not a deep truth.&lt;&#x2F;p&gt;
&lt;p&gt;The craft hasn&#x27;t gone yet. It&#x27;s moving from the aesthetics of the code to the quality of the specification and the
strength of the verification. The hardest part isn&#x27;t adopting new tools. It&#x27;s unlearning the instincts that made you
good at the old ones.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Why AI Fails at Scale</title>
        <published>2026-03-25T00:00:00+00:00</published>
        <updated>2026-03-25T00:00:00+00:00</updated>
        
        <author>
          <name>Unknown</name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/why-ai-fails-at-scale/"/>
        <id>https://daz.is/blog/why-ai-fails-at-scale/</id>
        
        <content type="html" xml:base="https://daz.is/blog/why-ai-fails-at-scale/">&lt;p&gt;Last week I read
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.theregister.com&#x2F;2026&#x2F;03&#x2F;17&#x2F;ai_businesses_faking_it_reckoning_coming_codestrap&#x2F;&quot;&gt;AI still doesn&#x27;t work very well, businesses are faking it, and a reckoning is coming&lt;&#x2F;a&gt;
in The Register. The picture it paints is bleak: enterprise AI is mostly failing, the metrics are gamed, and the bill is
coming due. I&#x27;ve been trying to reconcile that with my own experience, because my experience has been the opposite. AI
coding agents have been a massive performance gain for me. They&#x27;ve made me a better developer.&lt;&#x2F;p&gt;
&lt;p&gt;In December 2025, Claude Code and Opus 4.5 crossed a threshold. The models got good enough that agents actually worked
well. I started experimenting in December, and by January I&#x27;d gone deep. That&#x27;s when I realised my old way of working
was dead. I had to mourn my craft of 25 years while adapting.&lt;&#x2F;p&gt;
&lt;p&gt;It hasn&#x27;t been smooth. I&#x27;ve had failures. Outputs I threw away, approaches that turned into dead ends, entire days where
the AI confidently produced something that looked right but wasn&#x27;t. The difference is that I&#x27;ve learned to treat those
failures as signal. I feed them back into how I work. And that&#x27;s made all the difference.&lt;&#x2F;p&gt;
&lt;p&gt;The Register article isn&#x27;t wrong. The numbers are bad. But my experience says something different is possible. I&#x27;ve
been trying to make sense of that gap.&lt;&#x2F;p&gt;
&lt;p&gt;Two ideas. The first is a way to map the problem space: a simple quadrant that clarifies what kind of problem you&#x27;re
actually trying to solve with AI. The second is the outer loop: the feedback practice where you keep improving &lt;em&gt;how&lt;&#x2F;em&gt;
you work with AI over time, not just what the AI outputs.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-numbers-if-you-want-them&quot;&gt;The numbers, if you want them&lt;&#x2F;h2&gt;
&lt;p&gt;The scale of failure is well-documented.
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;fortune.com&#x2F;2025&#x2F;08&#x2F;18&#x2F;mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo&#x2F;&quot;&gt;MIT&#x27;s &lt;em&gt;GenAI
Divide&lt;&#x2F;em&gt; report&lt;&#x2F;a&gt;
found 95% of enterprise AI pilots delivered no measurable P&amp;amp;L impact. A
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.pertamapartners.com&#x2F;insights&#x2F;ai-project-failure-statistics-2026&quot;&gt;broader analysis&lt;&#x2F;a&gt; across 2,400+
initiatives put the figure at over 80% of an estimated $684 billion in 2025 AI investment failing to deliver intended
value. 42% of companies
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.theregister.com&#x2F;2026&#x2F;03&#x2F;17&#x2F;ai_businesses_faking_it_reckoning_coming_codestrap&#x2F;&quot;&gt;scrapped most of their AI initiatives&lt;&#x2F;a&gt;
in 2025, up from 17% the year before.&lt;&#x2F;p&gt;
&lt;p&gt;And yet workers
at &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;fortune.com&#x2F;2025&#x2F;08&#x2F;19&#x2F;shadow-ai-economy-mit-study-genai-divide-llm-chatbots&#x2F;&quot;&gt;90% of companies surveyed&lt;&#x2F;a&gt;
report using personal AI tools daily, often outperforming the corporate tools their employers are spending millions on.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-quadrant-mapping-the-problem-space&quot;&gt;The quadrant: mapping the problem space&lt;&#x2F;h2&gt;
&lt;p&gt;When I hear about people getting mixed results with AI, the first thing I wonder is: what kind of problem were they
trying to solve? Because not all problems are the same, and I think a lot of failure comes from not being clear about
this upfront.&lt;&#x2F;p&gt;
&lt;p&gt;Two dimensions matter:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Data: structured vs. unstructured.&lt;&#x2F;strong&gt; Is the input clean, tabular, and well-defined? Or is it messy, ambiguous, and
human-generated?&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Process: deterministic vs. non-deterministic.&lt;&#x2F;strong&gt; Does the workflow demand the same outcome every time? Or does it
require judgement, interpretation, and tolerance for variation?&lt;&#x2F;p&gt;
&lt;p&gt;This gives you four zones:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;daz.is&#x2F;blog&#x2F;why-ai-fails-at-scale&#x2F;quadrant.png&quot; alt=&quot;automation quadrants&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
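&lt;p&gt;The quadrant is simple enough to express directly. A sketch, where the zone names paraphrase the section headings below:&lt;&#x2F;p&gt;

```python
# The two axes as booleans; the four zones as outcomes.
# Zone names paraphrase this post's section headings.
def zone(structured_data, deterministic_process):
    if structured_data and deterministic_process:
        return "just automate it"          # traditional automation wins
    if structured_data:
        return "AI sweet spot"             # solid data, varying judgement
    if deterministic_process:
        return "interpret, then execute"   # AI classifies, rules act
    return "the frontier"                  # open-ended data and process
```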
&lt;h3 id=&quot;structured-data-deterministic-process-just-automate-it&quot;&gt;Structured data + deterministic process: just automate it&lt;&#x2F;h3&gt;
&lt;p&gt;Data integrations. Asset delivery. Metadata pipelines. Compliance reporting. I&#x27;ve built a lot of these over the years:
structured inputs, defined schemas, the correct answer is the same every time. Most enterprise integration work lives
here. You&#x27;re mapping fields, transforming formats, moving data between systems. Traditional automation handles this
well. AI introduces risk for no gain. You don&#x27;t want a probabilistic answer to a field mapping.
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;zapier.com&#x2F;blog&#x2F;deterministic-ai&#x2F;&quot;&gt;Forrester found&lt;&#x2F;a&gt; gen AI still orchestrates less than 1% of core business
processes. Conventional automation still runs most of this work, and it should.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;structured-data-non-deterministic-process-ai-s-sweet-spot-today&quot;&gt;Structured data + non-deterministic process: AI&#x27;s sweet spot today&lt;&#x2F;h3&gt;
&lt;p&gt;Matching people and experience to proposals. Cleaning and standardising company data. Reporting and analytics. Content
recommendation. Risk scoring. Scheduling. The data is solid but the judgement about what to do with it varies. AI can
find patterns humans miss, and structured data gives it something reliable to work with. MIT found the biggest ROI in
exactly this zone: back-office automation, cutting outsourcing costs, streamlining operations. Vendor-built tools in
this
space &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;fortune.com&#x2F;2025&#x2F;08&#x2F;18&#x2F;mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo&#x2F;&quot;&gt;succeed about 67% of the time&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;unstructured-data-deterministic-process-interpret-then-execute&quot;&gt;Unstructured data + deterministic process: interpret, then execute&lt;&#x2F;h3&gt;
&lt;p&gt;Email triage. Document classification. Compliance screening. Contract review. The input is messy but the downstream
workflow is rule-based. AI handles the interpretation; deterministic logic handles what happens next. This is the hybrid
pattern: AI reads and classifies, then rules enforce the outcome. It works well when you get the boundary right.
Salesforce has been &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.salesforce.com&#x2F;blog&#x2F;deterministic-ai&#x2F;&quot;&gt;shifting toward exactly this architecture&lt;&#x2F;a&gt; in
Agentforce, combining LLM flexibility with rule-based execution.&lt;&#x2F;p&gt;
&lt;p&gt;This is also classic ML territory. Supervised learning was purpose-built for this: take unstructured input, classify it
into a structured category, hand off to a deterministic system. Spam detection, sentiment analysis, fraud scoring, image
recognition. A fine-tuned BERT model will often do this faster, cheaper, and more reliably than a generative model. Not
every AI problem needs an LLM. The hype has pulled organisations toward frontier tools for problems that a classifier
would handle better, and that mismatch accounts for a lot of the failure.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;unstructured-data-non-deterministic-process-the-frontier&quot;&gt;Unstructured data + non-deterministic process: the frontier&lt;&#x2F;h3&gt;
&lt;p&gt;Coding. Strategy. Creative work. Research. Novel problem-solving. Both the input and the process are open-ended. This is
where individual power users report the biggest gains and where enterprise failure rates are highest.&lt;&#x2F;p&gt;
&lt;p&gt;Andrej Karpathy describes the challenge here as the
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;venturebeat.com&#x2F;technology&#x2F;karpathys-march-of-nines-shows-why-90-ai-reliability-isnt-even-close-to&quot;&gt;March of Nines&lt;&#x2F;a&gt;. The maths
is simple but brutal. Imagine a 10-step agentic workflow where each step succeeds 90% of the time. That sounds decent.
But 0.9^10 is roughly 0.35. Your end-to-end success rate is 35%. That&#x27;s your demo. It looks impressive when it works, and it
fails quietly most of the time.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;daz.is&#x2F;blog&#x2F;why-ai-fails-at-scale&#x2F;march-of-nines.png&quot; alt=&quot;march of nines&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Getting from 90% to 99% per step is one order of magnitude of effort. Getting from 99% to 99.9% is another, roughly
equal in difficulty. Each additional nine costs about as much engineering as the last one. At 99% per step, your 10-step
workflow lands at 90% end-to-end. At 99.9%, you&#x27;re at 99%. Production-grade reliability means marching through those
nines, and each one takes real engineering to reach.&lt;&#x2F;p&gt;
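&lt;p&gt;The arithmetic is easy to check for yourself. Assuming independent steps, end-to-end reliability is just the per-step rate raised to the number of steps:&lt;&#x2F;p&gt;

```python
# End-to-end success of a sequential workflow, assuming each step
# succeeds independently with the same probability.
def end_to_end(per_step, steps=10):
    return per_step ** steps

# Each extra nine per step buys roughly one extra nine end to end:
# 0.90 per step over 10 steps gives about 0.35
# 0.99 per step gives about 0.90
# 0.999 per step gives about 0.99
for p in (0.90, 0.99, 0.999):
    print(p, round(end_to_end(p), 2))
```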
&lt;p&gt;Prompting and agent skills get you to 90%. They&#x27;re necessary but not sufficient. The remaining nines come from &lt;strong&gt;harness
engineering&lt;&#x2F;strong&gt;: putting AI systems on deterministic rails. Validation at each step. State management so you can resume or
retry. Programmatic control over what the model can and can&#x27;t do. Structured outputs. Assertions on intermediate
results. The kind of engineering that isn&#x27;t exciting but makes the difference between a demo and a system you can trust.&lt;&#x2F;p&gt;
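&lt;p&gt;In miniature, and purely as a hypothetical sketch (none of these names come from a real framework), harness engineering means every probabilistic step runs inside a deterministic wrapper that validates the result, retries on failure, and records completed steps so a workflow can resume rather than rerun:&lt;&#x2F;p&gt;

```python
# Hypothetical harness sketch: deterministic rails around a
# non-deterministic step. Not a real framework's API.
def run_step(name, produce, validate, state, retries=3):
    """Run one workflow step with validation, retry, and resumable state."""
    if name in state:              # completed in a previous run: resume, don't redo
        return state[name]
    for attempt in range(retries):
        result = produce()         # the probabilistic part, e.g. a model call
        if validate(result):       # the deterministic check on its output
            state[name] = result   # persist so a rerun skips this step
            return result
    raise RuntimeError(f"step {name!r} failed validation after {retries} attempts")
```

&lt;p&gt;Each validate call is one of the assertions on intermediate results; the state dict is the minimum viable version of resumable state management.&lt;&#x2F;p&gt;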
&lt;p&gt;This is why the frontier quadrant has the highest failure rate. The compounding error problem is intrinsic to multi-step
AI workflows, and most organisations stop at the demo. They see 90% per step and call it good enough. They don&#x27;t invest
in the harness engineering that turns a promising prototype into something reliable. And when it fails in production,
they blame the model.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-frontier-is-moving&quot;&gt;The frontier is moving&lt;&#x2F;h3&gt;
&lt;p&gt;The difference between success and failure in this quadrant isn&#x27;t the model, it&#x27;s the engineering around it. And I
believe this quadrant is going to keep expanding: any work that can be codified the way software can is vulnerable to
the same kind of disruption. But only if people learn to work with the AI effectively, which most haven&#x27;t yet.&lt;&#x2F;p&gt;
&lt;p&gt;This is the frontier, and it&#x27;s moving quickly: better models, better tooling around them, and a growing understanding of
how to actually work with them.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-outer-loop-getting-better-at-getting-better&quot;&gt;The outer loop: getting better at getting better&lt;&#x2F;h2&gt;
&lt;p&gt;Most talk about AI in practice focuses on the &lt;strong&gt;inner loop&lt;&#x2F;strong&gt;: the agent loop. The model receives a prompt, reasons,
takes actions, gets feedback, iterates. This is the cycle inside the tool. It&#x27;s what the AI does.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;strong&gt;outer loop&lt;&#x2F;strong&gt; is what &lt;em&gt;you&lt;&#x2F;em&gt; do. It&#x27;s the feedback practice where you evaluate whether your way of working with AI
is actually producing good outcomes, and then adapt your process based on what you learn.&lt;&#x2F;p&gt;
&lt;aside class=&quot;aside-callout&quot;&gt;
  &lt;span class=&quot;aside-callout__label&quot;&gt;Aside&lt;&#x2F;span&gt;
  &lt;p&gt;A note on the term: inner&#x2F;outer loop shows up in several places and means different things depending on
who&#x27;s talking. In DevOps it&#x27;s the local dev cycle vs. CI&#x2F;CD. Jeff Huber uses it for
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;jeffhuber.substack.com&#x2F;p&#x2F;the-rise-of-context-engineering&quot;&gt;context engineering&lt;&#x2F;a&gt;: the inner loop assembles
context for this generation step, the outer loop improves your context pipeline over time. Gene Kim and Steve Yegge
propose a
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;itrevolution.com&#x2F;articles&#x2F;the-three-developer-loops-a-new-framework-for-ai-assisted-coding&#x2F;&quot;&gt;three-loop model&lt;&#x2F;a&gt;
in their &lt;em&gt;Vibe Coding&lt;&#x2F;em&gt; book, where the outer loop is strategic architecture. Kief Morris at Thoughtworks writes about
humans being &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;martinfowler.com&#x2F;articles&#x2F;exploring-gen-ai&#x2F;humans-and-agents.html&quot;&gt;&quot;on the loop&quot;&lt;&#x2F;a&gt;, maintaining the
harness rather than supervising every output. I&#x27;m using the term at a different level from all of these. Not the system
getting better at context, or the organisation managing architecture, but the &lt;em&gt;practitioner&lt;&#x2F;em&gt; getting better at working
with AI through deliberate reflection on outcomes.&lt;&#x2F;p&gt;

&lt;&#x2F;aside&gt;
&lt;p&gt;I use AI, and sometimes it fails. I examine why, and I adjust my process: how I structure the task, what I verify, where I
intervene, what I delegate. Over time, this compounds. I&#x27;m not just using a tool; I&#x27;m developing a practice.&lt;&#x2F;p&gt;
&lt;p&gt;The model and the harness improve together. The way I work with AI adapts as the model changes. Some of my process will
probably be redundant when models improve further. Maybe. I&#x27;m not sure which parts yet, and that uncertainty is exactly
why the outer loop matters. Without it, you&#x27;re either clinging to a process that has become overhead, or you&#x27;re
abandoning discipline still earning its keep. You can&#x27;t tell which is which unless you&#x27;re paying attention to outcomes.&lt;&#x2F;p&gt;
&lt;p&gt;As
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.theregister.com&#x2F;2026&#x2F;03&#x2F;17&#x2F;ai_businesses_faking_it_reckoning_coming_codestrap&#x2F;&quot;&gt;the Codestrap founders put it&lt;&#x2F;a&gt;,
companies are measuring lines of code and pull requests (activity metrics) instead of deployment frequency, change
failure rate, and mean time to restore (outcome metrics). Without measuring outcomes, there&#x27;s no signal to feed back
into process improvement. The inner loop runs, produces outputs, and nobody asks whether the whole system is actually
working.&lt;&#x2F;p&gt;
&lt;p&gt;Rich Sutton&#x27;s
&lt;a rel=&quot;external&quot; href=&quot;http:&#x2F;&#x2F;www.incompleteideas.net&#x2F;IncIdeas&#x2F;BitterLesson.html&quot;&gt;&lt;em&gt;The Bitter Lesson&lt;&#x2F;em&gt;&lt;&#x2F;a&gt;
is relevant here: over 70 years, general methods that leverage computation have consistently beaten
elaborate human-designed scaffolding. As models improve, some of the scaffolding you built becomes overhead. But it&#x27;s
not a simple story. Prompt injection isn&#x27;t solved. Hallucination isn&#x27;t solved. Deterministic verification matters
where you can get it. Scaffolding that constrains the problem space or validates outputs isn&#x27;t fighting the model,
it&#x27;s good engineering. The outer loop is the practice of continuously reassessing what&#x27;s earning its keep. Scaffolding
isn&#x27;t categorically good or bad.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;putting-it-together&quot;&gt;Putting it together&lt;&#x2F;h2&gt;
&lt;p&gt;Before deploying AI, understand the nature of the problem. Is the data structured or messy? Is the process rule-based or
judgement-based?&lt;&#x2F;p&gt;
&lt;p&gt;The research points to several reasons AI initiatives fail: poor data quality, lack of clear success metrics, losing
executive sponsorship, treating AI as an IT project rather than a business transformation. These all matter. But one
pattern that the quadrant helps explain is tool-problem mismatch:
deploying frontier AI into problems that need traditional automation, or throwing chatbots at work that demands
carefully engineered hybrid systems. It&#x27;s not the only cause of failure, but it&#x27;s one that clearer thinking upfront can
prevent.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Match the harness to the zone.&lt;&#x2F;strong&gt; Each quadrant needs a different approach. Deterministic problems need deterministic
tools. Hybrid problems need a clear boundary between what the AI interprets and what rules enforce. Frontier problems
need harness engineering that earns each nine of reliability, with human judgement actually in the loop.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Run the outer loop.&lt;&#x2F;strong&gt; Whatever quadrant you&#x27;re in, build a feedback mechanism that evaluates outcomes, not activity.
Are you actually shipping better code, making better decisions, producing better analysis? If you&#x27;re not measuring this,
you don&#x27;t know whether AI is helping or generating expensive noise.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Hold the tensions.&lt;&#x2F;strong&gt; The Bitter Lesson says don&#x27;t over-engineer scaffolding the model will outgrow. But deterministic
verification genuinely matters where you can get it. Prompt injection and hallucination are unsolved. Models are getting
better fast, but &quot;better&quot; doesn&#x27;t mean &quot;trustworthy in all contexts.&quot; The outer loop is how you navigate these tensions,
by continuously reassessing what&#x27;s working rather than picking a side and staying there.&lt;&#x2F;p&gt;
&lt;p&gt;The 95% failure rate isn&#x27;t a verdict on AI. It&#x27;s a verdict on how organisations are thinking about it, or not thinking
about it clearly enough. The people succeeding aren&#x27;t using better models. They&#x27;re thinking more clearly about where AI
fits, and they&#x27;re learning and iterating toward getting the best outcomes.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;further-reading&quot;&gt;Further reading&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;fortune.com&#x2F;2025&#x2F;08&#x2F;18&#x2F;mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo&#x2F;&quot;&gt;MIT NANDA, &lt;em&gt;GenAI
Divide&lt;&#x2F;em&gt;&lt;&#x2F;a&gt; (2025)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.pertamapartners.com&#x2F;insights&#x2F;ai-project-failure-statistics-2026&quot;&gt;Pertama Partners, AI project failure statistics&lt;&#x2F;a&gt; (2026)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.theregister.com&#x2F;2026&#x2F;03&#x2F;17&#x2F;ai_businesses_faking_it_reckoning_coming_codestrap&#x2F;&quot;&gt;The Register &#x2F; Codestrap on enterprise AI failures&lt;&#x2F;a&gt; (2026)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;http:&#x2F;&#x2F;www.incompleteideas.net&#x2F;IncIdeas&#x2F;BitterLesson.html&quot;&gt;Rich Sutton, &lt;em&gt;The Bitter Lesson&lt;&#x2F;em&gt;&lt;&#x2F;a&gt; (2019)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;venturebeat.com&#x2F;technology&#x2F;karpathys-march-of-nines-shows-why-90-ai-reliability-isnt-even-close-to&quot;&gt;Andrej Karpathy, The March of Nines&lt;&#x2F;a&gt; (2025)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.oneusefulthing.org&#x2F;p&#x2F;the-bitter-lesson-versus-the-garbage&quot;&gt;Ethan Mollick, &lt;em&gt;The Bitter Lesson vs. the
Garbage&lt;&#x2F;em&gt;&lt;&#x2F;a&gt; (2025)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;servicepath.co&#x2F;2025&#x2F;09&#x2F;ai-integration-crisis-enterprise-hybrid-ai&#x2F;&quot;&gt;ServicePath, deterministic guardrails for AI&lt;&#x2F;a&gt; (2025)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;zapier.com&#x2F;blog&#x2F;deterministic-ai&#x2F;&quot;&gt;Zapier, hybrid deterministic + non-deterministic architectures&lt;&#x2F;a&gt; (2026)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.salesforce.com&#x2F;blog&#x2F;deterministic-ai&#x2F;&quot;&gt;Salesforce, rule-based execution in Agentforce&lt;&#x2F;a&gt; (2025)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.cio.com&#x2F;article&#x2F;4114010&#x2F;2026-the-year-ai-roi-gets-real.html&quot;&gt;CIO.com, enterprise AI ROI pressure&lt;&#x2F;a&gt; (2026)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Five Rewrites in a Week</title>
        <published>2026-03-18T00:00:00+00:00</published>
        <updated>2026-03-18T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/five-rewrites-in-a-week/"/>
        <id>https://daz.is/blog/five-rewrites-in-a-week/</id>
        
        <content type="html" xml:base="https://daz.is/blog/five-rewrites-in-a-week/">&lt;p&gt;I recently built an internal data tool my team needed. Six months ago it wouldn&#x27;t have been viable. It would have just
been manual work. Nobody would have dedicated engineering time to automate this task before. But with Claude Code and
Opus 4.6, I built a production-ready tool in days that replaced all of that manual work.&lt;&#x2F;p&gt;
&lt;p&gt;AI changed the economics enough to make it worth doing. Is it worth the engineering investment for a tool that won&#x27;t be
needed long term? Before AI, the answer was no. The tool would not have got built. The work would have stayed manual
with some SQL and Excel.&lt;&#x2F;p&gt;
&lt;p&gt;Anish Acharya at a16z coined the term &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;a16z.com&#x2F;disposable-software&#x2F;&quot;&gt;&quot;disposable software&quot;&lt;&#x2F;a&gt; to describe how
software creation used to be constrained by ROI, but is now constrained by imagination. His examples are mostly consumer
and personal. The enterprise version of this argument is more consequential: internal tools that cross the ROI threshold
because the cost of development is now so much lower.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-first-prototype&quot;&gt;The first prototype&lt;&#x2F;h2&gt;
&lt;p&gt;It started as a simple Python script. Load data into memory, process, and output the results. This is where the tool
would have stopped in the old world. A script on someone&#x27;s laptop. Maybe a Notion page explaining how to run it. Good
enough. Move on.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;fast-iteration&quot;&gt;Fast iteration&lt;&#x2F;h2&gt;
&lt;p&gt;Each step below is a decision gate where someone would traditionally ask &quot;is this worth the effort?&quot; None of this would
have been worth the effort without AI, but with AI it meant I could quickly iterate through multiple prototypes and not
be scared about throwing away code.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;might-as-well-put-a-ui-on-this&quot;&gt;&quot;Might as well put a UI on this&quot;&lt;&#x2F;h3&gt;
&lt;p&gt;I moved to Rust. I&#x27;m an experienced Rust developer, so it was a natural move, and Claude Code made quick work of
translating the code from Python. Rust is the right tool for a memory-sensitive data processing task, and the core logic
was already well understood from the Python prototype, so rewriting it was cheap. The app now serves up a simple HTML
form.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;this-needs-to-scale&quot;&gt;&quot;This needs to scale&quot;&lt;&#x2F;h3&gt;
&lt;p&gt;The first version loaded everything into memory. That won&#x27;t work with real data volumes and the memory constraints of a
containerised environment. I needed a streaming diff algorithm. I had a vague idea of how it should work, but I didn&#x27;t
have to spend long working out the details: as I started explaining it to Claude Code, it could fill in
the gaps. I&#x27;m directing, but not &lt;a href=&quot;&#x2F;blog&#x2F;ai-engineer&#x2F;&quot;&gt;vibe coding&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;we-need-to-bulk-run-this&quot;&gt;&quot;We need to bulk run this&quot;&lt;&#x2F;h3&gt;
&lt;p&gt;Bolt on a CLI interface for batch operations. Straightforward addition but another thing that wouldn&#x27;t have been worth
the effort if I was implementing it manually. This will potentially save a lot more manual work. Who knows? It might not
get used, but it was a simple addition to the spec.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;this-needs-to-run-in-production&quot;&gt;&quot;This needs to run in production&quot;&lt;&#x2F;h3&gt;
&lt;p&gt;There&#x27;s a lot that needs adding to turn a simple tool into a production-ready tool. Container config, observability,
workers, cloud infrastructure, IAM roles, secrets management. The boilerplate of getting something actually running.&lt;&#x2F;p&gt;
&lt;p&gt;This is where I&#x27;ve used another AI coding pattern called &lt;strong&gt;&quot;style transfer&quot;&lt;&#x2F;strong&gt;. Point Claude Code at an existing
production service and say &quot;make it like that.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;Infrastructure config encodes institutional knowledge. How your organisation organises the configurations needed, what
your deployment conventions are, how you handle secrets. AI pattern-matches against existing services without you having
to write a docs page or copy-paste configs manually. You get something that follows your org&#x27;s conventions because it
learned them from a working example.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-the-research-is-showing&quot;&gt;What the research is showing&lt;&#x2F;h2&gt;
&lt;p&gt;Anthropic&#x27;s &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.anthropic.com&#x2F;research&#x2F;agentic-coding-trends&quot;&gt;2026 Agentic Coding report&lt;&#x2F;a&gt; describes tasks that
required weeks of cross-team coordination becoming focused working sessions. MIT Technology
Review &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.technologyreview.com&#x2F;2025&#x2F;12&#x2F;16&#x2F;1108441&#x2F;ai-coding-is-now-everywhere-which-means-you-need-to-know-what-it-can-and-cant-do&#x2F;&quot;&gt;reported&lt;&#x2F;a&gt;
on developers surrendering control over individual lines and focusing on overall architecture.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;rust-as-an-ai-augmented-development-language&quot;&gt;Rust as an AI-augmented development language&lt;&#x2F;h2&gt;
&lt;p&gt;The Rust compiler is essentially a second reviewer for AI-generated code. When Claude Code writes Rust, it gets
immediate, precise, actionable feedback from &lt;code&gt;cargo check&lt;&#x2F;code&gt;. Memory safety, lifetime issues, ownership violations, all
caught at compile time, not in production.&lt;&#x2F;p&gt;
&lt;p&gt;The tight feedback loop matters enormously for AI agents. The compiler doesn&#x27;t just say &quot;error.&quot; It says what&#x27;s wrong,
where, and often how to fix it. That&#x27;s ideal for an agentic coding tool iterating in a loop.&lt;&#x2F;p&gt;
&lt;p&gt;AI plus Rust&#x27;s compiler creates a verification pipeline that lets you trust AI-generated code faster. I wrote about the
importance of &lt;a href=&quot;&#x2F;blog&#x2F;how-i-work-with-ai-coding-agents&#x2F;&quot;&gt;verification in my process&lt;&#x2F;a&gt;
previously. The compiler is an automated verification step that runs on every iteration. And the compiler error messages
with Rust are brilliant and help AI coding agents track down and fix issues much faster.&lt;&#x2F;p&gt;
&lt;p&gt;Language choice for AI-augmented development should optimise for the strength of the automated verification feedback
loop.&lt;&#x2F;p&gt;
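&lt;p&gt;The loop itself is simple enough to sketch. This is a hypothetical harness outline, not my actual setup: &lt;code&gt;cargo_check&lt;&#x2F;code&gt; and &lt;code&gt;ask_agent_to_fix&lt;&#x2F;code&gt; are stand-ins for whatever compiler invocation and LLM call your harness makes.&lt;&#x2F;p&gt;

```python
# Hypothetical sketch of a compiler-in-the-loop harness. The names
# cargo_check and ask_agent_to_fix are illustrative stand-ins.
import subprocess

def cargo_check(project_dir):
    """Run `cargo check` and return (exit_code, diagnostics)."""
    result = subprocess.run(
        ["cargo", "check", "--message-format=short"],
        cwd=project_dir, capture_output=True, text=True,
    )
    return result.returncode, result.stderr

def check_until_clean(run_check, ask_agent_to_fix, max_rounds=5):
    for _ in range(max_rounds):
        code, diagnostics = run_check()
        if code == 0:
            return True  # it compiles: a whole class of bugs is ruled out
        # Rust diagnostics name the file, the line, and often the fix,
        # which is exactly the feedback an agent needs to iterate on.
        ask_agent_to_fix(diagnostics)
    return False
```

&lt;p&gt;Every iteration gets a free, precise verification pass before a human ever looks at anything.&lt;&#x2F;p&gt;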
&lt;p&gt;Others are arriving at the same conclusion independently. Adam Benenson argues
in &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.adambenenson.com&#x2F;blog&#x2F;the-compiler-is-the-harness&quot;&gt;&quot;The Compiler Is the Harness&quot;&lt;&#x2F;a&gt; that Rust&#x27;s strictness
is what makes it &lt;em&gt;easy&lt;&#x2F;em&gt; for AI agents. Agentic coding lives or dies on feedback loops. If code compiles, it has already
satisfied a whole class of nontrivial constraints. Mykhailo Chalyi makes a complementary point
in &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;misha-chalyi.github.io&#x2F;posts&#x2F;rust-winning-ai-code-gen&#x2F;&quot;&gt;&quot;Rust Is Winning the AI Code Generation Race&quot;&lt;&#x2F;a&gt;: the
writability problem of Rust disappears with AI agents, while the readability, type safety, and performance advantages
remain.&lt;&#x2F;p&gt;
&lt;p&gt;Anthropic&#x27;s own C compiler project is telling here too. Sixteen parallel Claude agents produced 100,000 lines of
Rust. The choice of Rust was deliberate. The type system and ownership model serve as natural guardrails, and
test-driven development with tight feedback loops was the critical enabler.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-shifting-economics&quot;&gt;The shifting economics&lt;&#x2F;h2&gt;
&lt;p&gt;The tool I made is temporary and won&#x27;t be needed for long. In the pre-AI world, it would never have been built. But
it&#x27;s saved a lot of manual, error-prone work that would otherwise have been needed.&lt;&#x2F;p&gt;
&lt;p&gt;Existing software becoming cheaper to build is true but not the interesting part. The interesting part is that new
categories of short-lived, purpose-built internal tools become viable. Things your team needs but nobody would dedicate
engineering time to. Data migration utilities, reconciliation jobs, debugging aids, one-off reporting tools.&lt;&#x2F;p&gt;
&lt;p&gt;I couldn&#x27;t have justified building the final product upfront. I had to discover the requirements iteratively. AI made
each iteration cheap enough to keep going.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;observations&quot;&gt;Observations&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;strong&gt;The compiler is your safety net.&lt;&#x2F;strong&gt; When choosing a stack for AI-augmented work, optimise for the quality of the
automated verification loop. Rust&#x27;s borrow checker and type system aren&#x27;t friction. They&#x27;re a trust accelerator for
AI-generated code.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Style transfer for infrastructure.&lt;&#x2F;strong&gt; You already have production services that encode your org&#x27;s patterns. Use them as
templates. Point the AI at a working example and say &quot;match that.&quot; This is where AI saves the most tedious, error-prone
time.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Iterative discovery over upfront planning.&lt;&#x2F;strong&gt; AI makes it cheap to explore the design space. Start with a script. See
if it&#x27;s useful. Add a UI. Hit scaling limits. Redesign. Each pivot is cheap. You learn what you actually need by
building.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;The ROI threshold has moved.&lt;&#x2F;strong&gt; Recalibrate what&#x27;s &quot;worth building.&quot; Short-lived internal tools, data migration
utilities, debugging aids. Things your team needs but nobody would dedicate engineering time to. These are now viable.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-s-worth-building-now&quot;&gt;What&#x27;s worth building now?&lt;&#x2F;h2&gt;
&lt;p&gt;The interesting question isn&#x27;t &quot;how fast can AI write code.&quot; It&#x27;s &quot;what becomes worth building when the cost drops this
much?&quot;&lt;&#x2F;p&gt;
&lt;p&gt;I think we&#x27;re still in the early days of answering that. The threshold has moved more than most people realise.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Advanced Tool Calling Patterns for AI Agents</title>
        <published>2026-03-13T00:00:00+00:00</published>
        <updated>2026-03-13T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/advanced-tool-calling-patterns/"/>
        <id>https://daz.is/blog/advanced-tool-calling-patterns/</id>
        
        <content type="html" xml:base="https://daz.is/blog/advanced-tool-calling-patterns/">&lt;p&gt;I&#x27;ve already &lt;a href=&quot;&#x2F;blog&#x2F;context-engineering-is-the-job&#x2F;&quot;&gt;written about context engineering&lt;&#x2F;a&gt; as the core discipline of
building AI systems. I&#x27;ve been experimenting with my own AI tools for coding, research, and automation, and I&#x27;ve
noticed that tool calling consumes more and more context, so we need strategies for scaling it.&lt;&#x2F;p&gt;
&lt;p&gt;My stack is Rust-based, using Rig for LLM abstraction, Restate for durable execution, Postgres, and a hypermedia
architecture with Maud and HTMX. It works well. But as I&#x27;ve added more tools and connected more MCP servers, context
usage is creeping up.&lt;&#x2F;p&gt;
&lt;p&gt;Every tool definition (name, description, JSON schema) eats tokens before the conversation even starts. A modest setup
with a few MCP servers can consume 50,000+ tokens just on tool schemas.&lt;&#x2F;p&gt;
&lt;p&gt;Also, each tool call is a full inference round-trip. The model calls a tool, waits for the result, processes it, calls
the next one. A workflow that touches five tools means five round-trips, plus all the intermediate reasoning. It&#x27;s slow
and eats up tokens.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;tool-search-load-what-you-need-when-you-need-it&quot;&gt;Tool search: load what you need, when you need it&lt;&#x2F;h2&gt;
&lt;p&gt;Instead of stuffing every tool definition into the context upfront, you load only a small set of frequently used tools
plus a special tool search tool. Everything else is deferred. When the agent needs a capability it doesn&#x27;t have, it
searches for it, gets back lightweight summaries, and then the full schema of the selected tool gets loaded for the rest
of the conversation.&lt;&#x2F;p&gt;
&lt;p&gt;Anthropic&#x27;s research shows an 85% reduction in context usage, and accuracy on tool selection improved from 49% to 74% on
Opus 4. On Opus 4.5 it went from 79.5% to 88.1%.&lt;&#x2F;p&gt;
&lt;p&gt;Anthropic offer a server-side implementation where you mark tools with &lt;code&gt;defer_loading: true&lt;&#x2F;code&gt; in the API request and they
handle the search internally. But the more interesting version, for my purposes, is client-side. You build a tool
registry that indexes tool names and descriptions, expose a &lt;code&gt;tool_search&lt;&#x2F;code&gt; tool that returns lightweight summaries, and
on selection inject the full schema into context. This is model-agnostic. It&#x27;s just a tool that returns tool
definitions.&lt;&#x2F;p&gt;
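&lt;p&gt;The client-side version fits in a few dozen lines. A minimal sketch, with made-up tools and naive keyword matching standing in for real semantic retrieval:&lt;&#x2F;p&gt;

```python
# Minimal sketch of a client-side tool registry. The tools are invented,
# and keyword overlap stands in for embedding-based retrieval.
REGISTRY = {
    "get_weather": {
        "summary": "Look up current weather for a city",
        "schema": {"type": "object",
                   "properties": {"city": {"type": "string"}}},
    },
    "send_invoice": {
        "summary": "Email an invoice to a customer",
        "schema": {"type": "object",
                   "properties": {"customer_id": {"type": "string"}}},
    },
}

def tool_search(query):
    """The only tool loaded upfront: returns lightweight summaries."""
    words = set(query.lower().split())
    return [
        {"name": name, "summary": entry["summary"]}
        for name, entry in REGISTRY.items()
        if words.intersection(entry["summary"].lower().split())
    ]

def load_tool(name):
    """On selection, inject the full schema for the rest of the session."""
    entry = REGISTRY[name]
    return {"name": name, "input_schema": entry["schema"]}
```

&lt;p&gt;Only &lt;code&gt;tool_search&lt;&#x2F;code&gt; costs context upfront; every other schema is paid for on demand.&lt;&#x2F;p&gt;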
&lt;p&gt;It turns out &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.rig.rs&#x2F;&quot;&gt;Rig&lt;&#x2F;a&gt;, the Rust LLM framework I&#x27;m already using, has a version of this built in.
Rig&#x27;s &quot;RAG-enabled tools&quot; let you implement a &lt;code&gt;ToolEmbedding&lt;&#x2F;code&gt; trait on your tools, store them in a vector store, and
retrieve the most relevant ones at query time using &lt;code&gt;.dynamic_tools(n, vector_store_index, toolset)&lt;&#x2F;code&gt;. It&#x27;s the
client-side tool search pattern, using embedding-based semantic retrieval rather than keyword matching. The mechanism is
the same as document RAG, applied to tool definitions instead of documents. I hadn&#x27;t realised the utility of this
before, but the infrastructure for tool search is already in my stack.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;ll probably take a hybrid approach by keeping a few core tools always loaded and deferring everything else.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;programmatic-tool-calling-let-the-llm-write-code&quot;&gt;Programmatic tool calling: let the LLM write code&lt;&#x2F;h2&gt;
&lt;p&gt;Instead of calling tools one at a time through the standard tool-calling protocol, the LLM writes code that orchestrates
multiple tool calls, processes results with proper programming constructs (loops, conditionals, aggregation), and
returns only the final output. The code runs in a sandbox with no direct network access. Tool calls inside the generated
code go through a bridge back to the host application, which handles authentication and routing.&lt;&#x2F;p&gt;
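&lt;p&gt;Stripped to its essentials, the bridge is just a function handed into the sandbox. A toy sketch, where Python&#x27;s &lt;code&gt;exec&lt;&#x2F;code&gt; with empty builtins stands in for a real isolate and is emphatically not a secure sandbox:&lt;&#x2F;p&gt;

```python
# Toy sketch of the host-side bridge for programmatic tool calling.
# exec() with empty builtins is NOT a real sandbox; it only illustrates
# the shape: generated code sees call_tool and nothing else.
def make_bridge(handlers):
    def call_tool(name, **kwargs):
        # The host owns authentication, routing, and rate limiting here.
        return handlers[name](**kwargs)
    return call_tool

def run_generated(code, call_tool):
    scope = {"call_tool": call_tool, "result": None}
    exec(code, {"__builtins__": {}}, scope)  # no imports, no network
    return scope["result"]
```

&lt;p&gt;The model writes ordinary code with loops and conditionals, and only the final &lt;code&gt;result&lt;&#x2F;code&gt; comes back into context.&lt;&#x2F;p&gt;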
&lt;p&gt;This approach can achieve higher accuracy with much lower token usage. Anthropic reports average token usage dropping
from 43,588 to 27,297 (a 37% reduction) on complex research tasks, and accuracy improvements on the GAIA benchmark from
46.5% to 51.2%. A third-party test by The AI Automators backed this up: a budget compliance check across 20 team members
took 56 tool calls and 76,000 tokens with traditional calling and still missed a result. The same task with programmatic
calling took 4 to 12 tool calls, used fewer tokens, and got all results correct.&lt;&#x2F;p&gt;
&lt;p&gt;Cloudflare has two takes on this. Their original Code Mode converts MCP tool schemas into TypeScript type definitions and
runs generated code in V8 isolates. Their newer Code Mode MCP server takes it further, working against Cloudflare&#x27;s
OpenAPI spec rather than MCP schemas. The model writes JavaScript to call &lt;code&gt;search()&lt;&#x2F;code&gt; and &lt;code&gt;execute()&lt;&#x2F;code&gt;, exposing the
entire Cloudflare API through just two tools and consuming around 1,000 tokens regardless of how many API endpoints sit
behind it. When I first saw this approach, I joked it was RCE-as-a-Service, but it actually looks
quite cool if you can get the sandboxing and permissions worked out.&lt;&#x2F;p&gt;
&lt;p&gt;For my Rust stack, the sandbox question is still open. Pydantic&#x27;s Monty is appealing because it&#x27;s a Rust-based Python
interpreter that boots in single-digit microseconds. But it only supports a subset of Python. I&#x27;m also curious about
what could be achieved with something like Rhai, a pure Rust embeddable scripting language. There&#x27;s a lot to think
about and get right here, including sandboxing, expressiveness, how well LLMs can actually generate code for the target
language, security, and performance.&lt;&#x2F;p&gt;
&lt;p&gt;I still think for recurring, well-defined tasks, it&#x27;s better to use pre-written scripts (a &quot;skills&quot; system) rather than
having the LLM generate code every time. Programmatic tool calling is most valuable for novel, ad-hoc queries where the
specific combination of tools and logic can&#x27;t be predicted in advance. I want to experiment with this, but I don&#x27;t
have a specific use case right now.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;tool-use-examples-few-shot-prompting-for-tools&quot;&gt;Tool use examples: few-shot prompting for tools&lt;&#x2F;h2&gt;
&lt;p&gt;The third pattern is simpler. JSON schemas define structure but can&#x27;t express usage patterns. Tool use examples provide
concrete input&#x2F;output demonstrations that show the LLM exactly how to call a tool correctly.&lt;&#x2F;p&gt;
&lt;p&gt;Anthropic&#x27;s testing showed parameter accuracy improved from 72% to 90% with examples. The best practices are to add one
to five examples per tool, use realistic data, show variety in how the tool can be called, and focus on cases where
correct usage isn&#x27;t obvious from the schema alone.&lt;&#x2F;p&gt;
&lt;p&gt;Tool search and tool use examples aren&#x27;t compatible in Anthropic&#x27;s current API. If you need examples for a specific
tool, that tool needs to stay in standard (non-deferred) mode. A skills-based approach can serve a similar purpose,
though. When the agent loads a skill file, it gets instructions and example invocations as part of the context,
achieving the same effect through context engineering rather than a separate API feature.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-i-m-building-next&quot;&gt;What I&#x27;m building next&lt;&#x2F;h2&gt;
&lt;p&gt;I&#x27;m going to try the client-side tool registry with search. This is low-effort, high-impact, and it works with any
model. Second, I want to try adding sandboxed code execution once I&#x27;ve figured out the right sandbox approach for a Rust
host.&lt;&#x2F;p&gt;
&lt;p&gt;I also still think the skills-based approach offers the best value. This means using skill descriptions and providing a
CLI or scripts to access additional capabilities. The Skill + CLI combination is hard to beat because it&#x27;s powerful and
understandable.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;ll write more as I build this out. If you&#x27;re working on similar problems, or if you&#x27;ve already implemented any of
these patterns, I&#x27;d love to hear what you&#x27;ve found. &lt;a href=&quot;&#x2F;contact&quot;&gt;Drop me a line&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;&lt;strong&gt;Sources&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;anthropic.com&#x2F;engineering&#x2F;advanced-tool-use&quot;&gt;Advanced Tool Use&lt;&#x2F;a&gt; (Anthropic)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;anthropic.com&#x2F;engineering&#x2F;code-execution-with-mcp&quot;&gt;Code Execution with MCP&lt;&#x2F;a&gt; (Anthropic)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;anthropic.com&#x2F;engineering&#x2F;effective-context-engineering-for-ai-agents&quot;&gt;Effective Context Engineering for AI Agents&lt;&#x2F;a&gt;
(Anthropic)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;platform.claude.com&#x2F;docs&#x2F;en&#x2F;agents-and-tools&#x2F;tool-use&#x2F;tool-search-tool&quot;&gt;Tool Search Docs&lt;&#x2F;a&gt; (Anthropic)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;blog.cloudflare.com&#x2F;code-mode&quot;&gt;Code Mode&lt;&#x2F;a&gt; (Cloudflare)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;blog.cloudflare.com&#x2F;code-mode-mcp&quot;&gt;Code Mode MCP&lt;&#x2F;a&gt; (Cloudflare)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pydantic&#x2F;monty&quot;&gt;Pydantic Monty&lt;&#x2F;a&gt; (Pydantic)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;manus.im&#x2F;blog&#x2F;Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus&quot;&gt;Context Engineering for AI Agents&lt;&#x2F;a&gt;
(Manus)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>What Happens When You Stop Reading the Code?</title>
        <published>2026-03-11T00:00:00+00:00</published>
        <updated>2026-03-11T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/stop-reading-the-code/"/>
        <id>https://daz.is/blog/stop-reading-the-code/</id>
        
        <content type="html" xml:base="https://daz.is/blog/stop-reading-the-code/">&lt;p&gt;I recently wrote about &lt;a href=&quot;&#x2F;blog&#x2F;how-i-work-with-ai-coding-agents&#x2F;&quot;&gt;how I work with AI coding agents&lt;&#x2F;a&gt; and
about &lt;a href=&quot;&#x2F;blog&#x2F;code-review-ai-augmented-development&#x2F;&quot;&gt;code review in AI-augmented development&lt;&#x2F;a&gt;. I meant every word of
both. But parts of them are already not quite where my thinking is now.&lt;&#x2F;p&gt;
&lt;p&gt;This is not a retraction. The ground keeps moving under our feet. The only irresponsible position right now is
certainty. We have to be open to changing our minds as the AI models and harnesses improve, and as we discover how best
to work with this technology.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-four-steps&quot;&gt;The four steps&lt;&#x2F;h2&gt;
&lt;p&gt;Dan Shapiro recently &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.danshapiro.com&#x2F;blog&#x2F;2026&#x2F;02&#x2F;you-dont-write-the-code&#x2F;&quot;&gt;wrote about&lt;&#x2F;a&gt; what StrongDM&#x27;s CTO
Justin McCarthy learned building a software factory. The progression is simple:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Recognise you&#x27;re not the best person to write the code any more. The AI writes the code.&lt;&#x2F;li&gt;
&lt;li&gt;Accept that if you&#x27;re not writing the code, but you&#x27;re still reading every line, you are the bottleneck. Stop reading
the code too.&lt;&#x2F;li&gt;
&lt;li&gt;Recognise that this creates an enormous pile of terrifying problems.&lt;&#x2F;li&gt;
&lt;li&gt;Realise that solving those problems is now your actual job.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;I think this describes the trajectory we&#x27;re on. Shapiro describes a destination. What I&#x27;m trying to describe is
being mid-journey, somewhere on this path. But exactly where I am depends entirely on context.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;where-i-actually-am&quot;&gt;Where I actually am&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;strong&gt;Side projects:&lt;&#x2F;strong&gt; I&#x27;m experimenting freely. Steps 1 and 2 feel natural. I let the AI generate, I don&#x27;t read every line,
and I&#x27;m building verification instead. I&#x27;m focusing on carefully reviewing the plans, and developing AI assisted code
review. The cost of failure is low. The learning is high.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;At work:&lt;&#x2F;strong&gt; I&#x27;m closer to traditional review. SOC 2, ISO 27001, compliance requirements mean I need evidence that a
human understood what shipped. &quot;An AI agent healed it&quot; is not an answer our compliance team can work with yet. Nor
should it be. I&#x27;m thinking about how AI can help scale this, but I&#x27;m working in a team, and so other factors need to be
taken into account.&lt;&#x2F;p&gt;
&lt;p&gt;I can see the destination Shapiro describes. I&#x27;m not fully there yet. And that&#x27;s fine. The interesting question isn&#x27;t
&quot;have you arrived?&quot; but &quot;what has to be true before you can move further along the path?&quot;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-letting-go-is-less-scary-than-it-sounds&quot;&gt;Why letting go is less scary than it sounds&lt;&#x2F;h2&gt;
&lt;p&gt;Human code review was never very good at finding bugs. The empirical evidence backs this up.&lt;&#x2F;p&gt;
&lt;p&gt;What&#x27;s more interesting is what code review actually delivered as side effects: shared understanding of the codebase,
consistency across the team, accountability for what shipped, knowledge transfer between engineers. Those are real and
valuable.&lt;&#x2F;p&gt;
&lt;p&gt;But they&#x27;re not what most engineers think they&#x27;re defending when they resist the idea of not reading every line. When
you realise you&#x27;re grieving familiarity and shared understanding rather than bug-catching capability, it reframes the
problem. Those are solvable problems. They just have different solutions than line-by-line review.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;from-reviewer-to-feedback-loop-designer&quot;&gt;From reviewer to feedback loop designer&lt;&#x2F;h2&gt;
&lt;p&gt;If you&#x27;re not writing the code, and you&#x27;re not reading every line, what is your job?&lt;&#x2F;p&gt;
&lt;p&gt;Not: &quot;Did the AI write good code?&quot;&lt;&#x2F;p&gt;
&lt;p&gt;But: &quot;Have I built an environment where bad code can&#x27;t survive?&quot;&lt;&#x2F;p&gt;
&lt;p&gt;This is closer to SRE thinking than traditional code review. You&#x27;re designing systems that keep AI-generated output on
track: verification pipelines, observability, feedback loops, automated gates. The discipline doesn&#x27;t disappear when you
let go of reading every line. It moves. From inspecting output to designing the systems that inspect output for you.&lt;&#x2F;p&gt;
&lt;p&gt;I wrote about &lt;a href=&quot;&#x2F;blog&#x2F;mechanical-sympathy&#x2F;&quot;&gt;mechanical sympathy&lt;&#x2F;a&gt; recently, the idea that every generation of engineers
needs to understand the layer beneath their abstraction. The same principle applies here. You need to understand how
AI-generated code fails (quietly, confidently, locally-coherent-but-globally-inconsistent) to design feedback loops that
catch those specific failure modes.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;verifiable-over-deterministic&quot;&gt;Verifiable over deterministic&lt;&#x2F;h2&gt;
&lt;p&gt;My earlier thinking drew a hard line: use deterministic tools (linters, type checkers, compilers, tests) for everything
you can, and only use AI for the rest. I still believe that. But it&#x27;s incomplete. The real requirement isn&#x27;t
determinism. It&#x27;s &lt;strong&gt;verifiability&lt;&#x2F;strong&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;There&#x27;s a spectrum:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Best: verifiable and deterministic.&lt;&#x2F;strong&gt; Linters, type systems, compilers, test suites. Same input, same output. You can
prove correctness. This is the gold standard and you should push as much as possible into this category.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Useful: verifiable but non-deterministic.&lt;&#x2F;strong&gt; AI code review that flags concerns with evidence. Human review.
Property-based testing with AI-generated cases. The process isn&#x27;t repeatable, but you can assess whether the output is
right. You can show your working.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Dangerous: unverifiable and non-deterministic.&lt;&#x2F;strong&gt; Trusting AI output with no mechanism to assess correctness. No tests,
no review, no evidence trail. This is where things go wrong, and it&#x27;s where most &quot;vibe coding&quot; sits when done
carelessly.&lt;&#x2F;p&gt;
&lt;p&gt;The question isn&#x27;t &quot;is this check deterministic?&quot; It&#x27;s &quot;can I verify the result, and can I show evidence of that
verification?&quot;&lt;&#x2F;p&gt;
&lt;p&gt;This is also where compliance frameworks might eventually meet AI-augmented workflows. The intent of SOC 2 and ISO 27001
isn&#x27;t &quot;a human read every line.&quot; It&#x27;s &quot;you can demonstrate control and correctness.&quot; Auditable, evidenced verification
could satisfy that intent even as the mechanism shifts. Not today, necessarily. But that&#x27;s the direction.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-needs-to-be-true&quot;&gt;What needs to be true&lt;&#x2F;h2&gt;
&lt;p&gt;Before organisations can move further along Shapiro&#x27;s four steps, several things need to happen.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Verification tooling needs to mature.&lt;&#x2F;strong&gt; Not just linters and tests, but AI-assisted review that produces auditable
evidence. We need tools that don&#x27;t just say &quot;this looks fine&quot; but show why, with traces that an auditor could follow.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Compliance frameworks need to catch up.&lt;&#x2F;strong&gt; Or at least be interpreted in ways that recognise systematic verification as
a valid control. The current assumption in most audit frameworks is that a human reviewed the change. That assumption
will need to evolve, but it won&#x27;t evolve until the alternative demonstrably works.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;The specification layer needs proper tooling.&lt;&#x2F;strong&gt; If intent documents and specs become the durable artefact (and I think
they will), they need consistency checking, dead requirement detection, contradiction detection. Right now, a repo full
of markdown specs is just files. No compiler tells you when two specs contradict each other. No linter catches a
requirement that&#x27;s been superseded but never removed.&lt;&#x2F;p&gt;
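&lt;p&gt;As a sketch of what that tooling could look like: assume a hypothetical convention where every spec file declares an &lt;code&gt;id:&lt;&#x2F;code&gt; line and a &lt;code&gt;supersedes:&lt;&#x2F;code&gt; line for anything it replaces. A few lines of Python would then catch the superseded-but-never-removed case. Nothing here is real tooling, just the shape of it:&lt;&#x2F;p&gt;

```python
import re

# Hypothetical convention: each spec starts with header lines like
#   id: SPEC-012
#   supersedes: SPEC-007
ID_RE = re.compile(r"^id:\s*(\S+)", re.M)
SUPERSEDES_RE = re.compile(r"^supersedes:\s*(\S+)", re.M)

def lint_specs(specs):
    """specs maps filename to spec text. Returns a list of warnings."""
    ids = {}
    superseded = {}
    for name, text in specs.items():
        m = ID_RE.search(text)
        if m:
            ids[m.group(1)] = name
        for old in SUPERSEDES_RE.findall(text):
            superseded[old] = name
    warnings = []
    for old_id, successor in superseded.items():
        # A superseded requirement that is still present is dead weight:
        # nothing flags it, and it quietly contradicts its successor.
        if old_id in ids:
            warnings.append(
                f"{ids[old_id]}: {old_id} superseded by {successor} but never removed"
            )
    return warnings
```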
&lt;p&gt;&lt;strong&gt;Teams need new ways to maintain shared understanding.&lt;&#x2F;strong&gt; Code review served a knowledge-sharing function that had
nothing to do with finding bugs. If that goes away, something else needs to replace it. AI-generated explanations of
what changed and why, targeted at humans rather than machines, might serve that purpose. But the tooling isn&#x27;t there
yet.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Trust needs to be built incrementally.&lt;&#x2F;strong&gt; Side projects first. Low-stakes features. Gradually expanding the boundary as
confidence in verification systems grows. This is how every new practice earns legitimacy in engineering organisations,
and AI-augmented workflows shouldn&#x27;t be an exception.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;this-post-has-a-shelf-life-too&quot;&gt;This post has a shelf life too&lt;&#x2F;h2&gt;
&lt;p&gt;My previous posts described how I work and how I think about code review. This one describes how both of those are
shifting and why.&lt;&#x2F;p&gt;
&lt;p&gt;I expect to write another one when the ground moves again. It will.&lt;&#x2F;p&gt;
&lt;p&gt;That&#x27;s not a failure of thinking. It&#x27;s the appropriate response to a situation that is genuinely shifting under us. The
only irresponsible position right now is certainty.&lt;&#x2F;p&gt;
&lt;p&gt;The discipline is the same as it&#x27;s always been in engineering: understand the layer beneath the one you&#x27;re working at.
The layer has changed. &lt;a href=&quot;&#x2F;blog&#x2F;mechanical-sympathy&#x2F;&quot;&gt;The discipline hasn&#x27;t&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;If you&#x27;re on this path too, wherever you are on it, I&#x27;d love to hear where you&#x27;ve landed. &lt;a href=&quot;&#x2F;contact&quot;&gt;Drop me a line&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Operational Debt</title>
        <published>2026-03-04T00:00:00+00:00</published>
        <updated>2026-03-04T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/operational-debt/"/>
        <id>https://daz.is/blog/operational-debt/</id>
        
        <content type="html" xml:base="https://daz.is/blog/operational-debt/">&lt;p&gt;Years of running production systems give you something that&#x27;s not in the code. You learn the real-world usage patterns,
the failures that only show up under load, the degradation behaviour that creeps in over months. You learn which alerts
actually matter and which are noise.&lt;&#x2F;p&gt;
&lt;p&gt;That knowledge is earned incrementally. Through building, observing, failing, and iterating. It lives in people, not in
repositories.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;ve been thinking about what happens to that knowledge when code generation speeds up by an order of magnitude.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;cognitive-debt-briefly&quot;&gt;Cognitive debt, briefly&lt;&#x2F;h2&gt;
&lt;p&gt;The term &lt;strong&gt;cognitive debt&lt;&#x2F;strong&gt; was brought into software engineering
by &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;margaretstorey.com&#x2F;blog&#x2F;2026&#x2F;02&#x2F;09&#x2F;cognitive-debt&#x2F;&quot;&gt;Margaret-Anne Storey&lt;&#x2F;a&gt; earlier this year: the gap between
what AI-generated code does and how well the developers actually understand it.
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;martinfowler.com&#x2F;fragments&#x2F;2026-02-13.html&quot;&gt;Martin Fowler&lt;&#x2F;a&gt;
and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;simonwillison.net&#x2F;2026&#x2F;Feb&#x2F;15&#x2F;cognitive-debt&#x2F;&quot;&gt;Simon Willison&lt;&#x2F;a&gt; have since amplified it, and it&#x27;s gained
serious traction. Five independent research groups converged on the same finding in a single week: AI agents generate
code 5-7x faster than developers can comprehend it.&lt;&#x2F;p&gt;
&lt;p&gt;Storey followed up with a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;margaretstorey.com&#x2F;blog&#x2F;2026&#x2F;02&#x2F;18&#x2F;cognitive-debt-revisited&#x2F;&quot;&gt;second post&lt;&#x2F;a&gt; exploring
the implications further. Anthropic&#x27;s own research showed AI coding assistance reduces developer skill mastery by 17%.
Developers who delegated code generation scored below 40% on comprehension tests.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;ve written about this before from the &lt;a href=&quot;&#x2F;blog&#x2F;code-review-ai-augmented-development&#x2F;&quot;&gt;code review angle&lt;&#x2F;a&gt;
and &lt;a href=&quot;&#x2F;blog&#x2F;build-fast-learn-slow&quot;&gt;build fast learn slow&lt;&#x2F;a&gt;, but there&#x27;s a piece I haven&#x27;t named until now.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;operational-debt&quot;&gt;Operational debt&lt;&#x2F;h2&gt;
&lt;p&gt;Here&#x27;s what I want to put a name to:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Operational debt&lt;&#x2F;strong&gt; is code generated faster than teams can earn the &lt;em&gt;operational knowledge&lt;&#x2F;em&gt; to run it reliably in
production.&lt;&#x2F;p&gt;
&lt;p&gt;Cognitive debt is about understanding what the code does. Operational debt is about understanding what happens when it
runs. They&#x27;re related but distinct.&lt;&#x2F;p&gt;
&lt;p&gt;Operational knowledge is a specific thing. It&#x27;s knowing the real-world usage patterns of your system. It&#x27;s knowing which
metrics actually correlate with user pain and which are just noise. It&#x27;s understanding how the system degrades under
pressure, not the clean failure modes you designed for, but the messy ones that emerge over time.&lt;&#x2F;p&gt;
&lt;p&gt;This knowledge grows through lived experience with a running system. You can&#x27;t generate it. You can&#x27;t shortcut it. You
earn it by operating the system with real users doing unpredictable things, over months and years. Speed up code
generation 5-7x and this knowledge doesn&#x27;t keep pace. It can&#x27;t.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-pattern-that-s-hard-to-ignore&quot;&gt;The pattern that&#x27;s hard to ignore&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;daz.is&#x2F;blog&#x2F;operational-debt&#x2F;.&#x2F;status-image.png&quot; alt=&quot;Screenshot of fictional service status bar&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Check the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;status.anthropic.com&quot;&gt;Claude status page&lt;&#x2F;a&gt;.
Check &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.githubstatus.com&quot;&gt;GitHub&#x27;s recent reliability track record&lt;&#x2F;a&gt;. Both companies leaning heavily into
AI-generated code. Both struggling with operational reliability.&lt;&#x2F;p&gt;
&lt;p&gt;I know, correlation isn&#x27;t causation. There are many possible explanations: rapid growth, scaling challenges,
organisational complexity. But the pattern is there. The companies most aggressively adopting AI for their own codebases
are also the ones with the most visible reliability issues. It&#x27;s worth asking why.&lt;&#x2F;p&gt;
&lt;p&gt;My hypothesis is that when you generate code faster than your team can build operational understanding of it, your
ability to run that code reliably degrades. Not because the code is bad. Because nobody has had time to learn how it
behaves in the wild.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-they-compound&quot;&gt;How they compound&lt;&#x2F;h2&gt;
&lt;p&gt;These problems compound:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cognitive debt&lt;&#x2F;strong&gt;: you can&#x27;t understand the code fast enough&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Review bottleneck&lt;&#x2F;strong&gt;: you can&#x27;t &lt;a href=&quot;&#x2F;blog&#x2F;code-review-ai-augmented-development&#x2F;&quot;&gt;review it&lt;&#x2F;a&gt; fast enough to maintain
quality gates&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Operational debt&lt;&#x2F;strong&gt;: you can&#x27;t earn production knowledge fast enough to run it reliably&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Now put them together. Code you don&#x27;t fully understand, that wasn&#x27;t thoroughly reviewed, running in production
environments you haven&#x27;t had time to learn the operational characteristics of.&lt;&#x2F;p&gt;
&lt;p&gt;That&#x27;s not a hypothetical. That&#x27;s a reliability crisis happening right now.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-to-do-about-it&quot;&gt;What to do about it&lt;&#x2F;h2&gt;
&lt;p&gt;I don&#x27;t have a fully worked-out answer. But I think it starts with recognising that &quot;how fast can we generate code&quot; is
the wrong metric. The right question is: &quot;how well do we understand what we&#x27;re running?&quot;&lt;&#x2F;p&gt;
&lt;p&gt;The productivity gains are real. But productivity measured only in code output is measuring the wrong thing. I&#x27;m not
saying we should slow down, but we shouldn&#x27;t focus just on the speed of generation if our operational knowledge can&#x27;t
keep up.&lt;&#x2F;p&gt;
&lt;p&gt;Some things I think help:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Match generation speed to learning speed.&lt;&#x2F;strong&gt; Give teams time to build operational understanding before the next wave
of changes lands. Easier said than done when you need to keep up with the new pace of software development.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Invest in observability before you invest in generation.&lt;&#x2F;strong&gt; If you can&#x27;t see how your system behaves, generating more
code just makes the blind spot bigger.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Treat operational knowledge as a first-class asset.&lt;&#x2F;strong&gt; Document failure modes as you discover them. Run postmortems
that capture institutional knowledge, not just action items. Make sure ops understand what changed recently.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Be honest about the gap.&lt;&#x2F;strong&gt; If your team has generated more system than it can operate, that&#x27;s a risk. Name it.
Factor it into planning.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The goal isn&#x27;t to slow down. It&#x27;s to make sure understanding keeps up with generation. Augmented development is powerful
precisely because it lets experienced practitioners move faster. But the &lt;em&gt;experience&lt;&#x2F;em&gt; has to keep pace, not just the
&lt;em&gt;speed&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;This is thinking in progress. If you&#x27;re seeing this pattern in your own teams, or if you think I&#x27;m wrong about the
connection, I&#x27;d genuinely like to hear about it. &lt;a href=&quot;&#x2F;contact&quot;&gt;Drop me a line&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>How I Work with AI Coding Agents</title>
        <published>2026-03-01T00:00:00+00:00</published>
        <updated>2026-03-01T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/how-i-work-with-ai-coding-agents/"/>
        <id>https://daz.is/blog/how-i-work-with-ai-coding-agents/</id>
        
<content type="html" xml:base="https://daz.is/blog/how-i-work-with-ai-coding-agents/">&lt;p&gt;I&#x27;ve been building software for over 25 years and I&#x27;ve been through many changes to how we work in that time: Git,
CI&#x2F;CD, the cloud, and containers, to name a few. This is the biggest change in the shortest space of time. In December 2025,
Claude Code and Opus 4.5 crossed a threshold.&lt;&#x2F;p&gt;
&lt;p&gt;Since then I&#x27;ve been focusing all my energy on this: working with AI coding agents in production every day,
experimenting, noticing what works, feeding that back into my process. The approaches out there range from spec-driven
development to fully autonomous vibe coding. What follows is where I&#x27;ve landed, built from daily use on real projects.
It keeps changing.&lt;&#x2F;p&gt;
&lt;p&gt;This isn&#x27;t science. It&#x27;s field reporting from a practitioner going deep on this every day, testing what works under real
conditions. Some of what I&#x27;ve found aligns with what others in the field are discovering independently. Some, I think,
goes further.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-core-principle&quot;&gt;The core principle&lt;&#x2F;h2&gt;
&lt;p&gt;One observation underpins everything: &lt;strong&gt;LLMs are stateless&lt;&#x2F;strong&gt;. They have no memory between requests. Output quality is
bounded by context quality.&lt;&#x2F;p&gt;
&lt;p&gt;Better models don&#x27;t fix bad context. They produce more confident, more fluent slop. In
the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;survey.stackoverflow.co&#x2F;2025&#x2F;&quot;&gt;2025 Stack Overflow Developer Survey&lt;&#x2F;a&gt;, trust in AI accuracy dropped from 40%
to 29% year-over-year, even as 84% of developers kept using the tools. People keep using them because they&#x27;re genuinely
useful. The output quality is the problem to solve.&lt;&#x2F;p&gt;
&lt;p&gt;The difference between shipping quality and drowning in rework comes down to how deliberately you manage what goes into
the context window and how rigorously you verify what comes out.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;a-toolkit-not-a-pipeline&quot;&gt;A toolkit, not a pipeline&lt;&#x2F;h2&gt;
&lt;p&gt;My process is not a linear pipeline. It&#x27;s a toolkit of distinct steps that I assemble into a custom flow for each piece
of work. The shape depends on the outcome I&#x27;m after. Each step runs in a fresh context window, with the output
compressed into a focused artefact for the next. Context goes down at each stage while specificity goes up.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;ve tried to formalise this into a deterministic workflow, a controlled set of steps I can repeat. I might be
converging on something: a custom orchestration and review tool built to maximise human leverage at the points where it
matters most. But that&#x27;s early days. For now, the value is in keeping it flexible, experimenting with how the pieces fit
together, and adjusting as I learn what actually holds up under daily use.&lt;&#x2F;p&gt;
&lt;p&gt;The available steps and a typical flow:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;daz.is&#x2F;blog&#x2F;how-i-work-with-ai-coding-agents&#x2F;.&#x2F;workflow.png&quot; alt=&quot;typical workflow&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Ideation, design, and research feed each other iteratively, sometimes in parallel. Requirements crystallise from that
exploration. Then the three review stages (validate, evaluate, verify) each check different things at different points.&lt;&#x2F;p&gt;
&lt;p&gt;These steps aren&#x27;t always all present. For a small bug fix, several collapse into one session. For a substantial
feature, each is a distinct conversation with its own context window. I assemble the flow to fit the work, not the other
way around.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;research&quot;&gt;Research&lt;&#x2F;h2&gt;
&lt;p&gt;This is the highest-leverage phase. The goal is to map the problem space: relevant files, functions, data flows,
constraints, prior decisions. Not to write code.&lt;&#x2F;p&gt;
&lt;p&gt;Sub-agents do the noisy work in isolated context windows: file exploration, code search, dependency tracing. Their
compressed summaries come back to the main context clean, without the search noise that would pollute it. This is
encapsulation, applied to attention rather than code.&lt;&#x2F;p&gt;
&lt;p&gt;The pattern is &lt;strong&gt;gather, then glean&lt;&#x2F;strong&gt;. Cast a wide net first (maximise recall), then cull to the minimal set that
matters (maximise precision). The most dangerous information isn&#x27;t the obviously irrelevant stuff. It&#x27;s information that
&lt;em&gt;looks&lt;&#x2F;em&gt; relevant but isn&#x27;t. A hallucinated assumption about how the auth system works isn&#x27;t a code-level error. It&#x27;s a
research-level error. Everything built on top of it will be wrong.&lt;&#x2F;p&gt;
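&lt;p&gt;The gather-then-glean pattern is simple enough to sketch. The two scoring functions here are hypothetical stand-ins: in practice the cheap pass might be keyword or embedding search, and the strict pass an expensive sub-agent judgement:&lt;&#x2F;p&gt;

```python
def gather_then_glean(query, corpus, cheap_score, strict_score, keep=5):
    """Two-stage retrieval: cast a wide net, then cull hard.

    cheap_score and strict_score are hypothetical relevance functions
    supplied by the caller; only their shape matters here.
    """
    # Gather: maximise recall with a cheap, permissive filter.
    candidates = [doc for doc in corpus if cheap_score(query, doc) > 0]
    # Glean: maximise precision with a strict, expensive check,
    # keeping only the few items that genuinely matter.
    ranked = sorted(candidates, key=lambda doc: strict_score(query, doc), reverse=True)
    return ranked[:keep]
```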
&lt;h2 id=&quot;plan&quot;&gt;Plan&lt;&#x2F;h2&gt;
&lt;p&gt;An execution blueprint. Every step numbered, sequential, unambiguous. Include test criteria and code snippets where they
remove ambiguity. The target: a plan so specific that implementation becomes almost mechanical.&lt;&#x2F;p&gt;
&lt;p&gt;This is where small mistakes get expensive fast. A bad step in a plan produces hundreds of wrong lines. I&#x27;ve had a
single missed detail in a plan generate a cascade of not-quite-right code across multiple files, all internally
consistent, all confidently wrong. The earlier you apply human judgement, the cheaper the correction.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;implement&quot;&gt;Implement&lt;&#x2F;h2&gt;
&lt;p&gt;Should be the simplest phase. Feed the plan and only the specific files needed. For larger tasks, break implementation
into chunks, each in a fresh context window, to stay below roughly 40% context window utilisation, where I&#x27;ve found
output quality starts to drop off noticeably.&lt;&#x2F;p&gt;
&lt;p&gt;In practice, this means I can run plan steps through an implementation loop: feed a step, execute, commit, fresh
context, next step. This is close to what people are calling a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;ghuntley.com&#x2F;loop&#x2F;&quot;&gt;Ralph loop&lt;&#x2F;a&gt; (Geoffrey
Huntley&#x27;s pattern of running an agent repeatedly with git as the memory layer), but structured around a plan rather than
re-running the same prompt until it converges.&lt;&#x2F;p&gt;
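&lt;p&gt;The loop itself is tiny. Here&#x27;s a sketch with the two interesting parts stubbed out: &lt;code&gt;run_agent&lt;&#x2F;code&gt; and &lt;code&gt;commit&lt;&#x2F;code&gt; are hypothetical hooks standing in for &quot;start a fresh agent session with only this step&quot; and &quot;record the result in git&quot;:&lt;&#x2F;p&gt;

```python
def run_step_loop(plan_steps, run_agent, commit):
    """Feed a step, execute, commit, fresh context, next step.

    run_agent(step) starts a brand-new agent session (fresh context
    window) and returns a one-line summary of what it did. commit(msg)
    records the result; git is the durable memory layer between
    sessions, not the conversation history.
    """
    log = []
    for i, step in enumerate(plan_steps, start=1):
        summary = run_agent(step)          # fresh context every time
        commit(f"plan step {i}: {summary}")
        log.append((i, summary))
    return log
```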
&lt;p&gt;What I add on top is a &lt;strong&gt;deviation log&lt;&#x2F;strong&gt;. During implementation, any point where the AI diverges from the plan gets
flagged with a reason. I review and annotate these. This turns code review from reading every line to targeted
investigation of the places where plan and reality didn&#x27;t match.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;three-review-points-not-one&quot;&gt;Three review points, not one&lt;&#x2F;h2&gt;
&lt;p&gt;Most workflows put review at the end. I apply human judgement at three distinct points, each in a fresh context to avoid
bias from the previous phase.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Validate&lt;&#x2F;strong&gt; (after requirements): Are we solving the right problem? Are the requirements correct, complete, and
feasible? This catches scope errors before any planning or code exists.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Evaluate&lt;&#x2F;strong&gt; (after plan): Is the approach sound? Is the work broken into chunks that fit within the AI&#x27;s context sweet
spot? Does each chunk specify the context it needs? A plan that looks right but is poorly chunked for execution will
produce inconsistent output.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Verify&lt;&#x2F;strong&gt; (after implementation): Does the output match the plan and requirements? This is where all forms of review
converge:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Static analysis first.&lt;&#x2F;strong&gt; Types, linters, automated tests, security scanners. I write Rust, and the compiler&#x27;s error
messages are detailed enough that the agent can interpret and fix them directly. Never send an LLM or a human to do a
linter&#x27;s job.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Architecture second.&lt;&#x2F;strong&gt; Check structural decisions: dependencies, patterns, interfaces, how the new code fits the
existing system.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;AI-specific failure modes last.&lt;&#x2F;strong&gt; AI-generated code tends to have local coherence (each module works in isolation)
but poor global coherence (three modules solving overlapping problems differently, abstractions that don&#x27;t compose,
naming drift). Security is where the failures get dangerous. AI won&#x27;t add CSRF protection, rate limiting, or input
validation unless specifically prompted. It builds what you ask for, not what you need.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;The research is clear: &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;veracode.com&#x2F;blog&#x2F;ai-generated-code-security-risks&#x2F;&quot;&gt;45% of AI-generated code&lt;&#x2F;a&gt; contains
security
vulnerabilities. &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.coderabbit.ai&#x2F;blog&#x2F;state-of-ai-vs-human-code-generation-report&quot;&gt;AI pull requests average 1.7x more issues&lt;&#x2F;a&gt;
than human PRs. If you only verify at the end, you&#x27;re trying to catch all of that in code review. Validate and evaluate
earlier, and many of those issues never get generated in the first place.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;context-management&quot;&gt;Context management&lt;&#x2F;h2&gt;
&lt;p&gt;I think about context quality across four dimensions, a framing from Dex
Horthy&#x27;s &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=rmvDxxNubIg&quot;&gt;&quot;No Vibes Allowed&quot;&lt;&#x2F;a&gt; talk that I&#x27;ve found genuinely useful:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Correctness&lt;&#x2F;strong&gt;: Is everything in context accurate?&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Completeness&lt;&#x2F;strong&gt;: Is anything important missing?&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Size&lt;&#x2F;strong&gt;: All signal, minimal noise. Keep the model in its smart zone.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Trajectory&lt;&#x2F;strong&gt;: Does the conversation flow help the model reason well?&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;the-roughly-40-guideline&quot;&gt;The roughly 40% guideline&lt;&#x2F;h3&gt;
&lt;p&gt;The 40% figure comes from Dex Horthy. My experience confirms it: best performance is below 50% utilisation, and quality
drops noticeably beyond that. Chroma&#x27;s &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;research.trychroma.com&#x2F;context-rot&quot;&gt;context rot research&lt;&#x2F;a&gt; confirms the
underlying principle: model performance decreases as input length grows, even on simple tasks. More context usually
means worse output, not better. The practical rule: if you&#x27;re approaching the limit, start a fresh context or delegate
to a sub-agent.&lt;&#x2F;p&gt;
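&lt;p&gt;In code, the practical rule is little more than a traffic light. The thresholds and token counts are illustrative; how you measure utilisation depends entirely on your model and tooling:&lt;&#x2F;p&gt;

```python
def context_action(tokens_used, window_size, soft_limit=0.4):
    """Rough traffic-light check against the ~40% guideline."""
    utilisation = tokens_used / window_size
    if utilisation > soft_limit * 1.25:
        # Past roughly 50%: quality drops noticeably.
        return "compact or start a fresh context"
    if utilisation > soft_limit:
        # Approaching the limit: keep the sprawl out of the main agent.
        return "delegate noisy work to a sub-agent"
    return "keep going"
```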
&lt;h3 id=&quot;summarise-and-delegate&quot;&gt;Summarise and delegate&lt;&#x2F;h3&gt;
&lt;p&gt;Two strategies for keeping context under control.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Summarise&lt;&#x2F;strong&gt; is reactive. Compact accumulated context between phases. The output of research becomes a compressed
summary for planning. The plan becomes a compressed spec for implementation. Each transition is an intentional
reduction.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Delegate&lt;&#x2F;strong&gt; is preventive. Hand work to sub-agents with isolated context windows so token sprawl never reaches the main
agent. Sub-agents explore different parts of a codebase in parallel; only their compressed summaries come back. The
sprawl never enters the main context at all.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.anthropic.com&#x2F;engineering&#x2F;effective-context-engineering-for-ai-agents&quot;&gt;Anthropic&#x27;s guidance on context engineering&lt;&#x2F;a&gt;
formalises these as four strategies: write, select, compress, and isolate. My summarise maps to their compress; my
delegate maps to their isolate. The underlying principle is the same: every token in the context window competes for the
model&#x27;s attention, so be deliberate about what goes in.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;configuration-deterministic-vs-instructed&quot;&gt;Configuration: deterministic vs instructed&lt;&#x2F;h2&gt;
&lt;p&gt;A sharp distinction runs through my entire setup. Anything that can be checked mechanically is enforced via hooks or
automated verification steps, not by instructing the LLM. Linting, type checking, test runs, security scans, formatting.
These run automatically because the toolchain demands it. The LLM doesn&#x27;t need instructions to follow rules that are
enforced by the compiler.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.factory.ai&#x2F;using-linters-to-direct-agents&quot;&gt;Factory.ai&lt;&#x2F;a&gt; put this well: &quot;Agents write the code; linters write
the law.&quot; When you encode your architecture and standards directly into the code generation loop, the AI generates code,
gets automatic feedback, and iterates until clean. Lint passing becomes a proxy for &quot;conforms to architecture and best
practices.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;Only non-deterministic behaviour controls go in instruction files like &lt;code&gt;CLAUDE.md&lt;&#x2F;code&gt;. Coding conventions that linters
don&#x27;t capture, architectural preferences, domain-specific patterns, interaction style, when to ask for clarification vs
proceed.&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;...use the AskUserQuestion tool...&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;The single most important instruction I give the agent: ask me rather than assume. If the context isn&#x27;t enough, if
there&#x27;s a trade-off to resolve, if research turns up conflicting options, stop and ask. Most AI failures I&#x27;ve seen trace
back to the model filling gaps with confident guesses instead of flagging uncertainty. Prompting for this aggressively
has done more for my output quality than any other single instruction.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Skills&lt;&#x2F;strong&gt; extend these non-deterministic instructions with progressive disclosure. Modular prompt definitions loaded
only when relevant. They keep the base context lean and bring in specialised instructions on demand: commit conventions,
review criteria, planning templates, domain-specific patterns.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Hooks&lt;&#x2F;strong&gt; are how the deterministic side gets enforced. Claude Code fires hooks on events like file saves and tool
calls. I use them to enforce rules, so the agent gets immediate feedback without being told to check. The agent fixes
issues in the same loop. No instruction needed, no judgement required.&lt;&#x2F;p&gt;
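&lt;p&gt;The enforcement side of a hook doesn&#x27;t need to be clever. This sketch shows the shape: run the deterministic checks, collect the failures, and hand them straight back to the agent. How you wire it to save or tool-call events is tool-specific, so check your agent&#x27;s documentation for the actual hook configuration:&lt;&#x2F;p&gt;

```python
import subprocess

def run_checks(commands):
    """Run deterministic checks and collect machine-readable feedback.

    commands is a list of argv lists, e.g. your linter, formatter and
    test runner. A hook wires something like this to file-save or
    tool-call events so failures flow straight back into the loop.
    """
    failures = []
    for argv in commands:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((argv[0], result.stdout + result.stderr))
    return failures  # empty list means all checks passed
```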
&lt;p&gt;&lt;strong&gt;MCP servers&lt;&#x2F;strong&gt; are powerful but hungry. Every tool description loaded into context competes for the same attention
budget as the actual task. Be selective. Only connect what you&#x27;ll actually use for the current work.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-review-bottleneck&quot;&gt;The review bottleneck&lt;&#x2F;h2&gt;
&lt;p&gt;AI has scaled code production. Human review capacity hasn&#x27;t changed. The research summarised in &lt;em&gt;Making Software&lt;&#x2F;em&gt;
(Oram &amp;amp; Wilson, O&#x27;Reilly) is consistent: roughly 400 lines per hour for effective review, with a hard wall at about 60
minutes of sustained attention. Beyond that, defect detection falls off a cliff.&lt;&#x2F;p&gt;
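&lt;p&gt;Those two numbers pin down a review budget with back-of-envelope arithmetic:&lt;&#x2F;p&gt;

```python
def review_budget(lines_changed, rate_per_hour=400, wall_minutes=60):
    """Back-of-envelope sizing against the review research numbers.

    At roughly 400 lines per hour, with attention falling off after
    about 60 minutes, one sitting covers at most ~400 lines. Anything
    bigger needs splitting into multiple right-sized changes.
    """
    per_sitting = rate_per_hour * wall_minutes // 60
    sittings = -(-lines_changed // per_sitting)  # ceiling division
    return per_sitting, sittings
```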
&lt;p&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;gitclear.com&#x2F;ai_assistant_code_quality_2025_research&quot;&gt;AI-generated code has a 41% higher churn rate&lt;&#x2F;a&gt; than
human-written code. And an eight-month study of 200 employees
found &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;hbr.org&#x2F;2026&#x2F;02&#x2F;ai-doesnt-reduce-work-it-intensifies-it&quot;&gt;83% said AI increased their workload&lt;&#x2F;a&gt; through
scope expansion and dissolved work boundaries.&lt;&#x2F;p&gt;
&lt;p&gt;This is the central constraint. My strategies for working within it:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Right-size the unit of work.&lt;&#x2F;strong&gt; Size tasks to stay within both the AI&#x27;s context sweet spot and the human review budget.
These constraints push in the same direction, which is convenient.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Validate and evaluate, not just verify.&lt;&#x2F;strong&gt; Human attention is most valuable at the requirements and plan level, where
AI is weakest and the cost of errors is highest.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Make verification deterministic.&lt;&#x2F;strong&gt; Strongly typed languages, linters, automated tests, contract tests, security
scanners. These go from helpful to essential in AI-augmented workflows. They handle the mechanical correctness that
humans shouldn&#x27;t spend review time on.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Triage before deep review.&lt;&#x2F;strong&gt; Fast architectural pass first, then focus on risk areas: security, data validation, error
handling, concurrency.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Make the AI account for deviations.&lt;&#x2F;strong&gt; The deviation log from implementation turns review into targeted investigation
rather than line-by-line reading.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-no-review-shortcut&quot;&gt;The no-review shortcut&lt;&#x2F;h3&gt;
&lt;p&gt;There&#x27;s a growing school of thought that if you checked the plan and the code seems to work, you can skip review and
ship. I understand the appeal. You can&#x27;t inspect all the code the way we did before. The volume has changed.&lt;&#x2F;p&gt;
&lt;p&gt;But AI code goes wrong in different ways than human code did. The failure modes I described above, poor global
coherence, missing security controls, naming drift, aren&#x27;t the kind of bugs that surface immediately in a demo.
They accumulate. Skipping review because the code appears to work ignores exactly the problems that don&#x27;t surface until production.&lt;&#x2F;p&gt;
&lt;p&gt;Then there&#x27;s compliance. Both SOC 2 and ISO 27001 have controls that require change management and peer review. The
purpose of code review isn&#x27;t just catching bugs. It&#x27;s establishing an auditable trail of authorisation. Could you
substitute automated testing, static analysis, and post-deploy monitoring as compensating controls? Maybe, in some
configurations. But you&#x27;d need to document that thoroughly, get buy-in from your auditor, and demonstrate it&#x27;s equally
effective. Most organisations would find it far easier to just do code reviews than justify the alternative to an
auditor.&lt;&#x2F;p&gt;
&lt;p&gt;The answer isn&#x27;t to skip review. It&#x27;s to scale it, focus it, and make it sustainable. Which is what everything above is
trying to do.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;cognitive-debt-and-operational-knowledge&quot;&gt;Cognitive debt and operational knowledge&lt;&#x2F;h2&gt;
&lt;p&gt;There&#x27;s a concept gaining traction called &lt;strong&gt;cognitive debt&lt;&#x2F;strong&gt;: the gap between the code your team ships and the code your
team actually understands. &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;margaretstorey.com&#x2F;blog&#x2F;2026&#x2F;02&#x2F;09&#x2F;cognitive-debt&#x2F;&quot;&gt;Margaret Storey&lt;&#x2F;a&gt; framed it well,
and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;simonwillison.net&#x2F;2026&#x2F;Feb&#x2F;15&#x2F;cognitive-debt&#x2F;&quot;&gt;Simon Willison amplified it&lt;&#x2F;a&gt;. The research suggests AI
generates code 5-7x faster than humans can comprehend it.&lt;&#x2F;p&gt;
&lt;p&gt;I think the problem goes deeper than code comprehension. AI can build fast. You cannot compress the learning that comes
from running a system in production with real users over time.&lt;&#x2F;p&gt;
&lt;p&gt;An LLM will build what you ask for but won&#x27;t volunteer what you haven&#x27;t thought to ask for. And the things you haven&#x27;t
thought to ask about are exactly what matters most in production. Payment timeouts. Reservation expiry race conditions.
Idempotency edge cases. I&#x27;ve encountered all of these through operating my own systems, not through planning or design.&lt;&#x2F;p&gt;
&lt;p&gt;The gap between what you can build and what you can operate is where trust breaks. AI-augmented development widens this
gap by accelerating the build side without touching the operational learning side.&lt;&#x2F;p&gt;
&lt;p&gt;The practical response: build deliberately. Simple first. Real usage from day one. Complexity only as the system proves
itself. AI assists everywhere, but the human decides everywhere.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;where-ai-adds-genuine-value&quot;&gt;Where AI adds genuine value&lt;&#x2F;h2&gt;
&lt;p&gt;Not everywhere.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Where it works:&lt;&#x2F;strong&gt; Unstructured-to-structured transformation (parsing inconsistent data formats that would previously
require brittle regex or hand-coding). Natural language interfaces, always with a human in the loop. Code generation
with disciplined context management. Parallel research and exploration via sub-agents.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Where it doesn&#x27;t:&lt;&#x2F;strong&gt; Replacing deterministic workflows. There is no good reason to replace a reliable cron job,
webhook, or message queue with a non-deterministic alternative. Unsupervised autonomous operation: an AI agent with API
keys and shell access on a timer is a security incident waiting to happen. And anywhere robustness matters more than
novelty. If the existing solution works reliably, the burden of proof is on the AI replacement.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-this-compares-to-the-field&quot;&gt;How this compares to the field&lt;&#x2F;h2&gt;
&lt;p&gt;This process aligns with several emerging practices. The research-plan-implement workflow mirrors
what &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;humanlayer&#x2F;advanced-context-engineering-for-coding-agents&#x2F;blob&#x2F;main&#x2F;ace-fca.md&quot;&gt;Dex Horthy&lt;&#x2F;a&gt;, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.anthropic.com&#x2F;engineering&#x2F;effective-context-engineering-for-ai-agents&quot;&gt;Anthropic&lt;&#x2F;a&gt;,
and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;simonwillison.net&quot;&gt;Simon Willison&lt;&#x2F;a&gt; independently advocate. Context engineering as the central discipline
matches &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;research.trychroma.com&#x2F;context-rot&quot;&gt;Jeff Huber&#x27;s&lt;&#x2F;a&gt; framing. Plan-first, spec-driven development has
become the consensus position, replacing the early &quot;vibe coding&quot; enthusiasm.&lt;&#x2F;p&gt;
&lt;p&gt;Where I think this approach diverges:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Deterministic enforcement over LLM instructions.&lt;&#x2F;strong&gt; Most guides put everything in &lt;code&gt;CLAUDE.md&lt;&#x2F;code&gt; or similar files. I
reserve instruction files for genuinely non-deterministic guidance and enforce everything else through hooks and
tooling. If a machine can check it, a machine should enforce it.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Operational knowledge as the constraint, not code generation speed.&lt;&#x2F;strong&gt; The industry conversation focuses on how fast
you can ship. I think the gap between build speed and operational understanding is the primary risk. Cognitive debt at
the code level is real, but the knowledge that only comes from production is the harder problem.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Collaboration over autonomy.&lt;&#x2F;strong&gt; The mainstream is moving towards more agent autonomy. I&#x27;m betting that the best
outcomes come from effective collaboration between AI and experienced, product-focused engineers. The human brings
domain knowledge, system-wide judgement, and operational experience. The AI brings speed, parallel exploration, and
tireless execution. Neither alone matches what they produce together. That&#x27;s what AI-augmented development means.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;metr.org&#x2F;blog&#x2F;2025-07-10-early-2025-ai-experienced-os-dev-study&#x2F;&quot;&gt;METR study&lt;&#x2F;a&gt; (mid-2025) found experienced
developers were 19% slower with AI on their own large codebases. This doesn&#x27;t match my experience, and I attribute the
difference to two things: context management discipline (most developers in the study used AI without structured
workflows), and the step change in model quality and tooling that arrived in December 2025.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-hard-part-was-never-typing&quot;&gt;The hard part was never typing&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.svpg.com&#x2F;four-big-risks&#x2F;&quot;&gt;Marty Cagan&lt;&#x2F;a&gt; describes four product risks: value (will people use it?),
usability (can they figure it out?), feasibility (can we build it?), and business viability (does it work for the
business?). AI has reduced feasibility risk significantly. It has not reduced the others. If anything, by making it
cheaper to build, it shifts attention back to value risk: are we building the right thing?&lt;&#x2F;p&gt;
&lt;p&gt;The process in a sentence: assemble the right steps for the work, fresh context per step, compress between transitions,
enforce deterministically what you can, instruct the AI only on what requires judgement, and validate and evaluate
before you verify.&lt;&#x2F;p&gt;
&lt;p&gt;This keeps evolving. I&#x27;ll be wrong about parts of it in six months. But the underlying bet, that disciplined
collaboration between human judgement and AI capability beats either alone, is the one I&#x27;m most confident in.&lt;&#x2F;p&gt;
&lt;p&gt;If you&#x27;re working through this yourself, I&#x27;d genuinely love to hear what&#x27;s working for you. &lt;a href=&quot;&#x2F;contact&quot;&gt;Drop me a line&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;&lt;strong&gt;Sources&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=rmvDxxNubIg&quot;&gt;No Vibes Allowed: Solving Hard Problems in Complex Codebases&lt;&#x2F;a&gt; (Dex
Horthy, HumanLayer)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;humanlayer&#x2F;advanced-context-engineering-for-coding-agents&#x2F;blob&#x2F;main&#x2F;ace-fca.md&quot;&gt;Advanced Context Engineering for Coding Agents&lt;&#x2F;a&gt; (
HumanLayer)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;research.trychroma.com&#x2F;context-rot&quot;&gt;Context Rot: How Increasing Input Tokens Impacts LLM Performance&lt;&#x2F;a&gt; (Chroma
Research)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.anthropic.com&#x2F;engineering&#x2F;effective-context-engineering-for-ai-agents&quot;&gt;Effective Context Engineering for AI Agents&lt;&#x2F;a&gt; (
Anthropic)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.factory.ai&#x2F;using-linters-to-direct-agents&quot;&gt;Using Linters to Direct Agents&lt;&#x2F;a&gt; (Factory.ai)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;veracode.com&#x2F;blog&#x2F;ai-generated-code-security-risks&#x2F;&quot;&gt;AI-Generated Code Security Risks&lt;&#x2F;a&gt; (Veracode, 2025)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;gitclear.com&#x2F;ai_assistant_code_quality_2025_research&quot;&gt;AI Assistant Code Quality 2025 Research&lt;&#x2F;a&gt; (GitClear)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.coderabbit.ai&#x2F;blog&#x2F;state-of-ai-vs-human-code-generation-report&quot;&gt;State of AI vs Human Code Generation&lt;&#x2F;a&gt; (
CodeRabbit, 2025)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;survey.stackoverflow.co&#x2F;2025&#x2F;&quot;&gt;2025 Developer Survey&lt;&#x2F;a&gt; (Stack Overflow)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;hbr.org&#x2F;2026&#x2F;02&#x2F;ai-doesnt-reduce-work-it-intensifies-it&quot;&gt;AI Doesn&#x27;t Reduce Work, It Intensifies It&lt;&#x2F;a&gt; (HBR,
2026)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;margaretstorey.com&#x2F;blog&#x2F;2026&#x2F;02&#x2F;09&#x2F;cognitive-debt&#x2F;&quot;&gt;Cognitive Debt&lt;&#x2F;a&gt; (Margaret Storey)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;metr.org&#x2F;blog&#x2F;2025-07-10-early-2025-ai-experienced-os-dev-study&#x2F;&quot;&gt;Impact of AI on Experienced Developer Productivity&lt;&#x2F;a&gt; (
METR, 2025)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.svpg.com&#x2F;four-big-risks&#x2F;&quot;&gt;The Four Big Risks&lt;&#x2F;a&gt; (Marty Cagan, SVPG)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;ghuntley.com&#x2F;loop&#x2F;&quot;&gt;The Ralph Loop&lt;&#x2F;a&gt; (Geoffrey Huntley)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.oreilly.com&#x2F;library&#x2F;view&#x2F;making-software&#x2F;9780596808310&#x2F;&quot;&gt;Making Software&lt;&#x2F;a&gt; (Oram &amp;amp; Wilson, O&#x27;Reilly)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Code Review in the Age of AI-Augmented Development</title>
        <published>2026-02-26T00:00:00+00:00</published>
        <updated>2026-02-26T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/code-review-ai-augmented-development/"/>
        <id>https://daz.is/blog/code-review-ai-augmented-development/</id>
        
        <content type="html" xml:base="https://daz.is/blog/code-review-ai-augmented-development/">&lt;p&gt;These days I spend much more of my development time reviewing code than writing it myself. I&#x27;ve also found myself
thinking more deeply about &lt;em&gt;what&lt;&#x2F;em&gt; to build, and how to specify it, before anything gets generated. I wrote recently
about &lt;a href=&quot;&#x2F;blog&#x2F;thinking-in-plans-not-code&#x2F;&quot;&gt;thinking in plans, not code&lt;&#x2F;a&gt; and how the leverage has shifted upstream to
research and planning. This post is about the other side: what happens downstream, when the code arrives and you have to
decide whether it&#x27;s right.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-human-constant&quot;&gt;The Human Constant&lt;&#x2F;h2&gt;
&lt;p&gt;There&#x27;s a chapter in &lt;em&gt;Making Software&lt;&#x2F;em&gt; (Oram &amp;amp; Wilson, O&#x27;Reilly) that summarises two studies on code review
effectiveness.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;daz.is&#x2F;blog&#x2F;code-review-ai-augmented-development&#x2F;code-review.png&quot; alt=&quot;Code Review Charts&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The first (Dunsmore 2000) mapped defect detection over time. Early in a review, the relationship is linear: roughly one
defect found every ten minutes. But around the 60-minute mark, there&#x27;s a sharp drop-off. Another ten minutes no longer
reliably turns up another defect. The brain hits a wall.&lt;&#x2F;p&gt;
&lt;p&gt;The second (Cohen 2006) looked at around 2,500 reviews and measured the effect of review speed. Below about 400 lines of
code per hour, defect density spreads naturally across reviews. Some code is simple with few defects, some is complex
with many. That spread is normal. Above 400-500 LOC&#x2F;hour, high defect density reviews virtually disappear. Not because
the defects aren&#x27;t there. Because the reviewer is moving too fast to find them.&lt;&#x2F;p&gt;
&lt;p&gt;The conclusion: at most one hour, at most 400 lines. Review more than that in a single sitting and you&#x27;re not going to
be effective.&lt;&#x2F;p&gt;
&lt;p&gt;These are cognitive limits. They haven&#x27;t changed. What&#x27;s changed is the volume of code arriving at your desk. An AI
coding assistant can produce in minutes what used to take a developer a day. The production side has scaled. The review
side is still bounded by the same brain it always was.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-coherence-problem&quot;&gt;The Coherence Problem&lt;&#x2F;h2&gt;
&lt;p&gt;AI-generated code tends to look fine in isolation. Each function is reasonable. Each module makes sense on its own.
What&#x27;s easy to miss isn&#x27;t bad code. It&#x27;s code that doesn&#x27;t make sense when you look at the bigger picture.&lt;&#x2F;p&gt;
&lt;p&gt;Three modules that solve overlapping problems in slightly different ways. Abstractions that don&#x27;t compose because they
were never designed together. Naming conventions that drift across files. Local coherence, but not global coherence.&lt;&#x2F;p&gt;
&lt;p&gt;In a traditional team, this kind of drift happens too, but it happens slowly. Over weeks, conversations and reviews
naturally surface the divergence. Someone says &quot;wait, didn&#x27;t we already solve this?&quot; and the team realigns. With AI, the
same mess can accumulate in an afternoon. The code all looks clean, so the signals that would normally trigger a course
correction don&#x27;t fire.&lt;&#x2F;p&gt;
&lt;p&gt;I had a telling example recently. I&#x27;d specified RustFS in my requirements for integration testing some S3 code. By the
time the AI-generated plan came back, that had quietly become Minio, the more widely known option. The substitution
looked perfectly reasonable at a glance. I missed it. One line in a plan that would have been a trivial correction
became an extra round of implementation to revert and swap out the dependency.&lt;&#x2F;p&gt;
&lt;p&gt;That&#x27;s the leverage problem in miniature. Catching it in the plan costs you a one-line edit. Catching it in the code
costs you a cycle of rework.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;right-size-the-unit-of-work&quot;&gt;Right-Size the Unit of Work&lt;&#x2F;h2&gt;
&lt;p&gt;One response to the review bottleneck is to control what you&#x27;re generating in the first place. I&#x27;ve found a rough
guideline: size tasks and planning phases so they fit within about 40% of the AI&#x27;s context window. That&#x27;s around where
context rot starts to bite, the gradual degradation in output quality as the context fills up.&lt;&#x2F;p&gt;
&lt;p&gt;It&#x27;s not a hard rule. But it serves two purposes. It keeps the AI&#x27;s output consistent and reliable. And it keeps each
chunk of output within a budget that a human can actually review properly, given the cognitive limits above.&lt;&#x2F;p&gt;
&lt;p&gt;Approval checkpoints need the same kind of sizing. Too many interrupts and you overwhelm the human reviewer with
constant context-switching. Too few and drift goes unchecked until it&#x27;s expensive to fix.&lt;&#x2F;p&gt;
&lt;p&gt;There&#x27;s no formula for this yet. Both sides, human and AI, are developing intuition for what works. That intuition
builds through practice, is context-dependent, and isn&#x27;t something you can read off a chart.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;review-plans-not-just-code&quot;&gt;Review Plans, Not Just Code&lt;&#x2F;h2&gt;
&lt;p&gt;Human attention is most valuable at the levels where AI is weakest: specifications, requirements, and architectural
coherence. The review question shifts from &quot;is this code correct?&quot; to &quot;is this spec complete?&quot; and &quot;does this still hang
together?&quot;&lt;&#x2F;p&gt;
&lt;p&gt;Going back to that RustFS example, if I had reviewed the plan more carefully, catching the substitution would have been
a one-line correction. Instead, I caught it after implementation, and it cost a rework cycle. The same principle applies
at every scale: the earlier you apply human judgment, the cheaper the correction.&lt;&#x2F;p&gt;
&lt;p&gt;Senior developers&#x27; experience and context matter most here. The ability to hold the bigger picture, to spot when a plan
is subtly drifting from what the system needs, to ask &quot;have we already solved this differently elsewhere?&quot; That&#x27;s the
work.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;make-verification-deterministic&quot;&gt;Make Verification Deterministic&lt;&#x2F;h2&gt;
&lt;p&gt;Every check you can make deterministic is a check you take off the human reviewer&#x27;s plate. Strongly typed languages
catch entire categories of error at compile time. Linters enforce consistency across files without anyone reading them.
Automated tests verify behaviour. Security scanners flag known patterns. Contract tests confirm that modules still talk
to each other correctly.&lt;&#x2F;p&gt;
&lt;p&gt;None of this is new. But in an AI-augmented workflow these tools go from helpful to essential. They&#x27;re what make the
review budget viable, because they remove whole classes of concern from the pile of things a human has to think about.
The more you push into deterministic verification, the smaller the surface area of judgment-dependent review becomes.&lt;&#x2F;p&gt;
&lt;p&gt;Never send an LLM (or a human) to do a linter&#x27;s job.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;triage-before-you-review&quot;&gt;Triage Before You Review&lt;&#x2F;h2&gt;
&lt;p&gt;When you do sit down to review, start with a fast architectural pass. Does the overall shape make sense? Do the modules
fit together? Are the boundaries in the right places? Only then focus your attention on the parts that carry the most
risk: security boundaries, data validation, error handling, concurrency.&lt;&#x2F;p&gt;
&lt;p&gt;AI can help here too, as a first pass. Use it to triage, directing your attention rather than replacing your
judgment: let it flag the areas that look unusual or complex, then spend your limited review time on those.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;make-the-ai-account-for-its-decisions&quot;&gt;Make the AI Account for Its Decisions&lt;&#x2F;h2&gt;
&lt;p&gt;One practice I&#x27;ve found useful: after implementation, ask the AI to report where and why it deviated from the plan. It
won&#x27;t catch everything (it can be blind to its own substitutions), and it tends to elaborate on where it followed the
plan rather than where it deviated. But when the prompt lands, it shifts review from reading every line looking for
surprises to a targeted investigation of the places that actually need your attention.&lt;&#x2F;p&gt;
&lt;p&gt;Again, back to that RustFS-to-Minio example. This should not be something you have to spot by chance but rather
something that gets surfaced for you. The AI might tell you &quot;I used Minio instead of RustFS because the test container
support is more mature.&quot; Now you have a decision to make rather than a detail to catch. That&#x27;s a better use of your
attention.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-changes&quot;&gt;What Changes&lt;&#x2F;h2&gt;
&lt;p&gt;This requires a genuine shift in how senior developers think about their work. The high-leverage activity isn&#x27;t reading
code line by line any more. It&#x27;s writing specs tight enough that generation is constrained, reviewing plans before they
become implementations, and maintaining the coherent bigger picture that no individual AI context window can hold.&lt;&#x2F;p&gt;
&lt;p&gt;That&#x27;s harder to measure than lines reviewed. It&#x27;s harder to put in a standup update. But it&#x27;s where the bottleneck
actually is, and it&#x27;s where experienced developers can make the work better, or, by not doing it, let it quietly
degrade.&lt;&#x2F;p&gt;
&lt;p&gt;If any of this resonates, or if you&#x27;ve found approaches that work differently, &lt;a href=&quot;&#x2F;contact&quot;&gt;drop me a line&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Build Fast, Learn Slow</title>
        <published>2026-02-17T00:00:00+00:00</published>
        <updated>2026-02-17T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/build-fast-learn-slow/"/>
        <id>https://daz.is/blog/build-fast-learn-slow/</id>
        
        <content type="html" xml:base="https://daz.is/blog/build-fast-learn-slow/">&lt;aside class=&quot;update-callout&quot;&gt;
  &lt;span class=&quot;update-callout__label&quot;&gt;Update — 2026-03-04&lt;&#x2F;span&gt;
  &lt;p&gt;I outlined this post a while ago but never finished it. I&#x27;m posting it now because I think it&#x27;s interesting background
thinking to my &lt;a href=&quot;&#x2F;blog&#x2F;operational-debt&quot;&gt;operational debt&lt;&#x2F;a&gt; post.&lt;&#x2F;p&gt;

&lt;&#x2F;aside&gt;
&lt;p&gt;If an AI-augmented engineer can build an app in a weekend, what happens to SaaS?&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;m a tech lead for data and integrations at a SaaS company. But I also
run &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;zero-waste-tickets.com&quot;&gt;Zero Waste Tickets&lt;&#x2F;a&gt;, a small side project, with real users.&lt;&#x2F;p&gt;
&lt;p&gt;I see software from both sides: from inside a mature product, and as a solo operator building from scratch.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-code-was-never-the-hard-part&quot;&gt;The code was never the hard part&lt;&#x2F;h2&gt;
&lt;p&gt;I&#x27;ve rebuilt Zero Waste Tickets a few times. Each time the technology changed completely. Different stack, different
architecture, different approach. What carried over was the operational knowledge. Everything I&#x27;d learned about what
goes wrong.&lt;&#x2F;p&gt;
&lt;p&gt;AI coding tools are extraordinary. You can build in a weekend what used to take months. But you can&#x27;t &lt;em&gt;learn how to
operate&lt;&#x2F;em&gt; what you&#x27;ve built at the same pace. The code races ahead of your understanding. The gap between &quot;it works in a
demo&quot; and &quot;I&#x27;d trust it with someone&#x27;s money&quot; is where all the interesting problems live.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;sounds-like-an-edge-case&quot;&gt;&quot;Sounds like an edge case&quot;&lt;&#x2F;h2&gt;
&lt;p&gt;I recently spoke to someone who had vibe-coded their own ticket-selling application. Looked great. I asked how they
prevented overselling. What happens when more people try to buy tickets than are available, all at the same time?&lt;&#x2F;p&gt;
&lt;p&gt;They hadn&#x27;t thought about it. &quot;Sounds like an edge case.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;Overselling is not an edge case in a ticketing system. It&#x27;s &lt;em&gt;the&lt;&#x2F;em&gt; core integrity problem of the domain. That&#x27;s like
building a banking app and calling incorrect balances an edge case. But this person wasn&#x27;t careless or incompetent. They
just hadn&#x27;t encountered the problem yet because they hadn&#x27;t operated the system under real conditions. The LLM that
generated their code hadn&#x27;t raised it either, because they hadn&#x27;t thought to ask.&lt;&#x2F;p&gt;
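&lt;p&gt;To make the point concrete, here&#x27;s roughly what the missing guard looks like. This is a hedged sketch in Python with SQLite, with an invented one-row-per-ticket-type schema, not anyone&#x27;s production code: the availability check and the decrement happen in a single atomic statement, so two racing buyers can&#x27;t both claim the last ticket.&lt;&#x2F;p&gt;

```python
import sqlite3

# Illustrative schema: one row per ticket type with a remaining count.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket_types (id TEXT PRIMARY KEY, available INTEGER)")
conn.execute("INSERT INTO ticket_types VALUES ('general', 2)")
conn.commit()

def reserve(conn, ticket_type, quantity):
    """Atomically claim tickets; the WHERE clause is the oversell guard."""
    cur = conn.execute(
        "UPDATE ticket_types SET available = available - ? "
        "WHERE id = ? AND available >= ?",
        (quantity, ticket_type, quantity),
    )
    conn.commit()
    return cur.rowcount == 1  # zero rows updated means not enough stock

print(reserve(conn, "general", 1))  # True
print(reserve(conn, "general", 2))  # False: only 1 left, no oversell
```

&lt;p&gt;Concurrent purchases all hit the same conditional UPDATE; the database serialises them, and at most one succeeds for the final ticket.&lt;&#x2F;p&gt;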
&lt;p&gt;An LLM will build what you ask for. It won&#x27;t volunteer the things that matter most in production.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-payment-timeout-lesson&quot;&gt;The payment timeout lesson&lt;&#x2F;h2&gt;
&lt;p&gt;In an earlier iteration of Zero Waste Tickets I had a payment error from a production edge case I hadn&#x27;t considered
during design. A user started buying tickets. They got to the payment step, where the bank sometimes asks for additional
verification. Then they walked away from their computer.&lt;&#x2F;p&gt;
&lt;p&gt;Completely reasonable human behaviour. But here&#x27;s what happened underneath: the system had reserved their tickets. After
a long period of inactivity it returned the reservation to the pool, as designed. Those tickets got bought by someone
else. Then, hours later, the original payment completed. The bank said yes, money moved, but the order was now invalid
because the tickets were gone. I had taken into account many cases, including declined transactions and payment
processing delays, but I hadn&#x27;t considered this particular case where the verification was delayed.&lt;&#x2F;p&gt;
&lt;p&gt;Three systems had each done the correct thing. But collectively it was broken. My reservation pool, my order state, and
Stripe&#x27;s payment intent all behaved correctly in isolation. The fix wasn&#x27;t just atomic updates to reservations and
orders, which I&#x27;d already been careful about across all three rebuilds. It was cleaning up the payment intent on
Stripe&#x27;s side when a reservation expired. I had thought about other delays in checkout, but nobody had ever walked away
from their screen for that long mid-verification.&lt;&#x2F;p&gt;
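&lt;p&gt;The shape of that fix, as a hedged sketch: the reservation structure is invented and the payment provider&#x27;s cancel call is injected as a plain function (this is not Stripe&#x27;s actual API), but the essential move is that releasing the tickets and cancelling the pending payment intent happen together.&lt;&#x2F;p&gt;

```python
def expire_reservation(reservation, cancel_intent):
    """Release the tickets and kill the pending payment intent together,
    so a bank verification completed hours later can no longer move money
    for tickets that have been returned to the pool.

    cancel_intent is injected (a thin wrapper around the payment
    provider's cancel endpoint) so the expiry logic stays testable.
    """
    reservation["status"] = "expired"
    reservation["tickets_released"] = True
    if reservation.get("payment_intent_id"):
        cancel_intent(reservation["payment_intent_id"])

# Stub the provider call to show the behaviour.
cancelled = []
res = {"status": "pending", "payment_intent_id": "pi_123"}
expire_reservation(res, cancelled.append)
print(cancelled)  # ['pi_123']
```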
&lt;p&gt;I learned a similar lesson with idempotency keys. Get them wrong and you enable double payments. That sounds like a
technical detail until a real person sees two charges on their bank statement and loses trust in your system instantly.&lt;&#x2F;p&gt;
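&lt;p&gt;For illustration, the core of an idempotency-key guard looks something like this. Names are invented and the store is an in-memory dict where a real system would persist keys in the same transaction as the charge:&lt;&#x2F;p&gt;

```python
import threading

class IdempotentCharger:
    """Sketch: dedupe charge requests by idempotency key, so a retried
    request replays the stored result instead of charging twice."""

    def __init__(self, charge_fn):
        self._charge = charge_fn
        self._results = {}          # key: result of the original attempt
        self._lock = threading.Lock()

    def charge(self, key, amount):
        with self._lock:
            if key in self._results:
                return self._results[key]  # replay, not a second charge
            result = self._charge(amount)
            self._results[key] = result
            return result

# Stub gateway to show the dedup behaviour.
calls = []
def fake_charge(amount):
    calls.append(amount)
    return f"charged {amount}"

charger = IdempotentCharger(fake_charge)
print(charger.charge("order-42", 1000))  # charged 1000
print(charger.charge("order-42", 1000))  # same result, no second charge
print(len(calls))  # 1
```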
&lt;p&gt;Perhaps these are things you could anticipate by being smarter. But there will always be things you only learn by
operating the system with real users, real money, and real behaviour over years.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-you-re-actually-paying-for&quot;&gt;What you&#x27;re actually paying for&lt;&#x2F;h2&gt;
&lt;p&gt;This brings me back to the SaaS question. I&#x27;ve worked in many software organisations. A lot of engineering time goes to
handling complexity that only reveals itself at scale, over time, across thousands of different customer environments.&lt;&#x2F;p&gt;
&lt;p&gt;When you pay for a mature SaaS product, you&#x27;re not paying for code. Code is increasingly cheap. You&#x27;re paying for the
operational knowledge baked into that system over years. Every edge case discovered. Every failure mode handled. Every
&quot;sounds unlikely&quot; scenario that turned out to happen on the third Tuesday of every month.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.svpg.com&#x2F;four-big-risks&#x2F;&quot;&gt;Marty Cagan&lt;&#x2F;a&gt; talks about the cost of supporting a product as a key product
question. For my side project, this is critical: I have limited time, I want to keep it fun, and I need to be honest
about what I can actually operate and support. I&#x27;ve grown Zero Waste Tickets deliberately. Simple first. Real money from
day one. Added complexity only as the system proved itself. Invited other event organisers by word of mouth once I was
confident it could handle the responsibility.&lt;&#x2F;p&gt;
&lt;p&gt;That deliberate pace isn&#x27;t a weakness. It&#x27;s the discipline. Every feature I added, I could also &lt;em&gt;support&lt;&#x2F;em&gt;. I understood
the failure modes because I&#x27;d lived with the system long enough to encounter them.&lt;&#x2F;p&gt;
&lt;p&gt;This is what I was getting at in my post about &lt;a href=&quot;&#x2F;blog&#x2F;bot-protection-weekend-project&#x2F;&quot;&gt;overengineering a login form&lt;&#x2F;a&gt;.
Agentic coding decouples build speed from operational understanding. That&#x27;s both its power and its risk. You can
generate a system far more complex than you can comprehend, operate, or support. When something goes wrong, you won&#x27;t
have the mental model to diagnose it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-knowledge-that-doesn-t-compress&quot;&gt;The knowledge that doesn&#x27;t compress&lt;&#x2F;h2&gt;
&lt;p&gt;Is SaaS under threat from AI coding? For simple, low-stakes tools, probably. If the consequences of failure are a minor
inconvenience, generating something bespoke might make perfect sense.&lt;&#x2F;p&gt;
&lt;p&gt;But for anything involving money, trust, security, or reliability under pressure? The operational knowledge is the moat.
Not because AI can&#x27;t write the code. It can, and it keeps getting better. But because knowing &lt;em&gt;what&lt;&#x2F;em&gt; code to write
requires having encountered the problems that only show up in production, over time, with real users doing unpredictable
things.&lt;&#x2F;p&gt;
&lt;p&gt;Security is another example. AI coding agents won&#x27;t typically add CSRF protection unless you specifically ask. How
many other security considerations are you not thinking to ask about? You don&#x27;t know. That&#x27;s the point.&lt;&#x2F;p&gt;
&lt;p&gt;The real value of mature software isn&#x27;t the codebase. It&#x27;s the deep domain knowledge that gets baked into the system
and its operation.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-s-next&quot;&gt;What&#x27;s next&lt;&#x2F;h2&gt;
&lt;p&gt;I&#x27;m thinking a lot about where software goes as interactions become increasingly agent-to-agent rather than
human-to-human. Headless software where there&#x27;s no web UI at all, just APIs and agents talking to each other. That
changes what &quot;software&quot; even means, and I think it has implications for what matters most: security, monitoring,
measuring outcomes, improving over time. But that&#x27;s a post for another day.&lt;&#x2F;p&gt;
&lt;p&gt;For now, my advice to anyone building with AI coding tools: enjoy the speed. It&#x27;s genuinely transformative. But respect
the gap between what you can build and what you can operate. That gap is where your users get hurt.&lt;&#x2F;p&gt;
&lt;p&gt;If the thing you&#x27;re building handles someone else&#x27;s money or trust, maybe consider whether a conversation with someone
who&#x27;s been through the wars might be worth more than the monthly SaaS fee suggests.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;d love to hear from others who are thinking about this. &lt;a href=&quot;&#x2F;contact&quot;&gt;Drop me a line&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>How to Overengineer a Login Form</title>
        <published>2026-02-16T00:00:00+00:00</published>
        <updated>2026-02-16T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/bot-protection-weekend-project/"/>
        <id>https://daz.is/blog/bot-protection-weekend-project/</id>
        
        <content type="html" xml:base="https://daz.is/blog/bot-protection-weekend-project/">&lt;p&gt;Yes, the irony of using a bot to build bot protection is not lost on me. But the experience taught me something.
Development hasn&#x27;t gotten easier with AI. It&#x27;s gotten more intense.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-postmark-incident&quot;&gt;The Postmark Incident&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;a href=&quot;&#x2F;work&#x2F;zero-waste-tickets&#x2F;&quot;&gt;Zero Waste Tickets&lt;&#x2F;a&gt; is a side project of mine. Real users, real traffic, nothing massive.
The login flow is passwordless. You enter your email address and the app sends you a code. No passwords to manage, no
credentials to store. Simple.&lt;&#x2F;p&gt;
&lt;p&gt;Too simple, it turns out, if you don&#x27;t protect the form.&lt;&#x2F;p&gt;
&lt;p&gt;Last September, Postmark paused sending on my account. Polite email, no drama, but the message was clear: they&#x27;d spotted
anomalous sending patterns and flagged it as potential abuse.&lt;&#x2F;p&gt;
&lt;p&gt;You can see in the graph below that the site doesn&#x27;t have that many users. It&#x27;s a small side project in a closed
beta, so it&#x27;s only really used by friends and friends of friends. But you can also see that email bounces had been
slowly increasing, then surged on 23rd September:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;daz.is&#x2F;blog&#x2F;bot-protection-weekend-project&#x2F;img.png&quot; alt=&quot;img.png&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The investigation didn&#x27;t take long. Bots had been hammering the login form. Every submission triggered an email with a
login code. Postmark&#x27;s message suggested that my API token might have been compromised. It hadn&#x27;t, it was just that my
basic bot protection had failed.&lt;&#x2F;p&gt;
&lt;p&gt;When I first built the site several years ago it had no protection. But I noticed some fake login attempts in the
logs, so I implemented a basic honeypot: a field that&#x27;s invisible to regular users but that bots fill in. If the field
came back with a value, I rejected the submission. That worked fine for years. Then the error rate started to climb
slowly, the honeypot stopped catching them, and the volume was enough to trip Postmark&#x27;s detection.&lt;&#x2F;p&gt;
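&lt;p&gt;The server-side half of a honeypot is tiny. Here&#x27;s a sketch, with &quot;website&quot; as a hypothetical decoy field name (hidden from humans with CSS, but typically filled in by naive bots):&lt;&#x2F;p&gt;

```javascript
// Honeypot check sketch. "website" is a made-up decoy field name; real
// users never see the field, so any value in it suggests an automated
// submission that blindly filled in every input on the form.
function isLikelyBot(formData) {
  const decoy = formData.website;
  return typeof decoy === "string" ? decoy.length > 0 : false;
}
```

&lt;p&gt;The weakness, as I found out, is that smarter bots eventually learn to leave the decoy empty.&lt;&#x2F;p&gt;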
&lt;h2 id=&quot;the-weekend-fix&quot;&gt;The Weekend Fix&lt;&#x2F;h2&gt;
&lt;p&gt;I put Cloudflare in front of the site as an emergency response, which bought some time. But Cloudflare was having its
own reliability issues around then, and I&#x27;d rather not make my users&#x27; access to a side project contingent on a third
party. I like to keep dependencies minimal, and this is a project I use for learning and experimenting. I wanted to
understand the problem, not outsource it.&lt;&#x2F;p&gt;
&lt;p&gt;What I didn&#x27;t want was a captcha. Annoying UX, terrible privacy. I don&#x27;t want my users identifying motorbikes and fire
hydrants to log in.&lt;&#x2F;p&gt;
&lt;p&gt;I hate proof-of-work in principle, because of the wasted effort. It goes against the Zero Waste Tickets ethos. But I
needed something that would stay out of the users&#x27; way while tripping up attackers, or at least slowing them down to
the point where it&#x27;s not worth it. I was only adding it to the login form, as the rest of the site is protected by the
login session, so I figured the waste was minimal for that one form if it stopped the spammers.&lt;&#x2F;p&gt;
&lt;p&gt;I built it by hand over a weekend. Before the server accepts a form submission, the browser has to solve a small
computational puzzle. A hash challenge running in a Web Worker so it wouldn&#x27;t block the UI. The server generates a
challenge, the client computes the answer, the server verifies it before processing the form. Nothing fancy. Rust on the
backend, a bit of JavaScript on the front.&lt;&#x2F;p&gt;
&lt;p&gt;It worked. The spam dropped off. Postmark was happy. I moved on.&lt;&#x2F;p&gt;
&lt;p&gt;That could have been the end of the story.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-descent&quot;&gt;The Descent&lt;&#x2F;h2&gt;
&lt;p&gt;A few months later I came back to the problem. Not because the proof of work stopped working. It&#x27;s still working fine. I
came back because I&#x27;m helping my wife get her site off Squarespace and she needs a contact form. That means bot
protection. So what if I extracted the bot protection from ZWT and put it into its own reusable service?&lt;&#x2F;p&gt;
&lt;p&gt;That&#x27;s where things escalated.&lt;&#x2F;p&gt;
&lt;p&gt;Before AI, a &quot;weekend project&quot; for me was: implement a proof-of-work challenge on a login form. Research the approach,
write the hash function, wire up the Web Worker, build the server verification, test it, ship it. A focused,
self-contained piece of work.&lt;&#x2F;p&gt;
&lt;p&gt;After AI, a &quot;weekend project&quot; is: multiple challenge algorithms, a broker that selects the right one based on risk
signals, dynamic difficulty scaling, behavioural analysis. You&#x27;re halfway to accidentally reinventing Cloudflare.&lt;&#x2F;p&gt;
&lt;p&gt;Over-engineering used to be self-limiting because building things took time. You&#x27;d think &quot;what if I added dynamic
difficulty scaling?&quot; and then you&#x27;d put it on the ever-growing list of things to maybe get to later. That brake is gone.
With Claude Code, every one of those ideas is achievable in the time it used to take to build just one.&lt;&#x2F;p&gt;
&lt;p&gt;And &quot;weekend&quot; is generous too. It&#x27;s really a few hours here and there, squeezed in when I find time.&lt;&#x2F;p&gt;
&lt;p&gt;The answer isn&#x27;t to resist every impulse to overengineer. Some of that expanded scope is genuinely good. The challenge
broker is real architecture that solves a real problem. Dynamic difficulty is good protection.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;being-honest&quot;&gt;Being Honest&lt;&#x2F;h2&gt;
&lt;p&gt;Zero Waste Tickets doesn&#x27;t get enough traffic to justify any of this. The original proof of work solved the problem.&lt;&#x2F;p&gt;
&lt;p&gt;The Postmark incident was real. The learning was real. The increased potential is real. But so is the cognitive load.
Every &quot;what if&quot; that the AI makes achievable is another thing to evaluate, review, and maintain. The temptation to
overengineer isn&#x27;t free. It takes mental energy to resist it, and more energy when you don&#x27;t.&lt;&#x2F;p&gt;
&lt;p&gt;A recent HBR article by Ranganathan and Ye,
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;hbr.org&#x2F;2026&#x2F;02&#x2F;ai-doesnt-reduce-work-it-intensifies-it&quot;&gt;&quot;AI Doesn&#x27;t Reduce Work—It Intensifies It&quot;&lt;&#x2F;a&gt;, found
exactly this. They studied 200 employees at a tech company over eight months. Nobody was asked to do more. But with AI
tools available, they voluntarily expanded their own workloads. The researchers described &quot;a sense of always juggling,
even as the work felt productive.&quot; That&#x27;s the feeling.&lt;&#x2F;p&gt;
&lt;p&gt;I had a realisation recently while in a supermarket. There was one person on the old-style tills, scanning items,
chatting to people, making the experience human. And there was one person on the self-scan checkouts dealing with twelve
tills at once, running from one to the next, helping frustrated customers whose machines weren&#x27;t working, in constant
demand. That&#x27;s what coding with AI agents is like. You&#x27;re not doing less. You&#x27;re supervising more, across more fronts,
with less downtime between decisions. Except nobody made you move to the self-scan area. You walked over there yourself,
because the machines looked faster.&lt;&#x2F;p&gt;
&lt;p&gt;Development hasn&#x27;t really gotten any easier with AI. It&#x27;s gotten more intense.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Context Engineering Is the Job</title>
        <published>2026-02-15T00:00:00+00:00</published>
        <updated>2026-02-15T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/context-engineering-is-the-job/"/>
        <id>https://daz.is/blog/context-engineering-is-the-job/</id>
        
        <content type="html" xml:base="https://daz.is/blog/context-engineering-is-the-job/">&lt;aside class=&quot;update-callout&quot;&gt;
  &lt;span class=&quot;update-callout__label&quot;&gt;Update — 2026-03-01&lt;&#x2F;span&gt;
  &lt;p&gt;This post has been superseded by &lt;a href=&quot;&#x2F;blog&#x2F;how-i-work-with-ai-coding-agents&#x2F;&quot;&gt;How I Work with AI Coding Agents&lt;&#x2F;a&gt;. I&#x27;ve kept
it here rather than archiving it because I think it&#x27;s interesting to show how my thinking changed as I developed my
working processes. If you&#x27;re just after my latest compilation of how I&#x27;m working, you might want to check that more
recent post instead.&lt;&#x2F;p&gt;

&lt;&#x2F;aside&gt;
&lt;p&gt;In my previous post on &lt;a href=&quot;&#x2F;blog&#x2F;ai-engineer&#x2F;&quot;&gt;AI engineering&lt;&#x2F;a&gt;, I talked a lot about how I think it&#x27;s largely about context
management. Keep the context clean. Stay in the smart zone. Don&#x27;t let the model guess.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;ve been researching this more, and I&#x27;ve got a lot of insights from listening to Jeff Huber. He&#x27;s the CEO of Chroma,
the company behind the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;research.trychroma.com&#x2F;context-rot&quot;&gt;context rot research&lt;&#x2F;a&gt; I referenced in that post.
He&#x27;s been across several podcasts making a case that I find compelling: context engineering isn&#x27;t just a technique. It&#x27;s
&lt;em&gt;the&lt;&#x2F;em&gt; discipline of building AI systems.&lt;&#x2F;p&gt;
&lt;p&gt;Huber comes at this from the search and retrieval side as he&#x27;s building infrastructure for agentic search. But the
principles he&#x27;s articulating extend well beyond search. I&#x27;ve been finding them just as applicable to agentic coding, and
I suspect they hold for any system where an LLM needs the right information at the right time.&lt;&#x2F;p&gt;
&lt;p&gt;I spent some time pulling together his key ideas from
a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;podcasts.apple.com&#x2F;us&#x2F;podcast&#x2F;episode-65-the-rise-of-agentic-search&#x2F;id1610318868?i=1000741941190&quot;&gt;Vanishing Gradients episode&lt;&#x2F;a&gt;
and a few other appearances. Here&#x27;s what stuck with me.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;stop-saying-rag&quot;&gt;Stop saying RAG&lt;&#x2F;h2&gt;
&lt;p&gt;Huber refuses to use the term &quot;RAG.&quot; His argument is that it conflates three separate things (retrieval, augmentation,
and generation) into one. The term that&#x27;s becoming standard instead is &lt;strong&gt;context engineering&lt;&#x2F;strong&gt;: the discipline of
figuring out what should be in the context window for any given LLM generation step. It&#x27;s a better name because it
describes the actual job. And it gives the work the status it deserves. This isn&#x27;t prompt fiddling, it&#x27;s engineering.&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;In a traditional MVC CRUD app, your business logic is encoded in controllers. In an AI app, your business logic is
encoded in context.&lt;&#x2F;p&gt;
&lt;p&gt;— Jeff Huber&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;The key architectural decisions in an AI system are about what the model sees and when. This follows from the insight
that an LLM is stateless, and its output depends entirely on its input. And the performance comes from what we build
around it to support feeding it the right thing. I&#x27;m starting to think about agentic AI systems as having four key
concerns: model choice, the agentic harness, context engineering, and orchestration. But of those four, context
engineering is what we&#x27;re talking about here.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;two-loops&quot;&gt;Two loops&lt;&#x2F;h2&gt;
&lt;p&gt;Huber breaks context engineering into an inner loop and an outer loop.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;strong&gt;inner loop&lt;&#x2F;strong&gt; is what goes into the context window right now, for this specific generation step. You have N
candidate chunks of information and Y available slots. The job is to curate from potentially millions of candidates down
to the handful that matter for this exact moment.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;strong&gt;outer loop&lt;&#x2F;strong&gt; is how you get better at the inner loop over time. Build, test, deploy, monitor, iterate. The classic
software development cycle, applied to context quality.&lt;&#x2F;p&gt;
&lt;p&gt;This framing is useful because it separates two different kinds of work. The inner loop is the mechanics of assembling
context, including retrieval, filtering, reranking, prompt construction. The outer loop is about measurement, feedback,
and systematic improvement. It&#x27;s easy to focus almost entirely on the inner loop and barely touch the outer.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;gather-then-glean&quot;&gt;Gather, then glean&lt;&#x2F;h2&gt;
&lt;p&gt;For the inner loop, Huber describes a two-stage process:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Stage one: gather.&lt;&#x2F;strong&gt; Cast a wide net. Maximise recall. Use semantic search, keyword search, metadata filters, API
calls, conversation history. You&#x27;ll grab irrelevant things. That&#x27;s fine.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Stage two: glean.&lt;&#x2F;strong&gt; Cull the candidates to the minimal set that actually matters. Rerank using cross-encoders,
reciprocal rank fusion, or increasingly just LLMs directly. Go from a few hundred down to the 20 or so that belong in
the context window.&lt;&#x2F;p&gt;
&lt;p&gt;The two stages optimise for different things. Gather optimises for not missing anything important. Glean optimises
for not including anything distracting. You need both.&lt;&#x2F;p&gt;
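&lt;p&gt;The two stages can be sketched with a toy corpus and a crude term-overlap score standing in for real semantic search and rerankers (the documents, scoring, and slot count are all illustrative):&lt;&#x2F;p&gt;

```javascript
// Toy gather-then-glean pipeline over a made-up corpus.
const corpus = [
  { id: 1, text: "How to reset a forgotten password" },
  { id: 2, text: "Password hashing with bcrypt" },
  { id: 3, text: "Quarterly sales report" },
  { id: 4, text: "Rotating compromised API passwords" },
];

const terms = (q) => q.toLowerCase().split(/\s+/);

// Stage one: gather. Cast a wide net and maximise recall by keeping
// any document that shares at least one term with the query.
function gather(query, docs) {
  const qs = terms(query);
  return docs.filter((d) =>
    qs.some((t) => d.text.toLowerCase().includes(t))
  );
}

// Stage two: glean. Rerank by term overlap and keep only the top k,
// since context-window slots are the scarce resource.
function glean(query, candidates, k) {
  const qs = terms(query);
  const score = (d) =>
    qs.filter((t) => d.text.toLowerCase().includes(t)).length;
  return candidates
    .map((d) => ({ doc: d, s: score(d) }))
    .sort((a, b) => b.s - a.s)
    .slice(0, k)
    .map((x) => x.doc);
}

const picked = glean("password reset", gather("password reset", corpus), 2);
```

&lt;p&gt;The useful property of the shape is that gather can over-collect freely, because glean is the only thing deciding what spends a context slot.&lt;&#x2F;p&gt;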
&lt;p&gt;Huber&#x27;s framing here is search-specific, but the underlying problem applies everywhere. It&#x27;s about context assembly and
selecting the right parts from a larger pool. For agentic coding, I&#x27;m still doing this fairly manually as I learn what
works. It&#x27;s something I&#x27;m actively working on improving and automating.&lt;&#x2F;p&gt;
&lt;p&gt;Huber also makes an important point here that the most dangerous information isn&#x27;t the obviously irrelevant stuff. It&#x27;s
the information that &lt;em&gt;looks&lt;&#x2F;em&gt; relevant but isn&#x27;t, for some subtle reason. That&#x27;s what causes the model to confidently go
down the wrong path. Tight gleaning protects against this.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-outer-loop-is-key&quot;&gt;The outer loop is key&lt;&#x2F;h2&gt;
&lt;p&gt;The outer loop is where the real leverage is. You observe what your system actually does, compare it to what it
should have done, and feed that back into how you build context next time. Without this, every change is a guess. With
it, you&#x27;re doing engineering.&lt;&#x2F;p&gt;
&lt;p&gt;Huber&#x27;s version of this, coming from search, is the &lt;strong&gt;golden dataset&lt;&#x2F;strong&gt;. He recommends a spreadsheet of query-information
pairs that define what your system should retrieve for given inputs. His advice for creating one is disarmingly simple:
get your team together for an evening, buy some pizzas, spend a few hours writing pairs for every use case you can think
of. Then improve it over time by studying what users actually query, analysing what succeeded and what failed, and
wiring the results into CI.&lt;&#x2F;p&gt;
&lt;p&gt;For agentic coding, I&#x27;m finding the outer loop looks different but follows the same shape. It&#x27;s about studying where the
agent followed the plan and where it diverged, what context was missing when it made a bad decision, what assumptions it
hallucinated because the right information wasn&#x27;t in the window. Each of those failure cases becomes a lesson that feeds
back into how I structure research, write plans, and assemble context for the next session.
The &lt;a href=&quot;&#x2F;blog&#x2F;ai-engineer&#x2F;&quot;&gt;research-plan-implement cycle&lt;&#x2F;a&gt; I described previously is really an inner loop. The outer loop
is how that cycle gets refined through experience.&lt;&#x2F;p&gt;
&lt;p&gt;The underlying principle is the same regardless of domain: you need a way to measure whether your context engineering is
actually getting better. Huber calls the gap between demo and production &quot;alchemy.&quot; The outer loop is what turns it into
engineering.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;keeping-context-under-control&quot;&gt;Keeping context under control&lt;&#x2F;h2&gt;
&lt;p&gt;Agentic workflows pile up tokens through multi-step interactions. You need strategies for keeping context windows clean.
In my experience, there are two: &lt;strong&gt;summarise&lt;&#x2F;strong&gt; and &lt;strong&gt;delegate&lt;&#x2F;strong&gt;. They look similar but work at different points.&lt;&#x2F;p&gt;
&lt;p&gt;Summarising deals with context that&#x27;s already accumulated. As a conversation grows, you extract what matters and discard
the rest. This is what Dex called &lt;a href=&quot;&#x2F;blog&#x2F;ai-engineer&#x2F;&quot;&gt;intentional compaction&lt;&#x2F;a&gt;. The research-plan-implement cycle I&#x27;ve
written about is essentially this. Each phase produces a compressed artefact that replaces the sprawl of the previous
phase. It&#x27;s reactive. When the context has grown, you compact it.&lt;&#x2F;p&gt;
&lt;p&gt;Delegating prevents the tokens from entering the main context in the first place. You hand work to a sub-agent that
operates in its own isolated context window. It does the messy, token-heavy exploration, and only a concise result
crosses back into the parent. Huber frames this as encapsulation, borrowing from software engineering, and I think
that&#x27;s exactly right. The same principle as keeping functions small and interfaces narrow, applied to context windows.
The sprawl never reaches the main agent at all.&lt;&#x2F;p&gt;
&lt;p&gt;I use both. Sub-agents explore different parts of a codebase in parallel, each in a fresh context. Only their compressed
summaries come back. And within a conversation, I compact between phases rather than letting history accumulate.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;scaffolding-has-a-shelf-life&quot;&gt;Scaffolding has a shelf life&lt;&#x2F;h2&gt;
&lt;p&gt;Huber makes a strong argument that the scaffolding around LLMs should get &lt;em&gt;simpler&lt;&#x2F;em&gt; as models improve, not more complex.
Teams that build elaborate workarounds for model weaknesses end up maintaining dead weight when the next model doesn&#x27;t
have those weaknesses. He points out that Manus has been re-architected five times since March 2024. Anthropic regularly
strips out Claude Code&#x27;s agent scaffolding as models get more capable.&lt;&#x2F;p&gt;
&lt;p&gt;I can relate to this directly. A few years ago at Peppy, I wasn&#x27;t building the RAG system itself, but I was building
components around it and could see what was going on. There was a lot of scaffolding in place to compensate for model
limitations. Looking back, much of that could be dramatically simplified now. I&#x27;ve always aimed to build things out of
smaller, replaceable parts. I haven&#x27;t always managed to achieve that in practice. But that instinct serves you well
here. If you expect the scaffolding to have a shelf life, composability isn&#x27;t just good engineering, it&#x27;s mandatory.&lt;&#x2F;p&gt;
&lt;p&gt;Even as models improve, though, I don&#x27;t really want the model generating information. I want it synthesising from
what&#x27;s been provided. Which brings it right back to context engineering: make sure the right information is in the
window.&lt;&#x2F;p&gt;
&lt;p&gt;Huber also argues that the cost of rebuilding is dramatically lower now, so teams should lean into impermanence. I&#x27;ve
done some experiments with natural language specs and rebuilding parts of systems, and I can see the direction of
travel. But I still think we&#x27;re in the early days of learning how building with these tools actually works. I don&#x27;t want
to claim more confidence than I have on that one.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-i-m-taking-from-this&quot;&gt;What I&#x27;m taking from this&lt;&#x2F;h2&gt;
&lt;p&gt;These are the main insights I&#x27;m taking from Huber that are influencing my own work now:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Name the primitives.&lt;&#x2F;strong&gt; Don&#x27;t say &quot;RAG.&quot; Be explicit about the components that make up context engineering. Retrieval,
filtering, reranking, context assembly, evaluation are separate concerns you can reason about, measure, and improve
independently.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Close the outer loop.&lt;&#x2F;strong&gt; Find a way to measure context quality over time. &quot;Does this feel better?&quot; isn&#x27;t good enough.
Instrumentation matters, and so does evaluation against known data.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Respect context rot.&lt;&#x2F;strong&gt; I was already doing this for coding, but it applies to every AI system. Tight, structured
contexts beat maximal windows. Always.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Embrace the rebuild.&lt;&#x2F;strong&gt; Stop trying to build permanent AI infrastructure. Build for the current model generation, keep
things simple enough to rip out, and accept that the next model might change everything.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Start simple, stay simple.&lt;&#x2F;strong&gt; Exhaust prompt engineering and basic workflows before reaching for agents and complex
retrieval. The premature complexity trap is real, and it&#x27;s expensive.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;There&#x27;s a lot more in
the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;podcasts.apple.com&#x2F;us&#x2F;podcast&#x2F;episode-65-the-rise-of-agentic-search&#x2F;id1610318868?i=1000741941190&quot;&gt;full episode&lt;&#x2F;a&gt;.
Huber goes deep on hybrid search tradeoffs, evaluation practices, and the demo-to-production gap. Worth the listen if
you&#x27;re building anything that puts information in front of an LLM.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;m curious whether others are finding the same things. Is context engineering the frame you&#x27;re using, or something
different? &lt;a href=&quot;&#x2F;contact&quot;&gt;Drop me a line&lt;&#x2F;a&gt;. I&#x27;d love to hear what&#x27;s working for you.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;&lt;strong&gt;Sources&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;podcasts.apple.com&#x2F;us&#x2F;podcast&#x2F;episode-65-the-rise-of-agentic-search&#x2F;id1610318868?i=1000741941190&quot;&gt;Vanishing Gradients Ep. 65: The Rise of Agentic Search&lt;&#x2F;a&gt; (
Jeff Huber with Hugo Bowne-Anderson)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;research.trychroma.com&#x2F;context-rot&quot;&gt;Context Rot: How Increasing Input Tokens Impacts LLM Performance&lt;&#x2F;a&gt; (Chroma
Research)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.latent.space&#x2F;p&#x2F;chroma&quot;&gt;Latent Space: RAG is Dead, Context Engineering is King&lt;&#x2F;a&gt; (
Jeff Huber)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Thinking in Plans, Not Code</title>
        <published>2026-02-12T00:00:00+00:00</published>
        <updated>2026-02-12T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/thinking-in-plans-not-code/"/>
        <id>https://daz.is/blog/thinking-in-plans-not-code/</id>
        
        <content type="html" xml:base="https://daz.is/blog/thinking-in-plans-not-code/">&lt;aside class=&quot;update-callout&quot;&gt;
  &lt;span class=&quot;update-callout__label&quot;&gt;Update — 2026-03-01&lt;&#x2F;span&gt;
  &lt;p&gt;This post has been superseded by &lt;a href=&quot;&#x2F;blog&#x2F;how-i-work-with-ai-coding-agents&#x2F;&quot;&gt;How I Work with AI Coding Agents&lt;&#x2F;a&gt;. I&#x27;ve kept
it here rather than archiving it because I think it&#x27;s interesting to show how my thinking changed as I developed my
working processes. If you&#x27;re just after my latest compilation of how I&#x27;m working, you might want to check that more
recent post instead.&lt;&#x2F;p&gt;

&lt;&#x2F;aside&gt;
&lt;h2 id=&quot;thinking-in-code&quot;&gt;Thinking in Code&lt;&#x2F;h2&gt;
&lt;p&gt;The thing I realise with AI-assisted coding is just how quickly I would previously have jumped into writing code. That&#x27;s
how I would have naturally thought about and explored problems. Open the editor, start sketching something out, let the
shape of the solution emerge through the act of building it.&lt;&#x2F;p&gt;
&lt;p&gt;With AI coding, I realise we have far more leverage at the research and planning phases than we do at implementation.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;m having to train myself to spend more time planning each change. It feels a bit like procrastination. But I can also
see how valuable it is.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-gap&quot;&gt;The Gap&lt;&#x2F;h2&gt;
&lt;p&gt;There&#x27;s a gap between high-level planning and implementation. In my experience, that gap used to be bridged inside the
developer&#x27;s head. You&#x27;d read the requirements, form a mental model, and start coding. The translation from &quot;what needs
to happen&quot; to &quot;how it happens in code&quot; was implicit, happening almost unconsciously as you typed.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;thinking-in-plans&quot;&gt;Thinking in Plans&lt;&#x2F;h2&gt;
&lt;p&gt;What works now is different. It&#x27;s a progressive refinement: requirements, to plan, to detailed plan, to even more
detailed plan, to &lt;em&gt;maybe this plan is finally detailed enough&lt;&#x2F;em&gt;, to let&#x27;s go implement. Each layer adds specificity and
reduces ambiguity before the AI ever writes a line of code.&lt;&#x2F;p&gt;
&lt;p&gt;This is new territory for people who think in code.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;not-big-design-up-front&quot;&gt;Not Big Design Up Front&lt;&#x2F;h2&gt;
&lt;p&gt;I know what this sounds like. But it&#x27;s not Big Design Up Front. BDUF happens over weeks or months, tries to anticipate
everything, and produces documents that are outdated before implementation begins.&lt;&#x2F;p&gt;
&lt;p&gt;What I&#x27;m describing is a continuous refinement within a single flow of work. For a substantial build, that planning
phase might be a couple of days working with the LLM in different personas to stress-test requirements for security,
performance, implementability, consistency, compliance. Then refining from requirements to high-level plan, and down
through multiple levels of increasingly concrete detail. Implementation then happens across multiple sessions, working
through the detailed plans and checking the code at each point.&lt;&#x2F;p&gt;
&lt;p&gt;A few days to plan and build a system that would have taken weeks before. That&#x27;s the difference.&lt;&#x2F;p&gt;
&lt;p&gt;You&#x27;re taking the next piece of work and progressively adding detail until execution becomes so obvious that the AI
can&#x27;t really get it wrong.&lt;&#x2F;p&gt;
&lt;p&gt;And there&#x27;s a new skill emerging here that I don&#x27;t think has a name yet: developing intuition for the right size for a
piece of work for an AI to build in one go, and for the level of detail needed to make execution almost inevitable. Too
vague and the AI makes bad assumptions. Too large and it loses coherence. Get the granularity and specificity right, and
the code practically writes itself. And the quality is higher.&lt;&#x2F;p&gt;
&lt;p&gt;That intuition is something you can only build through experience. Nobody&#x27;s teaching it. We&#x27;re all just stumbling into
it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;feasibility-risk-isn-t-dead&quot;&gt;Feasibility Risk Isn&#x27;t Dead&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.svpg.com&#x2F;four-big-risks&#x2F;&quot;&gt;Marty Cagan&lt;&#x2F;a&gt; calls out four types of risks in software development:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;value risk (whether customers will buy it or users will choose to use it)&lt;&#x2F;li&gt;
&lt;li&gt;usability risk (whether users can figure out how to use it)&lt;&#x2F;li&gt;
&lt;li&gt;feasibility risk (whether our engineers can build what we need with the time, skills, and technology we have)&lt;&#x2F;li&gt;
&lt;li&gt;business viability risk (whether this solution also works for the various aspects of our business)&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;There&#x27;s a position gaining traction in product circles that feasibility risk (which used to be one of the biggest risks
in product development) is now irrelevant. That value risk is what matters most.&lt;&#x2F;p&gt;
&lt;p&gt;AI development has made many more things viable from an implementation perspective. There are things you can build now
that would have been impractical two years ago.&lt;&#x2F;p&gt;
&lt;p&gt;But I&#x27;m pretty convinced that feasibility risk is still a factor. I&#x27;m happy to be wrong about this, but unless you&#x27;re
guiding the AI from an engineering and developer point of view, you&#x27;re going to end up with an unmaintainable, expensive
mess. The AI can produce working code quickly. But working code and code that&#x27;s maintainable, performant, secure, and
fits coherently into an existing system are very different things.&lt;&#x2F;p&gt;
&lt;p&gt;The feasibility risk hasn&#x27;t disappeared. It&#x27;s shifted. It used to be &quot;can we build this?&quot; Now it&#x27;s &quot;can we plan this so
it gets built &lt;em&gt;well&lt;&#x2F;em&gt;?&quot;&lt;&#x2F;p&gt;
&lt;p&gt;And that still requires someone who thinks like an engineer.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;m getting good results with this approach, but I have a feeling I may be erring on the side of caution with overly
detailed plans. I know vibe coders would dismiss a lot of this. Where are you at with this? &lt;a href=&quot;&#x2F;contact&quot;&gt;Drop me a line&lt;&#x2F;a&gt;
if you want to discuss.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Maybe All Intelligence is Artificial</title>
        <published>2026-02-11T00:00:00+00:00</published>
        <updated>2026-02-11T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/all-intelligence-is-artificial/"/>
        <id>https://daz.is/blog/all-intelligence-is-artificial/</id>
        
        <content type="html" xml:base="https://daz.is/blog/all-intelligence-is-artificial/">&lt;aside class=&quot;update-callout&quot;&gt;
  &lt;span class=&quot;update-callout__label&quot;&gt;Warning&lt;&#x2F;span&gt;
  &lt;p&gt;This post is a little different to my usual technical blog posts. I asked Claude to review this post,
and this is what it said:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;It doesn&#x27;t survive close scrutiny as an argument because it relies on loaded definitions, unexamined
metaphysics, and a narrative so tidy it papers over the messiness of actual history and biology.&quot;&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;You have been warned.&lt;&#x2F;p&gt;

&lt;&#x2F;aside&gt;
&lt;p&gt;This isn&#x27;t my usual territory. I spend most of my time building things with code, not writing about fungal networks and
Mesopotamian irrigation. But during a quiet moment in nature, an aphorism surfaced: &quot;maybe all intelligence is
artificial?&quot;&lt;&#x2F;p&gt;
&lt;p&gt;I sat with it for a long while. Slowly, the whole trajectory of human civilisation started to look like a single,
accelerating story of separation from source, driven by &quot;intelligence&quot;.&lt;&#x2F;p&gt;
&lt;p&gt;I don&#x27;t have this fully worked out. But let me try and explain...&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;Intelligence manipulates. It abstracts, optimises, and solves. It builds tools, constructs models, and generates
language. It lets a mathematician write a proof, an octopus unscrew a jar from the inside, or a machine produce a
coherent paragraph. Intelligence is impressive, and it&#x27;s useful, but it&#x27;s always &lt;em&gt;doing&lt;&#x2F;em&gt; something. It operates on the
world.&lt;&#x2F;p&gt;
&lt;p&gt;Intelligence doesn&#x27;t have to mean disconnection. An ape is intelligent, as are many animals, but it&#x27;s rooted. It
participates in the ecology it acts on. For this discussion, what I&#x27;m calling artificial isn&#x27;t intelligence itself, but
intelligence that has become disconnected.&lt;&#x2F;p&gt;
&lt;p&gt;Wisdom is different. What I mean by wisdom here is not good judgment or the accumulation of experience. I mean something
older and less personal, a kind of knowing that doesn&#x27;t separate itself from what it knows. Wisdom doesn&#x27;t operate. It
participates. In traditions that recognise a universal consciousness, wisdom is the capacity to be in connection with
all living things. It&#x27;s not constructed. It&#x27;s received, and ancient.&lt;&#x2F;p&gt;
&lt;p&gt;Consider a forest. Beneath the soil, fungal networks connect the roots of trees across vast distances, distributing
nutrients from the strong to the struggling, mediating the boundary between life and death. It&#x27;s tempting to call this
intelligence. But there is a difference. The fungal network has no model of the forest. It doesn&#x27;t stand apart from the
system it serves. It&#x27;s the forest&#x27;s connective tissue. This is not intelligence. This is wisdom.&lt;&#x2F;p&gt;
&lt;p&gt;I realise I&#x27;m using these words in a slightly unusual way. Wisdom usually means something like good judgment born from
experience, and intelligence can be rooted. But I need handles for these two very different modes of knowing, and these
are the closest words I have. Bear with me.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;When action stays connected to its source, it builds within ecology. It builds the way a beaver builds a dam or a coral
builds a reef. It participates in the living system that feeds back into the whole. The dam becomes habitat. The reef
becomes an ecosystem. The construction does not stand apart from nature. It&#x27;s nature building itself. It&#x27;s life
perpetuating itself through action that remains in relationship with source.&lt;&#x2F;p&gt;
&lt;p&gt;Even early human construction had this quality. Vernacular architecture built from local materials that would return to
the soil. Indigenous land management that used fire, rest, and rotation to increase the vitality of ecosystems rather
than extract from them. Traditional agriculture that worked within the rhythms of living systems rather than overriding
them. This was intelligence still tethered to wisdom.&lt;&#x2F;p&gt;
&lt;p&gt;The separation happens gradually. It starts when intelligence begins to build things that no longer participate in the
living systems they depend on.&lt;&#x2F;p&gt;
&lt;p&gt;Agriculture scales up and becomes monoculture. Irrigation feeds civilisations but salts the soil beneath them. Cities
emerge as environments constructed entirely by intelligence, abstracted from the ecology that sustains them. Economies
develop that treat ecosystems as inputs to be optimised. At each stage, intelligence creates results that move it
further from source.&lt;&#x2F;p&gt;
&lt;p&gt;This is not new. It&#x27;s a trajectory as old as civilisation itself. Mesopotamian irrigation systems fed the first great
civilisations but left behind salt-crusted earth that hasn&#x27;t recovered in four thousand years. The land that gave us
writing, mathematics, and agriculture is now desert. Intelligence built something extraordinary there, and what it built
destroyed what it was built on.&lt;&#x2F;p&gt;
&lt;p&gt;The deforestation of the Mediterranean basin. The drainage of wetlands, the enclosure of commons, the industrial
conversion of landscapes into machinery. Each era builds further from ecology, and each era&#x27;s intelligence is more
sophisticated and more severed from source.&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;The attempt to live according to the notion that the fragments are really separate is, in essence, what has
led to the growing series of extremely urgent crises that is confronting us today.&lt;&#x2F;p&gt;
&lt;p&gt;-- David Bohm, Theoretical Physicist&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;What accelerates is not just the power of intelligence but the depth of its disconnection. Early agriculture was
intelligence one step removed from the source. Industrial manufacturing was several steps removed. A global financial
system that algorithmically trades futures on crop yields while the soil those crops grow in erodes? That is
intelligence so far from the source that it can destroy itself without noticing.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;But the connection to source hasn&#x27;t been entirely severed. It&#x27;s been marginalised.&lt;&#x2F;p&gt;
&lt;p&gt;Permaculture designs food systems by observing how ecosystems actually work. Not by imposing intelligence onto land, but
learning from the land&#x27;s own patterns of renewal. Indigenous ecological traditions, many of them thousands of years old
and still practised, manage landscapes through relationship rather than extraction. They don&#x27;t treat the living world as
a problem to be solved. They participate in it. And meditation, prayer, deep sustained attention to the natural world
are all practices of reconnecting intelligence to source. Of slowing down enough to receive what cannot be computed.&lt;&#x2F;p&gt;
&lt;p&gt;These are not relics. They are living proof that intelligence can remain in relationship with wisdom. That the
trajectory of disconnection, however old and however powerful, is not inevitable. The path back exists.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;Artificial intelligence is just the latest stage of this trajectory. It&#x27;s not a break from the pattern. It&#x27;s the
pattern&#x27;s culmination.&lt;&#x2F;p&gt;
&lt;p&gt;The difference is that AI is intelligence with no connection to source. Human intelligence retains at least the
possibility of reconnection to source. A person can be intelligent &lt;em&gt;and&lt;&#x2F;em&gt; wise. This is what contemplative traditions
have always been about. Quieting the mind so it can receive what it cannot construct.&lt;&#x2F;p&gt;
&lt;p&gt;AI has no such possibility. A large language model can produce text that mimics insight, arrange words in patterns that
resemble understanding, but it does so without any contact with universal consciousness, without participation in the
living fabric that sustains and connects all things. It isn&#x27;t intelligence that has lost its connection to source. It is
intelligence that never had one.&lt;&#x2F;p&gt;
&lt;p&gt;And it&#x27;s fast. Intelligence disconnected from source was already dangerous when it moved at the speed of human thought.
Wisdom requires patience. AI has no such constraint. It moves at the speed of computation, making decisions that affect
living systems at a pace that leaves no room for wisdom.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;The environmental crisis and the crisis of artificial intelligence are not two separate problems. They are the
culmination of the same trajectory.&lt;&#x2F;p&gt;
&lt;p&gt;When intelligence separated from ecology, it built civilisations that could not sustain themselves without degrading the
living systems they depended on. When intelligence separated further, it built industrial economies that accelerated
that degradation to a planetary scale. Now, intelligence has separated so completely that it&#x27;s building new forms of
itself. Forms of intelligence that have no memory of the source, no relationship to the living world, and no capacity
for wisdom.&lt;&#x2F;p&gt;
&lt;p&gt;The acceleration is not merely technological. We are building disconnected minds and entrusting them with decisions that
affect the fabric of biological existence on Earth. We are building minds without understanding what a mind is for.&lt;&#x2F;p&gt;
&lt;p&gt;The most important question we can ask about any intelligence, biological or digital, is not how powerful it is, but
whether it has any relationship to source. And, through artificial intelligence, we are about to find out what it looks
like when the answer is no.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;&lt;em&gt;If any of this resonates, or if you think I&#x27;ve got it completely wrong, then I&#x27;d genuinely love to hear from you.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>AI Engineer or Sloperator?</title>
        <published>2026-02-04T00:00:00+00:00</published>
        <updated>2026-02-04T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/ai-engineer/"/>
        <id>https://daz.is/blog/ai-engineer/</id>
        
        <content type="html" xml:base="https://daz.is/blog/ai-engineer/">&lt;aside class=&quot;update-callout&quot;&gt;
  &lt;span class=&quot;update-callout__label&quot;&gt;Update — 2026-03-01&lt;&#x2F;span&gt;
  &lt;p&gt;This post has been superseded by &lt;a href=&quot;&#x2F;blog&#x2F;how-i-work-with-ai-coding-agents&#x2F;&quot;&gt;How I Work with AI Coding Agents&lt;&#x2F;a&gt;. I&#x27;ve kept
it here rather than archiving it because I think it&#x27;s interesting to show how my thinking changed as I developed my
working processes. If you&#x27;re just after the latest summary of how I&#x27;m working, you might want to read that more
recent post instead.&lt;&#x2F;p&gt;

&lt;&#x2F;aside&gt;
&lt;p&gt;Last year I was using AI Chat and Copilot but hadn&#x27;t gone all in on coding agents yet. I was seeing AI slop everywhere.
But in Dec 2025 everything changed and I &lt;a href=&quot;&#x2F;blog&#x2F;rethinking-ai&#x2F;&quot;&gt;reevaluated&lt;&#x2F;a&gt;
my whole approach.&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;When the facts change, I change my mind.&quot;
-- &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;quoteinvestigator.com&#x2F;2011&#x2F;07&#x2F;22&#x2F;keynes-change-mind&#x2F;&quot;&gt;John Maynard Keynes&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;The facts changed. So did I.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-paradox&quot;&gt;The paradox&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;daz.is&#x2F;blog&#x2F;ai-engineer&#x2F;img_5.png&quot; alt=&quot;img_5.png&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;I looked at the research and found conflicting data.&lt;&#x2F;p&gt;
&lt;p&gt;Controlled studies consistently show 20–30% individual coding speed improvements [1]. But research also shows that 45%
of AI-generated code contains security vulnerabilities [2], AI code has a 41% higher churn rate, revised or deleted
within two weeks [3], and in the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;survey.stackoverflow.co&#x2F;2025&#x2F;&quot;&gt;2025 Stack Overflow Developer Survey&lt;&#x2F;a&gt;, 66% of
developers said they suffered a productivity overhead from not-quite-right AI code.&lt;&#x2F;p&gt;
&lt;p&gt;You&#x27;re faster, but the output quality creates drag that can eat those gains and then some.&lt;&#x2F;p&gt;
&lt;p&gt;The question isn&#x27;t whether AI coding tools are useful. They clearly are. The question is whether you end up as an AI
engineer or a &lt;em&gt;sloperator&lt;&#x2F;em&gt;. You are producing more code, faster, but is most of it slop?&lt;&#x2F;p&gt;
&lt;p&gt;For greenfield projects, simple standalone apps, small, well-defined scopes, it&#x27;s much easier to get good results from
AI. In a few hours you can ship what would have taken days before. But for complex tasks in 10-year-old legacy codebases
with intricate dependencies and undocumented conventions, that&#x27;s where the slop factory kicks in.&lt;&#x2F;p&gt;
&lt;p&gt;The models are getting better, and learning the right techniques makes the difference.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-fundamental-constraint&quot;&gt;The fundamental constraint&lt;&#x2F;h2&gt;
&lt;p&gt;The insight that underpins everything else: &lt;strong&gt;LLMs are stateless&lt;&#x2F;strong&gt;. They have no memory between requests. The only thing
they have to work with is the context you give them.&lt;&#x2F;p&gt;
&lt;p&gt;Context is everything. Output quality is directly bounded by context quality.&lt;&#x2F;p&gt;
&lt;p&gt;I think about context quality across four dimensions, a framing from Dex&#x27;s &quot;No Vibes Allowed&quot; talk [5] that crystallised
much of what I&#x27;d been stumbling towards. I&#x27;ve mixed in my own experience and pulled from other sources [6][7][8], but
Dex&#x27;s framework is the backbone of this post.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Correctness&lt;&#x2F;strong&gt;: is everything in the context actually accurate? One wrong assumption about how the auth system works
and everything downstream is built on sand.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Completeness&lt;&#x2F;strong&gt;: is anything important missing? If the model doesn&#x27;t know about a critical constraint, it can&#x27;t account
for it.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Size&lt;&#x2F;strong&gt;: is the context all signal with minimal noise? This one is counterintuitive, and it&#x27;s the most important.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Trajectory&lt;&#x2F;strong&gt;: does the shape and flow of the conversation help the model reason well? A meandering back-and-forth
produces worse results than a clean, focused prompt.&lt;&#x2F;p&gt;
&lt;p&gt;Get all four right and you get great output. Any one of them off and you get slop.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;context-rot-and-the-smart-zone&quot;&gt;Context rot and the smart zone&lt;&#x2F;h2&gt;
&lt;blockquote&gt;
&lt;p&gt;As you use more tokens the model can pay attention to less and can reason less effectively&lt;&#x2F;p&gt;
&lt;p&gt;— Jeff Huber (Chroma)&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;At first it might seem counterintuitive that more context usually means worse output.&lt;&#x2F;p&gt;
&lt;p&gt;As you fill up the context window with more tokens, the model&#x27;s ability to pay attention to all of it decreases. Its
reasoning quality degrades. I&#x27;ve seen this called &lt;em&gt;context rot&lt;&#x2F;em&gt; [6]. Performance peaks when the context is focused and
clean, then drops off.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;daz.is&#x2F;blog&#x2F;ai-engineer&#x2F;img.png&quot; alt=&quot;img.png&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;After about 40% context window utilisation, you&#x27;re in diminishing returns territory. Some call this the &quot;dumb zone&quot; [7].&lt;&#x2F;p&gt;
&lt;p&gt;This explains so much of the AI slop problem. People stuff context windows full, thinking more information means better
results. The opposite is true.&lt;&#x2F;p&gt;
&lt;p&gt;There&#x27;s a DJ analogy [8]: &quot;if you&#x27;re redlining, you ain&#x27;t headlining.&quot; In audio engineering, redlining means pushing
your levels past the maximum. The signal clips, distorts, sounds terrible. The pros keep headroom. They stay within the
limits. That&#x27;s where the clean sound is.&lt;&#x2F;p&gt;
&lt;p&gt;Same with LLMs. Stay in the smart zone. Keep headroom.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;daz.is&#x2F;blog&#x2F;ai-engineer&#x2F;img_2.png&quot; alt=&quot;img_2.png&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-solution-research-plan-implement&quot;&gt;The solution: Research, Plan, Implement&lt;&#x2F;h2&gt;
&lt;p&gt;If cramming context is the problem, intentional compaction is the solution. And the shape of that solution will look
familiar to anyone who&#x27;s been engineering for a while: research first, plan second, build third. That&#x27;s not a new idea.
What&#x27;s new is why it matters so much more with AI. When a human developer skips the planning phase, they still carry
implicit context in their head. When an AI agent skips it, it has nothing. The model only knows what&#x27;s in the context
window. If something isn&#x27;t in the context window, the model falls back on its training data, and that&#x27;s where
hallucinations start to creep in.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;daz.is&#x2F;blog&#x2F;ai-engineer&#x2F;img_3.png&quot; alt=&quot;img_3.png&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The framework I&#x27;m using has three main phases, each in a separate conversation with a fresh context window. The output
of each phase is a compressed artefact that becomes the input for the next.&lt;&#x2F;p&gt;
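&lt;p&gt;One way to picture the flow of artefacts between the phases (the file names here are illustrative, not something the tooling requires):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;Phase 1: Research    in: the codebase            out: research.md
Phase 2: Plan        in: research.md             out: plan.md
Phase 3: Implement   in: plan.md + target files  out: tested code changes

Each phase runs in a fresh conversation. Only the compressed
artefact crosses the boundary; the noise stays behind.&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;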
&lt;h3 id=&quot;phase-1-research&quot;&gt;Phase 1: Research&lt;&#x2F;h3&gt;
&lt;p&gt;Start with high context: lots of code, lots of files. Explore the codebase. Navigate the file structure, read key
modules, trace data flows. Identify patterns: coding conventions, architectural decisions, existing abstractions. Map
dependencies: what touches what, where the integration points are.&lt;&#x2F;p&gt;
&lt;p&gt;The output is a compressed markdown summary. Not a raw dump of files. A focused, curated document that captures what
matters. AI subagents are excellent at this. You can spin them up to explore different parts of the codebase in parallel
and consolidate the results.&lt;&#x2F;p&gt;
&lt;p&gt;This is the highest-leverage phase. A hallucinated assumption about how your authentication works isn&#x27;t a code-level
error. It&#x27;s a research-level error. Everything built on top of it will be wrong.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;phase-2-plan&quot;&gt;Phase 2: Plan&lt;&#x2F;h3&gt;
&lt;p&gt;Take the compressed research and produce an execution blueprint. Every step numbered, sequential, unambiguous. Include
explicit test criteria: how to verify each step works. Include actual code snippets from the existing codebase to anchor
the implementation to real patterns. Think through edge cases and risks.&lt;&#x2F;p&gt;
&lt;p&gt;The goal: a plan so detailed that the dumbest model in the world won&#x27;t screw it up.&lt;&#x2F;p&gt;
&lt;p&gt;One bad step in the plan can produce a hundred lines of wrong code. Review plans with the same rigour you review code.
Maybe more.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;phase-3-implement&quot;&gt;Phase 3: Implement&lt;&#x2F;h3&gt;
&lt;p&gt;This should be the simplest phase. If research and planning are done well, implementation becomes almost mechanical.&lt;&#x2F;p&gt;
&lt;p&gt;Feed the AI only the plan and the specific files it needs to modify. Phase large tasks into chunks, each with a fresh
context window. Test after each step. Build intuition for task size versus context consumption.&lt;&#x2F;p&gt;
&lt;p&gt;Don&#x27;t dump the entire codebase. Don&#x27;t let one conversation run forever. Don&#x27;t skip testing. Don&#x27;t assume more
information means better output.&lt;&#x2F;p&gt;
&lt;p&gt;The pattern across all three phases: context goes down at each stage while specificity goes up.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-hierarchy-of-leverage&quot;&gt;The hierarchy of leverage&lt;&#x2F;h2&gt;
&lt;p&gt;Not all errors are created equal [5].&lt;&#x2F;p&gt;
&lt;p&gt;A bad line of code is a bad line of code. You&#x27;ll probably catch it in review. A bad step in a plan could produce a
hundred lines of wrong code before anyone notices. A fundamental misunderstanding of how the system works, a
research-level error, means your entire feature is built on a wrong assumption.&lt;&#x2F;p&gt;
&lt;p&gt;Don&#x27;t just review code. Review plans. Review research. Catch errors before they multiply.&lt;&#x2F;p&gt;
&lt;p&gt;You can use the AI itself as a reviewer, but know where it&#x27;s reliable. At the code level, it&#x27;s excellent: syntax errors,
logic bugs, missing edge cases. At the plan level, it&#x27;s moderately useful. It can spot gaps and inconsistencies but
still needs human judgement. At the research level, it&#x27;s less reliable because it requires the kind of deep system
understanding the model may not have.&lt;&#x2F;p&gt;
&lt;p&gt;Human review is non-negotiable at the research and plan level. AI review amplifies your coverage at the code level.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;don-t-let-it-guess&quot;&gt;Don&#x27;t let it guess&lt;&#x2F;h2&gt;
&lt;p&gt;The default behaviour of most models is to be helpful. When they encounter ambiguity, they make a plausible-sounding
decision and keep going.&lt;&#x2F;p&gt;
&lt;p&gt;That&#x27;s the most dangerous failure mode.&lt;&#x2F;p&gt;
&lt;p&gt;A compiler error is obvious. A failed test is obvious. A hallucinated line of code, you&#x27;ll probably catch it in review.
But a confidently wrong architectural choice buries itself in your codebase and surfaces weeks later. A hallucinated
assumption about how your auth system works poisons everything downstream.&lt;&#x2F;p&gt;
&lt;p&gt;The fix: force the model to ask rather than guess. In every prompt, explicitly instruct it to only use the provided
context and ask for clarification when anything is unclear. Use an AGENTS.md or CLAUDE.md file to set interaction-style
rules that get included automatically in every prompt. Set it once, applies everywhere.&lt;&#x2F;p&gt;
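&lt;p&gt;As a sketch of what those interaction-style rules can look like (the wording here is mine, so adapt it to your project), an AGENTS.md excerpt might read:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;## Interaction rules

- Use only the context provided in this conversation.
- If a requirement, API, or convention is unclear, stop and ask a
  clarifying question instead of guessing.
- Never invent file paths, function names, or configuration keys.
- List any assumptions you make at the top of your response.&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;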
&lt;p&gt;Yes, sometimes this means the AI agent asks too many questions. I&#x27;d rather it ask &quot;does this service use JWT or session
tokens?&quot; than confidently guess wrong and build an entire feature on a bad assumption.&lt;&#x2F;p&gt;
&lt;p&gt;An interruption is cheap. A hallucinated assumption is expensive.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;configuration-as-free-performance&quot;&gt;Configuration as free performance&lt;&#x2F;h2&gt;
&lt;p&gt;A quick note on setup: research shows 10–20% improvement in output quality from getting configuration right. That&#x27;s free
performance you&#x27;re leaving on the table if you skip it.&lt;&#x2F;p&gt;
&lt;p&gt;Three areas matter. &lt;strong&gt;AGENTS.md &#x2F; CLAUDE.md&lt;&#x2F;strong&gt; defines your coding conventions, project-specific rules, and interaction
style. It&#x27;s included in every request automatically. &lt;strong&gt;MCPs&lt;&#x2F;strong&gt; (Model Context Protocol servers) are powerful
integrations, but they eat context, so be selective and disable what you&#x27;re not using in this session. &lt;strong&gt;Skills&lt;&#x2F;strong&gt;
are progressive disclosure: specialised knowledge provided only when needed, not loaded all at once.&lt;&#x2F;p&gt;
&lt;p&gt;Everything is a context budget decision. Every MCP, every file, every instruction consumes tokens from your smart zone
budget.&lt;&#x2F;p&gt;
&lt;p&gt;For structured prompts, I use six elements: &lt;strong&gt;role&lt;&#x2F;strong&gt; (sets expertise level), &lt;strong&gt;goal&lt;&#x2F;strong&gt; (defines success criteria),
&lt;strong&gt;context&lt;&#x2F;strong&gt; (constrains the solution), &lt;strong&gt;format&lt;&#x2F;strong&gt; (specifies deliverables), &lt;strong&gt;examples&lt;&#x2F;strong&gt; (anchors to your patterns), and
&lt;strong&gt;constraints&lt;&#x2F;strong&gt; (makes security and performance requirements explicit). You don&#x27;t always need all six, but for complex
work, the more explicit you are, the less the model guesses.&lt;&#x2F;p&gt;
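&lt;p&gt;To make that concrete, here&#x27;s a hypothetical prompt using all six elements (the project details, file names, and endpoint are invented for illustration):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;Role: You are a senior backend engineer working in this Django codebase.
Goal: Add rate limiting to the login endpoint; done when the new tests
  pass and the existing suite stays green.
Context: Use only the attached plan.md and the two files below.
Format: A unified diff plus a short summary of the changes.
Examples: Follow the decorator pattern used in api&#x2F;throttle.py.
Constraints: No new dependencies; limits must be configurable via
  settings.&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;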
&lt;h2 id=&quot;better-models-amplify-everything&quot;&gt;Better models amplify everything&lt;&#x2F;h2&gt;
&lt;p&gt;The models are getting better fast. Opus 4.5 was a genuine step change for coding. But a better model doesn&#x27;t fix bad
context management. It just produces more confident, more fluent slop.&lt;&#x2F;p&gt;
&lt;p&gt;These practices become &lt;em&gt;more&lt;&#x2F;em&gt; valuable as models improve because you&#x27;re amplifying a stronger base capability.&lt;&#x2F;p&gt;
&lt;p&gt;Clean context plus a great model equals extraordinary results. Noisy context plus a great model equals expensive slop.&lt;&#x2F;p&gt;
&lt;p&gt;Same principle. The hard work here is in context management, not in writing more code.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;will-this-age&quot;&gt;Will this age?&lt;&#x2F;h2&gt;
&lt;p&gt;An obvious question: context windows are getting bigger, tools are getting smarter at managing context automatically,
agents can search and index codebases on their own. Will any of this matter in a year?&lt;&#x2F;p&gt;
&lt;p&gt;Some of the specifics won&#x27;t. The 40% utilisation threshold will shift. The manual three-phase workflow will probably get
automated. The tooling around AGENTS.md and MCPs will evolve or be replaced entirely.&lt;&#x2F;p&gt;
&lt;p&gt;But I think the underlying principles hold. &quot;Be intentional about what the model knows&quot; is a constraint of attention,
not just of window size. A million-token context window doesn&#x27;t help if the model is paying equal attention to
everything and nothing is prioritised. &quot;Review at the highest leverage point&quot; is just good engineering.
&quot;Don&#x27;t let it guess&quot; is about the nature of language models, not the current generation of them.&lt;&#x2F;p&gt;
&lt;p&gt;The tools will change. The thinking won&#x27;t. Or at least, that&#x27;s my bet.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-difference&quot;&gt;The difference&lt;&#x2F;h2&gt;
&lt;p&gt;The gap between drowning in AI slop and shipping quality code comes down to four things:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Intentional context management.&lt;&#x2F;strong&gt; Understand the smart zone. Keep context clean, compressed, and focused. Less is
more.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Research, Plan, Implement.&lt;&#x2F;strong&gt; Separate your phases. Compress between each one. Fresh context windows. Specificity up,
noise down.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Human review at the highest-leverage points.&lt;&#x2F;strong&gt; Don&#x27;t just review code. Review plans and research. Catch errors before
they multiply.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Never let it guess.&lt;&#x2F;strong&gt; Force the model to ask questions. An interruption is cheap. A wrong assumption is expensive.&lt;&#x2F;p&gt;
&lt;p&gt;These aren&#x27;t complicated ideas. They&#x27;re intentional ones. And that intentionality is what separates AI engineers from
sloperators.&lt;&#x2F;p&gt;
&lt;p&gt;If you&#x27;re using AI coding tools and have found practices that work for you, or if you think I&#x27;ve got it wrong, I&#x27;d love
to hear about it. Drop me a line.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;&lt;strong&gt;References&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;[1] &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;arc.dev&#x2F;talent-blog&#x2F;impact-of-ai-on-code&#x2F;&quot;&gt;The Impact of AI on Code&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;[2] &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;veracode.com&#x2F;blog&#x2F;ai-generated-code-security-risks&#x2F;&quot;&gt;AI-Generated Code Security Risks&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;[3] &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;gitclear.com&#x2F;ai_assistant_code_quality_2025_research&quot;&gt;AI Assistant Code Quality 2025 Research&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;[4] &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;stackoverflow.blog&#x2F;2026&#x2F;01&#x2F;02&#x2F;a-new-worst-coder-has-entered-the-chat-vibe-coding-without-code-knowledge&#x2F;&quot;&gt;A New Worst Coder Has Entered the Chat&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;[5] &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=rmvDxxNubIg&quot;&gt;No Vibes Allowed: Solving Hard Problems in Complex Codebases&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;[6] &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;research.trychroma.com&#x2F;context-rot&quot;&gt;Context Rot: How Increasing Input Tokens Impacts LLM Performance&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;[7] &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;humanlayer&#x2F;advanced-context-engineering-for-coding-agents&#x2F;blob&#x2F;main&#x2F;ace-fca.md&quot;&gt;Getting AI to Work in Complex Codebases&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;[8] &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;ghuntley.com&#x2F;redlining&#x2F;&quot;&gt;If You&#x27;re Redlining, You Ain&#x27;t Headlining&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Moltbot: The Bored Ape of Integration Patterns</title>
        <published>2026-01-28T00:00:00+00:00</published>
        <updated>2026-01-28T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/moltbot/"/>
        <id>https://daz.is/blog/moltbot/</id>
        
        <content type="html" xml:base="https://daz.is/blog/moltbot/">&lt;p&gt;I&#x27;m not an AI sceptic; &lt;a href=&quot;&#x2F;blog&#x2F;rethinking-ai&quot;&gt;that ship has sailed&lt;&#x2F;a&gt;. Not since realising that Opus 4.5, with strict
supervision, can produce better code than me. But Moltbot feels like the AI equivalent of primate-themed NFTs. Hand an
AI agent your API keys? Email accounts? And shell access? Then let it wake itself up via cron to &quot;do things&quot; on your
behalf. What could possibly go wrong?&lt;&#x2F;p&gt;
&lt;p&gt;Remember the Bored Apes and the whole NFT hype? A toy novelty dressed up as inevitable innovation, with &#x27;everyone will
be doing this soon&#x27; narratives obscuring real debate over whether anyone should be.&lt;&#x2F;p&gt;
&lt;p&gt;My day job is tech lead for a data and integrations team. Earlier in my career I worked with the Kendraio Foundation on
interoperability, building systems that help data flow between services in structured, reliable ways. This background
gives me a particular lens on what Moltbot is doing. That lens says: &lt;strong&gt;we&#x27;ve already solved many of these problems, and
the solutions were boring on purpose&lt;&#x2F;strong&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;When I see people giving an LLM access to their email and shell, scheduling it to wake up autonomously and &quot;handle
things&quot;, I don&#x27;t see innovation. I see the same old integration problems we&#x27;ve been solving for decades, now wrapped in
non-determinism and a security nightmare.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-hidden-and-not-so-hidden-costs&quot;&gt;The Hidden (and not so hidden) Costs&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;strong&gt;Financial costs are unpredictable.&lt;&#x2F;strong&gt; Token costs vary wildly based on input complexity. A simple task might cost
pennies; parsing a complex email thread could burn through dollars. Failed attempts consume the budget with zero output.
Debugging costs tokens because the AI has to examine its own errors. DataCamp estimates $10–150&#x2F;month depending on
usage, but that assumes things work. Multi-attempt workflows? Nobody&#x27;s budgeting for &quot;tried 47 times before success.&quot;
Traditional integration costs are predictable. These are not.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Security costs are severe.&lt;&#x2F;strong&gt; The security picture here is genuinely alarming. Researchers have found hundreds of
exposed Moltbot instances on the open internet. API keys, OAuth tokens, conversation histories: all accessible to anyone
who knows where to look. In one demonstrated attack, a researcher sent a prompt injection via email to a Moltbot
instance. The AI read the email, believed it was legitimate instructions, and forwarded the user&#x27;s last five emails to
an attacker address. It took five minutes.&lt;&#x2F;p&gt;
&lt;p&gt;The core issue isn&#x27;t implementation bugs (though there are plenty). It&#x27;s architectural. You&#x27;re handing API keys to an
unsupervised agent that processes untrusted input. Credentials are stored in plaintext JSON and Markdown files. Audit
trails become &quot;the AI decided to&quot;, which isn&#x27;t going to fly in an enterprise environment where SOC2 or ISO compliance
matters. Credential rotation becomes a nightmare when you don&#x27;t know what the AI might have done with them. One security
firm found that 22% of their enterprise customers have employees actively using Moltbot, likely without IT approval.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Operational costs compound silently.&lt;&#x2F;strong&gt; Context drift over extended runs means the AI gradually loses the thread of
what it&#x27;s supposed to be doing. Non-deterministic behaviour creates chaos in systems expecting predictability. Errors
compound without human checkpoints. You don&#x27;t discover the problem until the damage is done.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-already-works&quot;&gt;What Already Works&lt;&#x2F;h2&gt;
&lt;p&gt;Traditional integration patterns already solve deterministic workflow problems. They do it well. They&#x27;ve done it well
for years.&lt;&#x2F;p&gt;
&lt;p&gt;Structured data transformations. Predictable API orchestration. Webhook-based triggers. Scheduled data syncs. Message
queues. ETL pipelines. These aren&#x27;t exciting, but they&#x27;re deterministic, debuggable, and auditable. When something
fails, you know what failed and why. When something succeeds, you can reproduce it.&lt;&#x2F;p&gt;
&lt;p&gt;There&#x27;s no good reason to replace deterministic workflows with non-deterministic alternatives. &quot;The AI handles it&quot; is
not an improvement over &quot;the cron job runs this integration at 3am.&quot; The cron job will do the same thing every time. The
AI might do something different because the phrasing of an email changed or because it hallucinated a slightly different
interpretation of your intent.&lt;&#x2F;p&gt;
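&lt;p&gt;The contrast is easy to make concrete. A nightly sync as a plain shell script plus a crontab entry does exactly the same thing on every run. The steps and names below are placeholders, not a real integration:&lt;&#x2F;p&gt;

```shell
#!/bin/sh
# nightly-sync.sh -- a deterministic integration: same steps, same order,
# every run. All names here are illustrative placeholders.
set -eu

log() { echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $1"; }

log "sync started"
# 1. fetch the previous day's export (stand-in for a real curl/psql pipeline)
# 2. load it into the warehouse
# 3. leave a success marker so failures are loud, not silent
log "sync finished"

# Installed once, runs identically every night:
#   0 3 * * * /usr/local/bin/nightly-sync.sh
```

No phrasing changes, no reinterpretation of intent: the same bytes run at 3am every night, and the log tells you exactly what happened.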
&lt;h2 id=&quot;where-ai-actually-adds-value&quot;&gt;Where AI Actually Adds Value&lt;&#x2F;h2&gt;
&lt;p&gt;This isn&#x27;t an argument against AI in integrations. It&#x27;s an argument for using AI where it actually helps.&lt;&#x2F;p&gt;
&lt;p&gt;AI enables automations that weren&#x27;t previously possible. The key is recognising what those are.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Unstructured data processing.&lt;&#x2F;strong&gt; Parsing inconsistent PDFs, emails, and documents. Extracting structured information
from variable-format vendor data. Handling inputs that don&#x27;t conform to expected schemas. Before LLMs, this required
either brittle regex hell or expensive human processing. Now there&#x27;s a middle option.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Natural language interfaces.&lt;&#x2F;strong&gt; Processing natural language inputs as workflow triggers. Intent classification for
routing. Human-friendly interaction layers where the human is genuinely in the loop. &quot;Hey, can you pull last week&#x27;s
sales data and send it to finance?&quot; is a valuable capability when a human is there to confirm the action before it
happens.&lt;&#x2F;p&gt;
&lt;p&gt;The key distinction: AI for unstructured-to-structured transformation, not for deterministic execution that traditional
tools already handle reliably.&lt;&#x2F;p&gt;
&lt;p&gt;Some use cases may benefit from non-deterministic execution layers. Genuinely novel situations where the appropriate
action isn&#x27;t predictable. But this shouldn&#x27;t be the default approach. It should be the exception, applied carefully,
with human oversight.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-pragmatic-path-forward&quot;&gt;The Pragmatic Path Forward&lt;&#x2F;h2&gt;
&lt;p&gt;Before adding AI to an integration, ask two questions:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Does this undermine robustness?&lt;&#x2F;strong&gt; Consider the costs (financial, security, operational) against what you&#x27;re gaining.
If you&#x27;re replacing a reliable cron job with an AI agent because it&#x27;s cooler, you&#x27;re making your system worse. If you&#x27;re
adding attack surface, non-determinism, and unpredictable costs to a workflow that worked fine without them, reconsider.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Does this unlock something previously impossible?&lt;&#x2F;strong&gt; Specifically: does this handle unstructured data or natural
language in ways traditional integrations can&#x27;t? If yes, there&#x27;s potentially real value. If no, you&#x27;re adding complexity
for its own sake.&lt;&#x2F;p&gt;
&lt;p&gt;If the answer is yes to the first question and no to the second, stop.&lt;&#x2F;p&gt;
&lt;p&gt;The actual opportunity here is boring. Keep using traditional integrations where they work. Add AI where it unlocks new
capabilities through unstructured data handling. Don&#x27;t replace proven patterns with fragile ones just because AI is
available.&lt;&#x2F;p&gt;
&lt;p&gt;We don&#x27;t need to choose between &quot;AI for everything&quot; and &quot;AI for nothing.&quot; We need engineering judgment about where it
actually improves outcomes.&lt;&#x2F;p&gt;
&lt;p&gt;The Moltbot excitement feels like people rediscovering that automation is useful and then choosing the least reliable
form of automation available. Yes, you can give an LLM access to your shell and let it wake up via cron to &quot;help.&quot; You
can also write the shell script. The shell script will work the same way every time.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Rethinking My Position on AI</title>
        <published>2026-01-14T00:00:00+00:00</published>
        <updated>2026-01-14T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/rethinking-ai/"/>
        <id>https://daz.is/blog/rethinking-ai/</id>
        
        <content type="html" xml:base="https://daz.is/blog/rethinking-ai/">&lt;aside class=&quot;update-callout&quot;&gt;
  &lt;span class=&quot;update-callout__label&quot;&gt;Update — 2026-02-07&lt;&#x2F;span&gt;
  &lt;p&gt;Re-phrased some parts to be clearer and to add the important Nolan Lawson
insight &quot;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;nolanlawson.com&#x2F;2026&#x2F;02&#x2F;07&#x2F;we-mourn-our-craft&#x2F;&quot;&gt;We Mourn Our Craft&lt;&#x2F;a&gt;&quot;.&lt;&#x2F;p&gt;

&lt;&#x2F;aside&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;It Is Difficult to Get a Man to Understand Something When His Salary Depends Upon His Not Understanding It&quot;
-- &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;quoteinvestigator.com&#x2F;2017&#x2F;11&#x2F;30&#x2F;salary&#x2F;&quot;&gt;Upton Sinclair&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;When the facts change, I change my mind.&quot;
-- &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;quoteinvestigator.com&#x2F;2011&#x2F;07&#x2F;22&#x2F;keynes-change-mind&#x2F;&quot;&gt;John Maynard Keynes&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;Last year I was using AI chat and Copilot but hadn&#x27;t gone all in on coding agents yet. I was seeing AI slop everywhere
and saw code review bots fixating on trivia or getting completely confused. The tools were useful for research and code
completion, but agents felt like more hype than substance. And they were. They genuinely weren&#x27;t ready.&lt;&#x2F;p&gt;
&lt;p&gt;Then December happened.&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time
in the &quot;progress as usual&quot; way, but specifically this last December. There are a number of asterisks but imo coding
agents basically didn&#x27;t work before December and basically work since -- the models have significantly higher quality,
long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is
extremely disruptive to the default programming workflow.&lt;&#x2F;p&gt;
&lt;p&gt;-- &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;karpathy.github.io&#x2F;&quot;&gt;Andrej Karpathy&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;Karpathy nails it. Agents before December were a novelty. After, they actually work. The models got better at holding
context over long tasks, at recovering from mistakes, at understanding what you actually want. It wasn&#x27;t gradual. It was
a step change.&lt;&#x2F;p&gt;
&lt;p&gt;Two things pushed me further than I expected.&lt;&#x2F;p&gt;
&lt;p&gt;A friend who wouldn&#x27;t shut up about spec-driven development. He explained his workflow in detail, and I pushed back to
defend my craft. Kept doing things my way.&lt;&#x2F;p&gt;
&lt;p&gt;Then I hit a patch of ice on my bicycle. Broke my elbow, messed up my shoulder, ribs, and wrist. Suddenly, I couldn&#x27;t
type properly. I was forced to lean on AI agents far more heavily than I&#x27;d planned, and dictation software for
everything else. No choice but to figure out how to make them actually work. (The dictation, by the way, turned out to
be amazing. I&#x27;m not sure I&#x27;ll go back.)&lt;&#x2F;p&gt;
&lt;p&gt;The timing was fortunate, if you can call breaking your elbow fortunate. Opus 4.5 had just dropped, and it&#x27;s shockingly
good at coding. I haven&#x27;t written any code since the accident, but my output has gone up, not down. That&#x27;s a strange
thing to sit with.&lt;&#x2F;p&gt;
&lt;p&gt;Nolan Lawson put it well in &quot;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;nolanlawson.com&#x2F;2026&#x2F;02&#x2F;07&#x2F;we-mourn-our-craft&#x2F;&quot;&gt;We Mourn Our Craft&lt;&#x2F;a&gt;&quot;: &quot;The worst
fact about these tools is that they work.&quot; He frames it as grief, not conversion. He had a mortgage, a family, and
junior colleagues strapping on &quot;bazooka-powered jetpacks.&quot; The mind-change came not from an argument won, but from a
reality that refused to wait for his permission.&lt;&#x2F;p&gt;
&lt;p&gt;It&#x27;s OK to mourn our craft. I&#x27;ve permitted myself to do so. But I&#x27;m learning to build a new craft on the bones of the
old one.&lt;&#x2F;p&gt;
&lt;p&gt;The effectiveness of these tools opens up a huge dilemma. Opting out entirely means giving up any influence over how
this goes.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-contradiction-i-m-sitting-with&quot;&gt;The Contradiction I&#x27;m Sitting With&lt;&#x2F;h2&gt;
&lt;p&gt;Here&#x27;s where I&#x27;m at: I need to adapt to stay relevant. Accumulated expertise doesn&#x27;t evaporate overnight, but the speed
of change is faster than I expected. At the same time, I genuinely believe the current structure of AI development is
concentrating power, replicating the worst patterns of Big Tech, and creating environmental costs we&#x27;re not seriously
reckoning with.&lt;&#x2F;p&gt;
&lt;p&gt;What do you do when you need to use tools that you think are contributing to harmful outcomes?&lt;&#x2F;p&gt;
&lt;aside class=&quot;aside-callout&quot;&gt;
  &lt;span class=&quot;aside-callout__label&quot;&gt;Aside&lt;&#x2F;span&gt;
  &lt;p&gt;I should also flag the irony of leaning on Fuller below. His techno-optimism, &quot;doing more with less,&quot; designing our way
out of systemic problems, is exactly the rhetoric Silicon Valley adopted to justify moving fast and breaking things. The
same language I use about abundance and shared infrastructure could come straight from a startup pitch deck. Fuller
isn&#x27;t wrong, but those ideas get co-opted easily. The power concentration, the environmental costs, the broken
economics. Those are the parts of this article I stand behind most.&lt;&#x2F;p&gt;

&lt;&#x2F;aside&gt;
&lt;h2 id=&quot;what-would-bucky-do&quot;&gt;What would Bucky do?&lt;&#x2F;h2&gt;
&lt;p&gt;I&#x27;ve been thinking about this through the lens of Buckminster Fuller, partly because I&#x27;ve been reading his work
recently, and partly because he spent a lot of time thinking about exactly this kind of bind. Fuller studied what he
called the &quot;Great Pirates&quot;, powerful maritime traders who operated across national boundaries, accumulated comprehensive
knowledge, and eventually became the invisible power brokers behind modern finance and corporate structures. But he
didn&#x27;t study them to emulate them. He studied them to understand how power concentrates, and how to design alternatives.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;distinguishing-the-tool-from-the-structure&quot;&gt;Distinguishing the Tool from the Structure&lt;&#x2F;h2&gt;
&lt;p&gt;Using AI effectively isn&#x27;t the same as endorsing the concentration of its development in a few corporations, or the
extractive data practices, or the environmental costs. I can be pragmatic about using the tools while being vocal about
the structural problems.&lt;&#x2F;p&gt;
&lt;p&gt;Fuller didn&#x27;t refuse to use electricity because power companies were monopolistic. He designed systems for more
distributed energy.&lt;&#x2F;p&gt;
&lt;p&gt;For me this means learning to work with AI while pushing for open-source alternatives, better regulation, and
environmental accountability. Being the person in the room who can say &quot;this is impressive technically AND here&#x27;s why
the current trajectory is dangerous.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;Deep expertise gives me standing that pure critics don&#x27;t have.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;sharing-knowledge-not-hoarding-it&quot;&gt;Sharing Knowledge, Not Hoarding It&lt;&#x2F;h2&gt;
&lt;p&gt;Fuller&#x27;s response to the pirates&#x27; legacy was essentially: what if we made all knowledge accessible? What if we designed
for everyone&#x27;s success, not competitive advantage? What if we operated from abundance rather than scarcity?&lt;&#x2F;p&gt;
&lt;p&gt;My expertise becomes more valuable when I give it away, not less. I&#x27;m trying to document what I&#x27;m learning about AI
publicly. The &quot;competitive moat&quot; thinking is pirate logic. Fuller would say security comes from being genuinely useful
to the whole system.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;new-tools-old-discipline&quot;&gt;New Tools, Old Discipline&lt;&#x2F;h2&gt;
&lt;p&gt;The best practices for working with AI aren&#x27;t new.&lt;&#x2F;p&gt;
&lt;p&gt;Write clear specs before you start coding. Break work into well-defined tasks. Review output carefully. Give good
context. Think about architecture before implementation. We were supposed to have been doing this for decades.&lt;&#x2F;p&gt;
&lt;p&gt;My friend who kept banging on about spec-driven development? He was right. Writing a proper spec before handing work to
an AI agent produces dramatically better results than prompting and hoping. The spec forces you to think first. The
thinking was always the valuable bit.&lt;&#x2F;p&gt;
&lt;p&gt;Anthropic published a piece
about &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;claude.com&#x2F;blog&#x2F;how-anthropic-teams-use-claude-code&quot;&gt;how their own teams use Claude Code&lt;&#x2F;a&gt;. They treat AI
agents like a development team. Give them proper context. Plan before executing. Maintain human oversight. Review before
deploying. It reads less like a technology manual and more like a management handbook. Because that&#x27;s what it is.&lt;&#x2F;p&gt;
&lt;p&gt;The people thriving with these tools aren&#x27;t the ones who learned prompt engineering from scratch. They&#x27;re the ones who
already valued clear thinking, systematic review, and well-structured work. The craft didn&#x27;t die. It shifted from typing
to thinking. From writing code to specifying intent, reviewing output, and knowing when something&#x27;s wrong.&lt;&#x2F;p&gt;
&lt;p&gt;That&#x27;s why accumulated expertise still matters, even as the tools change underneath you. You need to know what good
looks like before you can judge whether an AI produced it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-economic-argument&quot;&gt;The Economic Argument&lt;&#x2F;h2&gt;
&lt;p&gt;The economics of AI are fundamentally broken.&lt;&#x2F;p&gt;
&lt;p&gt;Billions invested in training runs. Models obsolete in months. Massive duplication of effort across competing companies.
Each company is rebuilding similar capabilities from scratch. Energy and compute wasted on redundant training. Race
dynamics forcing premature releases and corner-cutting.&lt;&#x2F;p&gt;
&lt;p&gt;Fuller would see this and say: this is competition-based scarcity thinking producing artificial scarcity while
simultaneously creating massive waste. It&#x27;s exactly backwards. He believed humanity&#x27;s problems weren&#x27;t resource
problems, they were design and coordination problems. We have enough for everyone if we design efficiently and
collaborate.&lt;&#x2F;p&gt;
&lt;p&gt;What if the massive investment was collaborative rather than competitive? Shared base models, openly developed.
Companies compete on applications and implementations, not on rebuilding foundation models. Like how we don&#x27;t have
competing internets, we have shared infrastructure with competition at other layers.&lt;&#x2F;p&gt;
&lt;p&gt;What if we designed for longevity rather than obsolescence? Smaller, more efficient models that actually get refined
over time. Focus on getting more capability from less compute. Sustainable rather than race-to-the-bottom dynamics.&lt;&#x2F;p&gt;
&lt;p&gt;The current model only &quot;works&quot; because venture capital and tech giants can sustain losses hoping for future monopoly.
The race dynamic forces everyone to participate or be left behind. It&#x27;s a prisoner&#x27;s dilemma. Everyone would be better
off cooperating, but no one can unilaterally stop competing.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;being-a-trim-tab&quot;&gt;Being a Trim Tab&lt;&#x2F;h2&gt;
&lt;p&gt;Fuller&#x27;s favourite metaphor was the trim tab. The small rudder that turns the big rudder that turns the ship. You don&#x27;t
have to move the whole ship yourself. You find the leverage point where a small action creates a larger change.&lt;&#x2F;p&gt;
&lt;p&gt;I can&#x27;t change that major AI models are controlled by a few companies, or the massive energy consumption, or the global
race dynamics. But I can change what problems I work on, how I share knowledge, what tools and alternatives I support,
and what voice I lend to which conversations.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-this-means-in-practice&quot;&gt;What This Means in Practice&lt;&#x2F;h2&gt;
&lt;p&gt;For me, it means focusing on problems that actually help people. Not extraction and manipulation. Is this work helping
people do more with less? Is it reducing drudgery? Creating genuine value?&lt;&#x2F;p&gt;
&lt;p&gt;The scarcity mindset says, &quot;AI is taking my job, I need to protect my turf.&quot; I&#x27;m trying to think differently. AI can
handle routine work, freeing me up for problems I haven&#x27;t had capacity to address.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;m deploying AI across my workflow now, orchestrating multiple agents, refining a process around human oversight at the
points with most leverage. The bottleneck has shifted from writing code to reviewing it. Scaling the human judgement
side is the interesting problem.&lt;&#x2F;p&gt;
&lt;p&gt;My expertise isn&#x27;t a scarce resource to protect. It&#x27;s a foundation to build something better on.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-uncomfortable-reality&quot;&gt;The Uncomfortable Reality&lt;&#x2F;h2&gt;
&lt;p&gt;I don&#x27;t have this fully resolved. The tension is real. The risks are real. But sitting it out isn&#x27;t an option either.&lt;&#x2F;p&gt;
&lt;p&gt;The test isn&#x27;t whether AI is good or bad. It&#x27;s whether we can shape how it develops and who it benefits. That needs
people who understand both the technology and its dangers to actually be in the room.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;m still concerned. But I&#x27;m building with eyes open and values intact. If you&#x27;re sitting with the same contradiction,
I&#x27;d genuinely love to hear how you&#x27;re thinking about it.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Do More With Less: My Web Stack for 2026</title>
        <published>2026-01-03T00:00:00+00:00</published>
        <updated>2026-01-03T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/my-2026-stack/"/>
        <id>https://daz.is/blog/my-2026-stack/</id>
        
        <content type="html" xml:base="https://daz.is/blog/my-2026-stack/">&lt;p&gt;Over the past few months, in my spare time, I&#x27;ve been working on my
side-project, &lt;a href=&quot;&#x2F;work&#x2F;zero-waste-tickets&#x2F;&quot;&gt;Zero Waste Tickets&lt;&#x2F;a&gt;, where I make heavy use of HTMX, server-rendered HTML, and
a few bits of vanilla JavaScript for interactions. I&#x27;m able to do more with less. Much more than you might expect for a
single dev working in my spare time. The result is fast to load, and I can reason about the whole codebase in one place.&lt;&#x2F;p&gt;
&lt;p&gt;I keep reaching for simpler tools and getting better results. The defaults in web development have drifted towards
complexity that most projects don&#x27;t need. React had its moment, but it got used everywhere, including lots of places it
shouldn&#x27;t. And it created a whole ecosystem of build tooling, state-management, and not-quite-right abstractions
patching other not-quite-right abstractions.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-philosophy-do-more-with-less&quot;&gt;The Philosophy: Do More With Less&lt;&#x2F;h2&gt;
&lt;p&gt;The best stack is the one where you&#x27;ve removed everything that isn&#x27;t pulling its weight.&lt;&#x2F;p&gt;
&lt;p&gt;Splitting an application into a heavy frontend and a separate backend roughly doubles the surface area. Two codebases,
two deployment pipelines, two sets of state to manage, and a contract between them that needs constant maintenance.
Business logic gets scattered. Some in the frontend for &quot;responsiveness,&quot; some in the backend for &quot;security.&quot; Testing
becomes harder. Onboarding new developers takes longer. Why do we keep doing this to ourselves?&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Simplicity is a great virtue, but it requires hard work to achieve it and education to appreciate it. And to make
matters worse: complexity sells better.&quot; — Edsger Dijkstra&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;That last line is the whole story of React&#x27;s dominance. Complexity sold very well indeed.&lt;&#x2F;p&gt;
&lt;p&gt;The alternative is to co-locate business logic in the backend and treat HTML as a first-class output format. This isn&#x27;t
a new idea. It&#x27;s how the web worked for most of its early history. But it deserves a serious look now that HTML and CSS
have got genuinely good.&lt;&#x2F;p&gt;
&lt;p&gt;Modern HTML gives you dialogs, form validation, lazy loading, and semantic elements that screen readers understand out
of the box. CSS handles layouts that used to require JavaScript. If you lean into the platform, you get performance and
accessibility by default instead of fighting for them.&lt;&#x2F;p&gt;
&lt;p&gt;That doesn&#x27;t mean zero JavaScript. It means JavaScript where it actually adds value, not as the default for everything.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-rust-backend-axum-sqlx-and-postgres&quot;&gt;The Rust Backend: Axum, SQLx, and Postgres&lt;&#x2F;h2&gt;
&lt;p&gt;I&#x27;m writing web services in Rust using Axum. The type system catches entire categories of bugs at compile time, and the
performance characteristics mean I don&#x27;t need to think about scaling until much later than I would with other languages.&lt;&#x2F;p&gt;
&lt;p&gt;Axum itself is straightforward. It&#x27;s a thin layer over the Tokio async runtime with good ergonomics for routing and
middleware. Unlike the full-featured frameworks you get in other languages, Axum is deliberately a low-level building
block. I appreciate that as a design goal, even if it means you&#x27;re assembling more pieces yourself. I&#x27;ve ended up
writing my own macro-based utilities on top of it to make common patterns more declarative, which is exactly the kind of
flexibility this approach allows.&lt;&#x2F;p&gt;
&lt;p&gt;For database access, I&#x27;m using SQLx with Postgres. SQLx checks your SQL queries against your actual database schema at
compile time. No ORM abstraction layer, no runtime query building, just raw SQL with the safety of knowing it will work
before you deploy.&lt;&#x2F;p&gt;
&lt;p&gt;Postgres continues to be the right default for most applications. It handles JSON, full-text search, and geospatial data
without needing separate systems. That said, it&#x27;s not without its quirks. If you try to build queue systems on top of
it, or rely heavily on locks and NOTIFY, you can run into global lock contention that&#x27;s easy to miss until it becomes a
problem. It&#x27;s quite possible to shoot yourself in the foot.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;ve been using Postgres long enough that I know where the rough edges are and how to work around them. Over the years
I&#x27;ve watched people move to trendier databases only to spend time rebuilding features that a relational database gives
you out of the box.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;html-first-with-maud-and-htmx&quot;&gt;HTML-First with Maud and HTMX&lt;&#x2F;h2&gt;
&lt;p&gt;For rendering HTML, I&#x27;m using Maud, a macro-based templating library for Rust. Templates are type-checked at compile
time, and because they&#x27;re just Rust code, you get all the refactoring and IDE support you&#x27;re used to.&lt;&#x2F;p&gt;
&lt;p&gt;For styling, I&#x27;m using a small macro to collect and aggregate CSS snippets from across the codebase.&lt;&#x2F;p&gt;
&lt;p&gt;When I need interactivity beyond what HTML provides natively, I reach for HTMX. It lets you make AJAX requests and
update parts of the page using HTML attributes. The mental model stays simple: the server returns HTML, and HTMX swaps
it into the DOM.&lt;&#x2F;p&gt;
&lt;p&gt;The result is fast and accessible. The pages are small. The server does the work it&#x27;s good at. And because the core
content is server-rendered HTML, there&#x27;s a graceful baseline even without JavaScript.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;infrastructure-you-can-understand-docker-compose-on-a-vps&quot;&gt;Infrastructure You Can Understand: Docker Compose on a VPS&lt;&#x2F;h2&gt;
&lt;p&gt;I&#x27;m currently in the process of moving my side projects from cloud services to a self-hosted VPS. The primary motivation
is cost. Running something like Zero Waste Tickets on managed cloud infrastructure costs more than it needs to. But
there&#x27;s also value in having infrastructure I can fully understand and control.&lt;&#x2F;p&gt;
&lt;p&gt;For orchestration, Docker Compose is enough. I define my services in a single file, run &lt;code&gt;docker compose up&lt;&#x2F;code&gt;, and
everything works. No service mesh, no ingress controllers, no YAML spread across dozens of files.&lt;&#x2F;p&gt;
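&lt;p&gt;For a setup like the one described, the single file can be as small as this. Service names, ports, and image tags are illustrative, not my actual configuration:&lt;&#x2F;p&gt;

```yaml
# docker-compose.yml -- everything in one place, started with
# "docker compose up". All values below are placeholders.
services:
  app:
    build: .
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgres://app:app@db:5432/app
    depends_on:
      - db
  db:
    image: postgres:16
    volumes:
      - dbdata:/var/lib/postgresql/data
volumes:
  dbdata:
```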
&lt;p&gt;For object storage, I use whatever S3-compatible option the host provides, or RustFS if I need to run it myself. The S3
API is a reasonable standard, and there&#x27;s no reason to couple to a specific cloud provider for blob storage.&lt;&#x2F;p&gt;
&lt;p&gt;We use Kubernetes heavily at work, and I understand why it exists. For large-scale systems with dedicated platform
teams, the complexity is justified. But for most projects, it&#x27;s overhead that doesn&#x27;t pay for itself.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;durable-execution-for-the-agentic-future-restate&quot;&gt;Durable Execution for the Agentic Future: Restate&lt;&#x2F;h2&gt;
&lt;p&gt;The piece of this stack I&#x27;m most excited about is Restate.&lt;&#x2F;p&gt;
&lt;p&gt;Restate is a durable execution platform. It stores the progress of your workflows as they run, so if something crashes,
execution resumes from where it left off rather than restarting from the beginning. Retries, idempotency, and state
persistence are handled by the runtime rather than being your problem.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;ve been using it in side projects for long-running workflows and agent orchestration. If you&#x27;ve ever built a multistep
process, such as an agent that calls external APIs, waits for responses, and makes decisions over minutes or hours, you
know the pattern: you end up hand-rolling state machines, writing retry logic, building idempotency checks, and tracking
progress in a database. And then you do it all again for the next workflow, slightly differently each time.&lt;&#x2F;p&gt;
&lt;p&gt;Restate makes that entire category of problem go away. You write your workflow as straightforward code, and the runtime
handles persistence, retries, and recovery. If something crashes, execution picks up where it left off.&lt;&#x2F;p&gt;
&lt;p&gt;As AI agents start doing more real work, spanning multiple steps, calling external services, running over extended time
periods, this kind of infrastructure stops being nice-to-have. I&#x27;m planning to expand my use of it and introduce it at
work where it makes sense.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping Up&lt;&#x2F;h2&gt;
&lt;p&gt;None of these are particularly exotic choices. Postgres has been around for decades. HTML-first development is older
than React. Docker Compose is deliberately simple. What&#x27;s changed is that I&#x27;m choosing them deliberately rather than
defaulting to complexity. Doing more with less.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;d love to know what your stack looks like right now. What are you excited about this year? What have you dropped that
you used to think was essential? Drop me a line. I&#x27;m always up for that conversation.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>New Year, New Blog</title>
        <published>2026-01-01T00:00:00+00:00</published>
        <updated>2026-01-01T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://daz.is/blog/new-year-new-blog/"/>
        <id>https://daz.is/blog/new-year-new-blog/</id>
        
        <content type="html" xml:base="https://daz.is/blog/new-year-new-blog/">&lt;p&gt;I&#x27;m bringing the blog back.&lt;&#x2F;p&gt;
&lt;p&gt;The world I&#x27;ve spent my career in is changing fast, and I&#x27;ve been doing a lot of thinking about where things are
heading. I want a place to work through those ideas properly, not in throwaway social media posts but in something I own
and can build on.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;ve also been digging through my archives, recovering old posts from previous incarnations of this site. I&#x27;ll be
backfilling those over time, mostly for nostalgia. Not much of that older thinking still holds up, but it&#x27;s interesting to see
how things have changed.&lt;&#x2F;p&gt;
&lt;p&gt;There&#x27;s music and other projects I want to get up here too.&lt;&#x2F;p&gt;
&lt;p&gt;For now, stick around. I&#x27;ve got a lot to share about this strange moment we&#x27;re all in.&lt;&#x2F;p&gt;
</content>
        
    </entry>
</feed>
