How I Built a Software Factory That Runs While I Sleep

Someone on LinkedIn asked me: "How does your agent actually work? When are you going to explain the architecture?"

Alright. Here it is.

One agent doing everything doesn't work

For months I used a single AI session for everything. "Build me this feature." The code compiled. Tests existed. But the requirements were vague, the design was improvised, and the tests only covered the happy path.

Three months later something would break and I'd have no idea why the original code was written that way. No requirements doc. No design decisions recorded. Just code and a prayer.

So I split the work

Real software teams don't work like that either. You don't ask one person to write requirements, design the system, code it, and test it. You have people who focus on one thing and do it well.

I did the same with AI agents. Five agents, five phases, one pipeline:

REQ — reads the task, writes a requirements doc. Acceptance criteria, edge cases, scope. No code, just words. This forces me (and the agents downstream) to know what "done" looks like before writing a single line.

ANA — takes the requirements and checks feasibility. What code gets affected? What could break? What dependencies are involved? This is where you catch "sounds simple, will actually break three other things."

DES — produces the technical design. Data models, API contracts, component layout. The blueprint that IMP will follow.

IMP — writes code. But unlike a general agent, this one already has requirements, analysis, and a design to work from. It doesn't explore — it executes.

TEST — writes and runs tests against the acceptance criteria from REQ. Not "does the code work?" but "does it do what was promised?"

Each agent produces a document or code, attaches it to the task, and hands off to the next phase. If something fails, the task gets blocked and I get a notification.
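To make the handoff concrete, here's a minimal Python sketch that runs the five phases in one process. It is a simplification: in the real factory each phase is triggered by its own event. The `Task` class, the agent signature, and the `notify` stand-in are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

PHASES = ["REQ", "ANA", "DES", "IMP", "TEST"]

# Hypothetical agent signature: (task description, earlier artifacts) -> artifact.
Agent = Callable[[str, dict[str, str]], str]


@dataclass
class Task:
    description: str
    phase: str = "REQ"
    artifacts: dict[str, str] = field(default_factory=dict)
    blocked: bool = False
    log: list[str] = field(default_factory=list)


def notify(task: Task, message: str) -> None:
    # Stand-in for the real notification; here it only records the message.
    task.log.append(message)


def run_pipeline(task: Task, agents: dict[str, Agent]) -> Task:
    # Resume from the task's current phase and run each later phase in order.
    for phase in PHASES[PHASES.index(task.phase):]:
        task.phase = phase
        try:
            # Each agent gets a copy of the artifacts, so it can't mutate earlier ones.
            artifact = agents[phase](task.description, dict(task.artifacts))
        except Exception as exc:
            # A failure blocks the task where it stands and notifies a human.
            task.blocked = True
            notify(task, f"{phase} failed: {exc}")
            return task
        task.artifacts[phase] = artifact  # attach the artifact to the task
        task.log.append(f"{phase} done")
    task.phase = "DONE"
    return task
```

A blocked task keeps its current phase, so once the problem is fixed, calling `run_pipeline` again resumes from the failed phase instead of starting over.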

Why this works better

A general-purpose agent holding requirements, design, and implementation in its head at the same time will cut corners. It'll skip the design because it's eager to code. It'll write tests that validate what it built rather than what was asked for.

Specialized agents can't skip ahead. The REQ agent doesn't know how to write code. The TEST agent doesn't care how the code works — it only checks whether the acceptance criteria pass.

The side effect I didn't expect: a paper trail. When something breaks months later, I can trace back through REQ → ANA → DES → IMP → TEST and see exactly what was decided and why.

How it's wired

When I create a task in Odoo (my project tracker), this happens:

An Odoo module detects the state change and publishes an event to Redis
A Rust listener picks it up
A Python dispatcher figures out which phase the task is in and spawns the right agent
The agent works, produces its artifact, updates the task, and moves it to the next phase
Repeat until all five phases are done
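Here's a sketch of the dispatcher's routing step. It assumes the Redis message is a JSON payload with hypothetical `task_id` and `stage` fields, and that the Odoo stages are named after the phases. The stage names, the `run-agent` launcher, and `dispatch` itself are illustrative, not the real code. Redis and the Rust listener are left out.

```python
import json
from typing import Optional

# Hypothetical mapping from Odoo stage names to pipeline phases.
STAGE_TO_PHASE = {
    "Requirements": "REQ",
    "Analysis": "ANA",
    "Design": "DES",
    "Implementation": "IMP",
    "Testing": "TEST",
}


def dispatch(raw_event: str) -> Optional[list[str]]:
    """Turn one event payload into the command that would spawn an agent.

    Returns None for events that need no agent (e.g. a task moved to Done).
    """
    event = json.loads(raw_event)
    phase = STAGE_TO_PHASE.get(event.get("stage", ""))
    if phase is None:
        return None
    # "run-agent" is a hypothetical launcher; a real dispatcher might pass
    # this argv to subprocess.Popen instead of returning it.
    return ["run-agent", "--phase", phase, "--task", str(event["task_id"])]
```

The function only runs when a message arrives, so nothing polls. Events that don't map to a phase, such as a task moved to Done, are ignored.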

A few decisions that matter:

Event-driven. The factory reacts when something changes. No polling loops checking "is there work?" every 30 seconds.

Isolated agents. Each one gets a clean context: the task description plus artifacts from previous phases. Nothing else. When I tried sharing full context across phases, agents got confused by information that wasn't relevant to their job.
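Here's a minimal sketch of that boundary. The `build_context` helper and its Markdown-style section headers are hypothetical, chosen for illustration. The agent's input is built only from the task description and the artifacts of earlier phases, so transcripts and later-phase output can't leak in:

```python
PHASES = ["REQ", "ANA", "DES", "IMP", "TEST"]


def build_context(phase: str, description: str, artifacts: dict[str, str]) -> str:
    """Assemble the only input an agent sees: the task plus earlier-phase artifacts."""
    earlier = PHASES[: PHASES.index(phase)]
    sections = [f"## Task\n{description}"]
    for name in earlier:
        # Artifacts from the current or later phases (e.g. left over from a rerun)
        # are deliberately skipped.
        if name in artifacts:
            sections.append(f"## {name} artifact\n{artifacts[name]}")
    return "\n\n".join(sections)
```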

The orchestrator doesn't write code. It only coordinates. Mixing orchestration with execution is how you end up with a mess that's impossible to debug.

Everything gets logged. Every agent posts its progress to the task's activity feed. I can see what happened, when, and what each agent produced.

What 12 minutes looks like

I needed a Markdown-to-PDF script. I wrote one line in Odoo: "Script to convert Markdown files to PDF with proper formatting."

REQ (2 min): requirements doc — input formats, output styling, error handling, CLI interface
ANA (2.5 min): evaluated WeasyPrint vs pdfkit, recommended approach, flagged risks
DES (3 min): module structure, CLI arguments, CSS template system
IMP (2 min): wrote the code, one clean file, following the design spec
TEST (2 min): 9 tests, all passed

About 12 minutes end to end (the phase timings add up to 11.5). One sentence of input from me. The output wasn't just working code: it was code with a requirements doc, an analysis, a design, and a test suite, all attached to the task.

Things I got wrong first

I polished agents before the pipeline worked. Wasted weeks perfecting prompts for individual agents when I didn't have a reliable way to chain them. The orchestration layer matters more than any single agent's quality.

I let agents discover things dynamically. Agent IDs, project IDs, stage mappings — I used to tell agents "look up the project owner's ID." They'd get it wrong in creative ways. Now everything is hardcoded constants.

Instructions aren't enough. "Notify the project owner" fails in surprising ways. The exact API call with the exact parameters works every time. Agents are better at running code than interpreting vague instructions.
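Here's a sketch of what "hardcoded constants plus the exact call" can look like. Every ID below is made up, and `notify_owner_call` is a hypothetical helper that builds a fully specified request. The real system's values and API calls will differ.

```python
from typing import Final

# Hypothetical IDs. They are copied once from the tracker and never looked up
# by an agent at runtime.
PROJECT_ID: Final = 12
PROJECT_OWNER_PARTNER_ID: Final = 3
STAGE_IDS: Final = {"REQ": 101, "ANA": 102, "DES": 103, "IMP": 104, "TEST": 105, "BLOCKED": 199}


def notify_owner_call(task_id: int, text: str) -> dict:
    """Build the exact notification request (hypothetical shape).

    The agent runs this instead of interpreting "notify the project owner".
    """
    return {
        "model": "project.task",
        "record_id": task_id,
        "body": text,
        "recipient_ids": [PROJECT_OWNER_PARTNER_ID],
    }
```

The agent's instruction then shrinks to "call `notify_owner_call` with the task ID," which leaves no room to look up the wrong person.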

Shared context was a mistake. When agents from different phases could see each other's full context, they'd get confused. The IMP agent would second-guess the DES agent's decisions. Clean boundaries fixed it.

Adversarial review catches what self-review doesn't. After IMP, I run a separate review step: an AI reviewer that tries to find problems in the code, for three rounds. It catches things a "looks good to me" self-check misses every time.
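The review loop can be sketched like this, with the reviewer and the fixer passed in as plain functions. The `adversarial_review` name is hypothetical, and so is the early stop when a round finds nothing. Only the three-round limit comes from the setup described above.

```python
from typing import Callable

Reviewer = Callable[[str], list[str]]    # returns the problems it found
Fixer = Callable[[str, list[str]], str]  # returns revised code


def adversarial_review(
    code: str, review: Reviewer, fix: Fixer, rounds: int = 3
) -> tuple[str, list[list[str]]]:
    """Run up to `rounds` review/fix cycles and keep each round's findings."""
    history: list[list[str]] = []
    for _ in range(rounds):
        problems = review(code)
        history.append(problems)
        if not problems:
            break  # assumed early stop: nothing left to fix
        code = fix(code, problems)
    # Note: the fix from the final round is not re-reviewed here.
    return code, history
```

Keeping the reviewer separate from the fixer is the point: the function that finds problems never has a stake in the code looking good.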

What it is (and isn't)

The SW Factory doesn't replace developers. I still review outputs. I still make architecture decisions. I still write the task descriptions.

But the parts that used to eat my day — writing requirement docs, doing analysis, producing boilerplate, running test suites — those happen automatically now. Five agents, each good at one thing, running in sequence while I do something else.

That's the whole trick. Not a smarter AI. A better system.