Key Metrics
Problem & Outcome
Problem
AI coding tools are strong in short bursts but weak at long-horizon, auditable, restart-safe software execution. As a solo builder, I wanted something I could spin up, walk away from, and come back to a few hours later with a fully implemented spec. That meant I needed a runtime that could persist state, separate research from execution, and keep working through planning, implementation, QA, and remediation instead of collapsing into one-shot code generation.
Outcome
Millrace became a local, file-backed Python runtime with separate execution and research planes, durable state, restart-safe control, and configurable loop architectures for agent-driven software work. I’m already using it to build and ship custom church software. On a public one-shot benchmark, a substantial Minecraft mod port, Millrace scored 96/100 in 4 manual prompts, about 17.4 hours, and roughly 362M tokens; raw Codex CLI scored 95/100 in 30+ prompts, about 18 hours, and 1B+ tokens. Millrace did this running GPT-5.3-Codex High/Extra High under the hood, while Codex CLI was running GPT-5.4 Extra High with Fast Mode enabled.
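To put the benchmark figures quoted above side by side, here is a small arithmetic sketch. It only restates the numbers from this section; because "30+" and "1B+" are lower bounds and the other figures are approximate, the ratios it prints should be read as rough lower bounds, not measured results. The variable names are illustrative.

```python
# Figures as quoted in this section. "30+" prompts and "1B+" tokens are
# lower bounds, so the ratios below are lower bounds as well.
millrace = {"score": 96, "prompts": 4, "hours": 17.4, "tokens": 362e6}
codex_cli = {"score": 95, "prompts": 30, "hours": 18.0, "tokens": 1e9}

# How many times more tokens and manual prompts the raw Codex CLI run used.
token_ratio = codex_cli["tokens"] / millrace["tokens"]
prompt_ratio = codex_cli["prompts"] / millrace["prompts"]

print(f"tokens: at least {token_ratio:.1f}x fewer with Millrace")
print(f"manual prompts: at least {prompt_ratio:.1f}x fewer with Millrace")
```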
Architecture
Models
Tools
Millrace was designed primarily for agents to use, not for humans to operate. It separates research and execution into governed planes, persists task state to disk for restart-safe recovery, and supports configurable loop entrypoints so different roles can plan, build, QA, hotfix, and escalate deterministically. The design priority is long-horizon completion, auditability, and controlled iteration rather than raw demo speed. One defining characteristic is that only one agent ever runs at a time. This sequential orchestration choice is foundational: it removes any possibility of destructive interference between agents, makes autonomy far more reliable without extra guardrails, and eliminates a great deal of operational complexity and governance overhead. Output quality and token efficiency are prioritized over maximum speed and parallelization.
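The two ideas in this paragraph, state persisted to disk and strictly one agent at a time, can be illustrated with a minimal sketch. This is not Millrace's actual code or API: the stage names, the `state.json` layout, and the `load_state`, `save_state`, and `run` helpers are all hypothetical, chosen only to show how a file-backed sequential loop can resume after an interruption.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical stage order, loosely based on the roles named in the text.
STAGES = ["plan", "build", "qa"]


def load_state(path: Path) -> dict:
    """Read persisted state, or start fresh if no state file exists yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"next_stage": 0, "log": []}


def save_state(path: Path, state: dict) -> None:
    """Write state atomically: temp file in the same directory, then rename."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # readers see either the old file or the new one


def run(path: Path, agents: dict, max_steps=None) -> dict:
    """Run the remaining stages one at a time, persisting after each one."""
    state = load_state(path)
    steps = 0
    while state["next_stage"] < len(STAGES):
        if max_steps is not None and steps >= max_steps:
            break  # stand-in for an interruption or shutdown
        stage = STAGES[state["next_stage"]]
        result = agents[stage](state)  # exactly one agent runs at a time
        state["log"].append({"stage": stage, "result": result})
        state["next_stage"] += 1
        save_state(path, state)  # a restart resumes from this point
        steps += 1
    return state
```

Calling `run` again with the same state file picks up at the first stage that has not been recorded as finished, so a stage whose result was saved is not run again. Because the loop is sequential, there is no locking or merge logic to get wrong, which is the trade-off the paragraph describes: simpler and more predictable, at the cost of parallel speed.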
What I Did vs AI
| Task | Me | AI / Other |
|---|---|---|
| Vision / Architecture | 90% | 10% |
| Spec Drafting / Decomposition | 20% | 80% |
| Implementation | <1% | >99% |
| QA / Troubleshooting | 5% | 95% |
| Feature Direction | 90% | 10% |