TalentBoard
Tim Osterhus

Founder, Agentic Systems Builder

9/10
Junior · Remote · Both · Maui, HI

About

I build agent-native development tools for autonomous software delivery. My current project, Millrace, is a Python runtime with separate execution and research planes, durable state, and restart-safe control for long-running software workflows. I’m using it today to build and ship custom software for churches.

Skills

Agentic Engineering · Agent-Native Infrastructure · Agentic Systems Design · Spec Engineering · System Architecture · Autonomous Agentic Orchestration · Harness Engineering · Context Engineering · Workflow Engineering · Agentic Memory Architecture

Proof of Work

1 project

Key Metrics

4% faster than Codex CLI's Fast Mode · Reached outcome parity with a stronger model · ~64% fewer total tokens than Codex CLI · 4 manual prompts vs 30+

Problem & Outcome

Problem

AI coding tools are strong at short bursts but weak at long-horizon, auditable, restart-safe software execution. As a solo builder, I wanted something I could spin up, walk away from, and come back to a few hours later with a fully implemented spec. That meant I needed a runtime that could persist state, separate research from execution, and keep working through planning, implementation, QA, and remediation instead of collapsing into one-shot code generation.

Outcome

Millrace became a local, file-backed Python runtime with separate execution and research planes, durable state, restart-safe control, and configurable loop architectures for agent-driven software work. I'm already using it to build and ship custom church software. On a public one-shot benchmark, Millrace reached 96/100 on a substantial Minecraft mod port in 4 manual prompts, about 17.4 hours, and roughly 362M tokens, versus raw Codex CLI at 95/100, 30+ prompts, about 18 hours, and 1B+ tokens. Notably, Millrace used GPT-5.3-Codex High/Extra High under the hood, while Codex CLI used the stronger GPT-5.4 Extra High with Fast Mode enabled.

Architecture

Models

gpt-5.3-codex · gpt-5.4

Tools

Python · Codex CLI · Pydantic · Pytest · Typer · TomlKit · Watchdog

Millrace was designed to be primarily agent-operated rather than human-operated. It separates research and execution into governed planes, persists task state to disk for restart-safe recovery, and supports configurable loop entrypoints so different roles can plan, build, QA, hotfix, and escalate deterministically. The design priority is long-horizon completion, auditability, and controlled iteration rather than raw demo speed. A distinctive characteristic is that only one agent is ever running at a time. This sequential orchestration choice is foundational: it removes any possibility of destructive interference between agents, makes autonomy far more reliable without extra guardrails, and eliminates a tremendous amount of operational complexity and governance overhead. Output quality and token efficiency are prioritized over maximum speed and parallelization.
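The restart-safe, sequential pattern described above can be sketched generically. This is an illustrative toy, not Millrace's actual API; every name here (`load_state`, `run_agent`, the phase list) is hypothetical. The key idea is checkpointing durable state to disk after each phase, so a restarted run resumes exactly where the previous one stopped:

```python
import json
from pathlib import Path

STATE_FILE = Path("state.json")  # durable, file-backed task state

PHASES = ["plan", "build", "qa", "remediate"]

def load_state() -> dict:
    # Resume from disk if a previous run was interrupted.
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"phase": "plan", "done": []}

def save_state(state: dict) -> None:
    STATE_FILE.write_text(json.dumps(state))

def run_agent(phase: str, state: dict) -> None:
    # Placeholder for a single agent invocation. Only one agent runs
    # at a time, so no locks or coordination machinery are needed.
    state["done"].append(phase)

def run() -> dict:
    state = load_state()
    if state["phase"] == "complete":
        return state
    # Walk the remaining phases sequentially, checkpointing after each,
    # so a crash or interruption never loses completed work.
    start = PHASES.index(state["phase"])
    for phase in PHASES[start:]:
        run_agent(phase, state)
        nxt = PHASES.index(phase) + 1
        state["phase"] = PHASES[nxt] if nxt < len(PHASES) else "complete"
        save_state(state)
    return state
```

Because each phase checkpoint is a plain file write, the same loop is also fully auditable: the on-disk state at any moment tells you exactly which phases have finished.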

What I Did vs AI

Task | Me | AI / Other
Vision / Architecture | 90% | 10%
Spec Drafting / Decomposition | 20% | 80%
Implementation | <1% | >99%
QA / Troubleshooting | 5% | 95%
Feature Direction | 90% | 10%

Links

Millrace vs Codex — GitHub documentation of the benchmark test
Millrace AI — official PyPI page
Millrace Repo — GitHub repo
Homepage — personal domain

AI Fluency

Millrace AI

Millrace is an open-source, lightweight Python runtime that can be installed via "pip install millrace-ai" on Python 3.11+ and runs on both macOS and WSL (native Windows operation has not been tested). It is designed to let agents set up long-running autonomous tasks inside specific repos using governed research and execution, and it doesn't stop running until the task has been fully completed. Custom "loop configurations" enable more specialized autonomous work without altering the runtime code itself. Its control surface is highly transparent, and its persistent, file-driven stateful memory lets Millrace resume right where it left off after a premature interruption.
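For illustration only, a custom loop configuration of the kind described might look like the sketch below. The section and key names are hypothetical and do not reflect Millrace's real schema; the point is that loop behavior lives in declarative config rather than runtime code:

```toml
# Hypothetical loop configuration -- keys are illustrative,
# not Millrace's actual schema.
[loop]
name = "build-and-qa"
phases = ["plan", "build", "qa", "remediate"]

[loop.limits]
max_iterations = 50            # escalate instead of looping forever
checkpoint_every_phase = true  # persist state so restarts resume cleanly
```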

Millrace vs. Raw Codex Benchmark

A documented head-to-head benchmark porting a 10-year-old Minecraft mod from 1.8.9 to 1.21.11. Millrace finished at 96/100 after 4 manual prompts and ~360M tokens, versus 30+ manual prompts and 1B+ tokens for raw Codex CLI. Millrace used gpt-5.3-codex high and xhigh exclusively and ran for a total of ~17.4 hours; Codex CLI used the stronger gpt-5.4 xhigh with Fast Mode enabled and ran for a total of ~18 hours.

The Journey OS

Using Codex + Millrace as a solo founder to build, test, and ship custom church software that is already functional and in use today.

Project Links

Millrace AIMillrace vs Codex Benchmark

Get in Touch

Interested in working with Tim Osterhus? Send a brief introduction and they'll get back to you if it's a fit.
