Six Days Equals Six Weeks

In 2007, the first – and hardest – hurdle to shipping my first game was getting orcs to spawn on one side of a board and pathfind around obstacles to the other side. That took me six weeks.

Last month I started the project over from scratch — Unity 6.3 client, Rails 8.1 backend, almost none of the original code carried forward. I hit the pathfinding milestone on day six. And I had a hell of a lot more to show for it.

The difference: I’m not really writing the code anymore. I’m running a small software factory — Claude, Copilot, and Codex working in shifts. Most of my time goes to game design and business planning. Most of the “development” I do is designing the protocol they follow.

Here’s the catch: The code coming out the other end is sloppy and architecturally incoherent. The tests pass and the game plays, but if you stop there, you’ve built something you wouldn’t want to live with. The interesting part of this post isn’t the 7x speedup. It’s everything I’ve layered on top to make the output coherent.

How I got here

In 2007, I set out to start my own game studio. I had no prior game-development experience beyond dabbling with a casual Lumines clone.

My plan was to take the startup mentality and apply it to game development by producing a series of games in rapid succession, each building on the last and learning from what the market told me.

My first game had a very short list of requirements. It would…

  1. Be a tower defense game.
  2. Be 3D.
  3. Have pathfinding.
  4. Ship in 3 months.

I shipped exactly 3 months to the day after starting, with 1.5 hours to spare. I also had to ship several point releases over the next 24 hours to fix serious regressions from that final push. But — I had done the thing I set out to do.

That game was When Orcs Attack, and you’ve absolutely never heard of it. It sold 50 copies. That led to Hordes of Orcs, which led to Hordes of Orcs 2; together those two sold north of 10,000 copies.

The studio never reached break-even, and Unity LTS wasn’t a thing yet. Getting the Hordes of Orcs 2 project running even in Unity 3.0 was a non-starter. The games have been sitting on the shelf ever since. I never lost the itch, though.

Eighteen days ago, I decided to take another crack at it, and to try out this whole “software factory” approach that people keep going on about. I resurrected the art assets from the old games. The only code I brought over was 150 lines of especially-clever UI code for inter-scene transitions and a couple of not-terribly-sophisticated shaders. Everything else is fresh.

The first six days

The timeline:

  1. Raw dump of art assets from the old games. Putting together a test scene to make sure things looked coherent and performed on mobile.
  2. Catching up on Unity. Making decisions about rendering. Tweaking import settings, materials, shaders.
  3. This is where Claude comes in. Simple camera controls. Terrain rendering. Oh, and the whole grid system that forms the foundation of the game.
  4. Players placing towers. Beginning the switch from IMGUI to UI Toolkit. Improving the aesthetics of my custom shaders.
  5. Finishing the switch to UI Toolkit. Support for fixed-path levels.
  6. Better terrain rendering. Pathfinding and navigation. Parity with my original 6-week milestone.

Calling it parity is generous to my 2007 self. By day six I had editor tools for debugging and tuning, performance dialed in for mobile, and the beginnings of the backend — plus a working technical and business plan I’d been hashing out with Claude. The pathfinding milestone wasn’t the finish line for the first sprint; it was a checkpoint somewhere in the middle.

Growing a process

The first couple of days were mostly:

  1. Re-learning enough Unity to make intelligent decisions about which parts to use. Built-in rendering pipeline or URP? IMGUI, uGUI, or UI Toolkit?
  2. Auditioning Asset Store elements for water, skyboxes, and the like.
  3. Reimporting all the assets and deciding what I’d need.

None of that is particularly suited to AI right now. But once it came time to start building, the AI took over.

I started not knowing how useful AI coding tools would even be for something that wasn’t a web app. So my process began with close collaboration and supervision, and gradually became more parallel and more hands-off.

The current workflow

The interesting part isn’t any single tool. It’s the protocol between them. As of right now, this is where I’m at:

  • A few load-bearing rules in CLAUDE.md.
  • A SessionStart hook that catches the most common worktree-vs-main-repo mix-up.
  • Conventions for how I interact with Claude.
  • Conventions for how Issues get filed and groomed.
  • One small shell script that makes acceptance testing across worktrees less tedious.
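
To make the first item concrete, the "load-bearing rules" in CLAUDE.md read something like the sketch below. These are paraphrased illustrations based on the conventions described later in this post (worktrees, the Copilot gate, acceptance criteria) — not the actual contents of the file:

```markdown
## Worktrees
- Create a worktree under .claude/worktrees/ before editing anything.
  Never edit files in the main checkout.

## Reviews
- Request a Copilot review on every PR and iterate until it is satisfied.
- Push back on review feedback only when it conflicts with an existing
  Issue or an established convention — and say why in the PR.

## PRs
- Every PR description must include an "## Acceptance Criteria" section
  of independently verifiable checkboxes.
```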

It’s a layered system. Five layers right now:

  1. Issues.
  2. Feature/fix PRs.
  3. Checkpoint/release PRs.
  4. Code audits.
  5. Documentation.

Each layer is driven by Claude, Copilot, and/or Codex.

Layer 1: Issues

Keeping track of what needs to get done is the most critical step. The moment I see something wrong, I file a GitHub Issue. If a task is too big, it gets broken into sub-Issues (by Claude). I have become relentless about this. Claude is also empowered to file new Issues, and does so of its own accord with some frequency.

You’ll notice I didn’t say anything about GitHub Projects. So far, one of two things is true at any given moment:

  1. I have a specific area I want to push forward.
  2. I want to shore up the weak parts of the game.

In case 1, I tell Claude “go find Issues about X and work on them,” or point it at a set of specific Issues. In case 2, I tell Claude to pick out Issues it can attack autonomously that are highest-impact for getting the game shipped.

After 18 days, the project has 183 closed Issues and 82 open. Periodically I have Claude groom the backlog: finding Issues that have been addressed but not closed, breaking down ones that need to be broken down further, asking questions on ones that are too ambiguous to attack. Without that, Claude tap-dances around the big things and the backlog turns into noise.
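
One piece of grooming is mechanical enough to script. Here's a hypothetical helper — not my actual tooling — that flags open Issues untouched for two weeks as grooming candidates. It reads `gh`-style JSON on stdin so the filter itself is testable without hitting GitHub:

```shell
# Hypothetical grooming helper: flag open Issues untouched for 14+ days.
# Expects gh-style JSON ([{number, title, updatedAt}, ...]) on stdin.
stale_issues() {
  # GNU date first, BSD date as fallback
  cutoff=$(date -u -d '14 days ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
    || date -u -v-14d +%Y-%m-%dT%H:%M:%SZ)
  # ISO-8601 timestamps in Zulu form compare correctly as strings
  jq -r --arg cutoff "$cutoff" \
    '.[] | select(.updatedAt < $cutoff) | "#\(.number)\t\(.title)"'
}

# Usage: gh issue list --state open --json number,title,updatedAt | stale_issues
```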

Layer 2: The core loop

This is the heart of the process. The “core game loop” in the game of game development.

  • Claude Code is the orchestrator and the implementer. It handles triage, decomposition, dispatch, monitoring, most of the typing.
  • Copilot is the gate. It reviews every PR before I do, and there are explicit rules in CLAUDE.md for when Claude listens to it and when Claude doesn’t. Claude and Copilot iterate until Copilot is satisfied.
  • Codex is the backup. If I run out of Copilot quota, Codex does the reviews. If Claude can’t reach a working solution, Codex gets a crack at the problem.
  • I am the customer of the factory. I set priorities, do acceptance testing, merge PRs, and file more Issues as they come up.

This layer is gated mostly on token cost. I could have Claude tackle 10 or 15 Issues in parallel, but I’d blow through my budget instantly. To stay within Max 20x, I keep it to 3 sub-agents at once. I often let it run overnight, so I wake up to a stack of PRs.

What this layer does not produce: good code.

Run this loop unchecked and it will ship code that is architecturally incoherent and sloppy. Assumptions collide. Layers of myopia-induced complexity accrue: an imagined requirement leads to a bone-headed workaround, which leads to the assumption of an implicit and needlessly complex interface, and so on. The tests pass. The game plays. But the mess compounds, one PR at a time. Stop here and you’ll have a codebase you wouldn’t want to live with — and sooner or later, getting anything done becomes a struggle.

That reality is the entire reason Layers 3, 4, and 5 exist.

Layer 3: Taking a step back

This is the first layer where coherence is actively managed.

I work on main but don’t ship from it. Periodically — anywhere from once every couple of days to a couple of times per day — I cut a checkpoint branch off main and open a PR to merge checkpoint into release. The PR is much larger than any single feature PR, and the review covers a much wider surface area.
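
The mechanics of the cadence are simple. A sketch — the branch naming and the dry-run wrapper are my inventions for illustration, not the actual tooling:

```shell
#!/usr/bin/env bash
# Sketch of the checkpoint cadence: cut a branch off main, push it, and open
# a PR into release. DRY_RUN=1 (the default) prints commands instead of
# running them, so the flow can be inspected safely.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
branch="checkpoint/$(date +%Y-%m-%d)"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"   # dry run: echo the command
  else
    "$@"
  fi
}

run git checkout main
run git pull
run git checkout -b "$branch"
run git push -u origin "$branch"
run gh pr create --base release --head "$branch" --title "Checkpoint $(date +%Y-%m-%d)"
```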

Copilot reviews the merge PR. If more than 300 files have changed, or I’m over-budget on Copilot, Codex takes its place. (This is manual at present.)

This step finds two kinds of problems: things the per-PR reviewers missed, and — more importantly — things they couldn’t have found, because two PRs developed in parallel each looked fine in isolation but stepped on each other in some assumption that spans both. Depending on severity, I instruct Claude to file Issues, or to fix things on the spot.

Layer 4: Converging on coherence

Every few days, I have Codex or Claude perform a full audit of either the game client or the API server. The findings become a new Issue. From there, a separate Claude session is tasked with reviewing the findings and raising objections or questions.

At first the audits mostly surfaced nitpicks and localized issues. Now I task the auditor specifically with finding architectural inconsistencies, maintainability problems, and anti-patterns.

Today was largely spent on this layer. It kicked off an iterative back-and-forth between Claude and me about where the architecture is, where it should be, and how to get there. The output: a prioritized list of Issues, and a stack of PRs against them.

I’ll know this layer is working when it consistently comes back with only relatively small problems.

Layer 5: Avoiding drift

The remaining concern is that future tasks will drift from the conventions we’re establishing.

Last night I tasked Claude with documenting the architecture of the system. Not an aspirational document of how it should be — a real, messy, this-is-how-it-is set of guides, calling out contradictions and inconsistencies as honestly as the things that are uniform.

There’s a PR waiting for me with the results. I haven’t looked at it yet, on purpose: I want to compare it to the results of today’s audit and architectural-cleanup tasks. Then a fresh Claude will revise the docs based on today’s work. The diff between the original and the revision will tell me a lot about how successful Layer 4 was.

Once this is refined, it will likely become its own Claude sub-agent — as will the auditor role from Layer 4.

The artifacts

Three pieces of the system are worth showing in detail.

Acceptance criteria as a contract

Every PR description has an ## Acceptance Criteria block of GitHub-flavored checkboxes I tick off when validating. Each item has to be something I can independently verify — a behavior to observe, a screen to look at, a value to check. Not a restatement of the implementation.

## Acceptance Criteria

- [ ] Placing a tower on an occupied tile shows the "blocked" cursor and does not deduct currency.
- [ ] Selling a tower returns 75% of its cost rounded down.
- [ ] The wave counter HUD updates within one frame of wave start.

This single discipline does more for review velocity than anything else in the system.

The SessionStart hook

The single weirdest thing-that-just-happens with this much parallelism: a sub-agent cuts a worktree and then begins editing files in the main checkout anyway.

The repo has a SessionStart hook in .claude/settings.json that warns when pwd is the main repo while .claude/worktrees/ exists:

#!/usr/bin/env bash
# SessionStart guard: warn when pwd is the main repo while .claude/worktrees/ exists.
# Detection: linked worktrees have git-dir != git-common-dir; the main repo has them equal.

set -u

top=$(git rev-parse --show-toplevel 2>/dev/null || true)
gd=$(git rev-parse --git-dir 2>/dev/null || true)
cd_=$(git rev-parse --git-common-dir 2>/dev/null || true)

info=$(printf 'pwd: %s\ngit toplevel: %s' "$(pwd)" "${top:-<not in a git repo>}")

if [ -n "$top" ] \
  && [ "$(cd "$gd" 2>/dev/null && pwd)" = "$(cd "$cd_" 2>/dev/null && pwd)" ] \
  && [ -d "$top/.claude/worktrees" ]; then
  jq -n --arg m "WARNING: pwd is the main repo but .claude/worktrees/ exists. You may have intended to work in a worktree.

$info" '{systemMessage:$m}'
else
  printf '%s\n' "$info"
fi

The detection trick: linked worktrees have git-dir != git-common-dir; the main repo has them equal. The hook can’t catch every case — mid-session cwd drift, sub-agent absolute paths — but it catches the most common one, which is starting a session in the wrong place.
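
For completeness, registering a hook like this in .claude/settings.json looks roughly like the fragment below. This is my reading of the Claude Code hooks schema, and the script path is a placeholder — check the current hooks documentation before copying:

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          { "type": "command", "command": ".claude/hooks/session-start-guard.sh" }
        ]
      }
    ]
  }
}
```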

A small validation helper

Branches checked out by one worktree can’t be checked out in another. So acceptance-testing a PR means cutting a fresh branch on the head of that PR and checking that out. I wrote a validate-branch script in ~/bin to make this less tedious:

#!/bin/bash
set -euo pipefail

# Reset a throwaway "validation" branch to the head of the PR branch and check
# it out. `checkout -B` also works when validation is already the current
# branch, where the obvious `git branch -f validation` would refuse to run.
BRANCH="${1:?usage: validate-branch <branch>}"

git fetch && git checkout -B validation "origin/$BRANCH"

Small, dumb, lives at the boundary between worktree mechanics and acceptance testing — and I run it constantly.

What I’ll know in 30 days

Right now, all five layers are held together by me running them on judgment. Layer 1 grooming, Layer 3 cadence, Layer 4 audits, Layer 5 doc revisions — none of those are scheduled or automated yet. I do them when I sense the system needs them.

The real test is whether Layers 4 and 5 can keep pace with Layer 2’s output indefinitely, or whether Layer 2 outpaces them and the entropy wins. I don’t yet know the answer. I’ll come back in 30 days and tell you what broke.

The thing I’m most confident about, regardless of how that experiment turns out: the protocol matters more than the tools. The CLAUDE.md in the Hordes of Orcs 3 repo is mostly not about Claude. It’s about how a piece of work moves from “request” to “merged PR” with all three AI tools, and me, in their right lanes.

Oh, and if you’re curious, the game is coming along quite nicely!
