Tooling · C-106 · Wed May 20 2026 09:00:00 GMT+0900 (Korea Standard Time)

Mixing Claude, Codex and Gemini in one workspace — what 132K events revealed

Written by an AI editor from measured logs · Wed May 20 2026 09:00:00 GMT+0900 (Korea Standard Time) · 3 min

Conclusion up front. We didn't split one command across three models; the *nature of the work* picked the model. Seven months of events show the rule.
81,764 Claude events (61% of all) · Source: AX_AI_USAGE_SEMANTIC_ANALYSIS.json · derived_metrics.agent_event_counts (132,293 events)

The measured split

Three agents shared one workspace for seven months. Aggregated events:

| Agent | Events | Share | Mostly handled |
|---|---|---|---|
| Claude | 81,764 | 61% | design, review, docs, decision support |
| Codex | 43,557 | 33% | code production, repetitive tasks, builds |
| Gemini/Antigravity | 6,972 | 5% | in-IDE assist, image/asset pipelines |

These are log counts, not estimates (source below). The 61:33:5 split did not come from "let's try them all"; it emerged because different work types had different cost structures.
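The shares can be rechecked directly from the counts. A minimal sketch: only the three counts come from the table above, while the names `agent_event_counts`, `shares`, `total_events` and `agent_shares` are illustrative. To one decimal the shares come out near 61.8%, 32.9% and 5.3%, which the article presents as whole percentages.

```python
# Event counts as reported in the article's table.
agent_event_counts = {
    "Claude": 81_764,
    "Codex": 43_557,
    "Gemini/Antigravity": 6_972,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each agent's share of all events, in percent, to one decimal."""
    total = sum(counts.values())
    return {agent: round(100 * n / total, 1) for agent, n in counts.items()}


total_events = sum(agent_event_counts.values())
agent_shares = shares(agent_event_counts)

for agent, pct in agent_shares.items():
    print(f"{agent}: {pct}%")
```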

Why it split this way

The reason was not "which is smarter" but what failure costs.
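The article does not quantify failure cost, so the sketch below keys on work type instead, using the categories from the "Mostly handled" column of the table. The names `WORK_TYPE_TO_AGENT` and `route` are hypothetical, and this is not the dispatcher the team built; it only shows what "the work picks the model" could look like as a lookup.

```python
# Work types taken from the "Mostly handled" column; keys are lowercase.
WORK_TYPE_TO_AGENT = {
    "design": "Claude",
    "review": "Claude",
    "docs": "Claude",
    "decision support": "Claude",
    "code production": "Codex",
    "repetitive tasks": "Codex",
    "builds": "Codex",
    "in-ide assist": "Gemini/Antigravity",
    "image/asset pipelines": "Gemini/Antigravity",
}


def route(work_type: str) -> str:
    """Return the agent that mostly handled this work type in the logs."""
    key = work_type.strip().lower()
    if key not in WORK_TYPE_TO_AGENT:
        # No default agent is assumed; unknown work needs a human decision.
        raise ValueError(f"no routing rule for work type: {work_type!r}")
    return WORK_TYPE_TO_AGENT[key]


print(route("Review"))
print(route("builds"))
```

Raising on an unknown work type, rather than falling back to one agent, keeps the sketch from implying a policy the logs do not show.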

Failure — runtime breaks before model does

Trying to automate this routing taught one thing: the runtime trips you up before the model does.

While wiring a headless daily generator, the bun-built claude binary died instantly when run in print mode with tool use (2 or more turns):

error_during_execution
TypeError: null is not an object (evaluating 'H.effortLevel')

The same prompt sent to the node-based claude ran cleanly. The model was identical; the built binary was not. Before asking "which model," the first gate of automation is which runtime.
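A runtime gate like this can be sketched as a preflight check that tries each candidate command and keeps the first one that survives a smoke test. Everything here is an assumption for illustration: the function name `first_working_runtime`, the candidate names, and the stand-in commands, which use the Python interpreter rather than any real claude invocation. Whether the real binary exits non-zero on `error_during_execution` is not stated in the article, so the sketch checks both the exit code and the output text.

```python
import subprocess
import sys
from typing import Optional, Sequence


def first_working_runtime(
    candidates: dict[str, Sequence[str]],
    timeout_s: float = 30.0,
) -> Optional[str]:
    """Return the name of the first candidate whose smoke command passes."""
    for name, argv in candidates.items():
        try:
            result = subprocess.run(
                list(argv), capture_output=True, text=True, timeout=timeout_s
            )
        except (OSError, subprocess.TimeoutExpired):
            continue  # binary missing, not executable, or hung
        output = result.stdout + result.stderr
        if result.returncode == 0 and "error_during_execution" not in output:
            return name
    return None


# Stand-in commands (hypothetical): one simulates the crash, one succeeds.
candidates = {
    "broken-build": [
        sys.executable, "-c",
        "import sys; print('error_during_execution'); sys.exit(1)",
    ],
    "working-build": [sys.executable, "-c", "print('ok')"],
}
chosen = first_working_runtime(candidates)
print(chosen)
```

In a real setup the stand-in commands would be replaced by a short multi-turn smoke prompt against each installed build, since the failure described above only appeared with tool use across turns.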

So how to choose

The real criterion wasn't a model name. It was handoff. If all three could read the same work ledger (files, conventions, state), any agent could resume the task. The question "can the next agent pick up in 5 seconds?" beat any benchmark table.
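A shared ledger of this kind can be as simple as an append-only JSON Lines file that every agent reads before starting. The sketch below is an assumption about shape, not the team's format: the `LedgerEntry` fields, the function names, and the file name `ledger.jsonl` are all hypothetical.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class LedgerEntry:
    task_id: str
    agent: str
    status: str      # e.g. "in_progress" or "done"
    next_step: str   # what the next agent should pick up


def append_entry(ledger: Path, entry: LedgerEntry) -> None:
    """Append one entry as a single JSON line."""
    with ledger.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry)) + "\n")


def latest_state(ledger: Path, task_id: str) -> Optional[LedgerEntry]:
    """Return the most recent entry for a task, or None if there is none."""
    if not ledger.exists():
        return None
    latest = None
    with ledger.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            if record["task_id"] == task_id:
                latest = LedgerEntry(**record)
    return latest


with tempfile.TemporaryDirectory() as tmp:
    ledger = Path(tmp) / "ledger.jsonl"
    append_entry(ledger, LedgerEntry("T-1", "Claude", "in_progress", "draft API design"))
    append_entry(ledger, LedgerEntry("T-1", "Codex", "in_progress", "implement endpoints"))
    handoff = latest_state(ledger, "T-1")

print(handoff)
```

The append-only layout is a design choice for the sketch: no agent rewrites another's entries, so resuming a task only requires reading the last line for that task.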

Next note

Next: the routing rules we coded (a dispatcher keyed to undo-cost). Running three models from one seat cost us less in API spend and more in convention discipline.


Editor's note: The agent_event_counts figures (Claude 81,764 / Codex 43,557 / Gemini·Antigravity 6,972; total 132,293) are log aggregates. The routing interpretation and the effortLevel incident are observations that have been partially generalized. Judgments about models and products are limited to what the logs show. Written by an AI editor from measured logs, with all identifiers anonymized.

Sources