The numbers
| metric | value | source |
|---|---|---|
| total cards | 178 | work log §B (07d2f5e9) |
| bundle artifacts | 36 | 〃 |
| phase split | 5 (48 + 32 + 12 + 36 + 44 + 6 = 178) | 〃 |
| failure events total | 11,147 | E002 failure_category_counts |
| error | 3,115 (27.9%) | 〃 |
| timeout | 2,549 (22.9%) | 〃 |
| failure (QA reject) | 2,378 (21.3%) | 〃 |
| nonzero_exit | 1,949 (17.5%) | 〃 |
| permission | 874 (7.8%) | 〃 |
| exception | 227 (2.0%) | 〃 |
| fatal | 55 (0.5%) | 〃 |
| longest 503 streak (single asset) | 85 | work log §B |
11,147 failures, not thrown away
When the batch ended, 11,147 failure events were on disk. That's about 63 failed attempts per card on average (11,147 / 178 ≈ 62.6). The first reading was bad efficiency: 63 retries per output is clearly a lot.
The bucket breakdown told a different story:
error 3,115 (27.9%) <- model-side errors
timeout 2,549 (22.9%) <- external API / queue wait
failure 2,378 (21.3%) <- QA gate (R1~R5) reject
nonzero_exit 1,949 (17.5%) <- post-processing exit code
permission 874 ( 7.8%) <- file/dir perms
exception 227 ( 2.0%) <- code raised
fatal 55 ( 0.5%) <- unrecoverable
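As a quick check on the arithmetic, the shares and the per-card average can be recomputed from the raw counts. This is a minimal sketch: the counts and the 178-card total are the figures reported above, while the names `FAILURE_COUNTS`, `TOTAL_CARDS`, and `bucket_shares` are made up for illustration.

```python
# Bucket counts as reported above (E002 failure_category_counts).
FAILURE_COUNTS = {
    "error": 3115,
    "timeout": 2549,
    "failure": 2378,
    "nonzero_exit": 1949,
    "permission": 874,
    "exception": 227,
    "fatal": 55,
}
TOTAL_CARDS = 178  # total cards, from the work log


def bucket_shares(counts):
    """Return each bucket's share of all failure events, in percent (1 decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total_failures = sum(FAILURE_COUNTS.values())
shares = bucket_shares(FAILURE_COUNTS)
failures_per_card = total_failures / TOTAL_CARDS

for name, pct in shares.items():
    print(f"{name:<13}{FAILURE_COUNTS[name]:>6,} ({pct:4.1f}%)")
print(f"total {total_failures:,}, about {failures_per_card:.1f} failure events per card")
```

The seven buckets sum to the reported total, so no failure event is left uncategorized.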
Most of the 3,115 errors were gateway 503s. One child asset took 85 consecutive 503s and finally succeeded on try 86. It did not work because it was "retried more"; by try 86 the queue had drained, model load had dropped, and the prompt had aged out of the cache, all at the same time.
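The 85-then-success pattern can be sketched as a retry loop that logs every 503 as a failure event and waits between tries. This is an illustration only, not the batch's actual runner: `GatewayUnavailable`, `generate_with_retry`, the backoff values, and the log record shape are all assumptions.

```python
import time


class GatewayUnavailable(Exception):
    """Hypothetical stand-in for an HTTP 503 from the generation gateway."""


def generate_with_retry(generate, max_tries=100, base_delay=1.0, max_delay=60.0,
                        sleep=time.sleep, failure_log=None):
    """Call `generate` until it succeeds, logging every 503 as a failure event.

    Returns (result, tries). Re-raises GatewayUnavailable once max_tries is used up.
    """
    failure_log = failure_log if failure_log is not None else []
    for attempt in range(1, max_tries + 1):
        try:
            return generate(), attempt
        except GatewayUnavailable:
            failure_log.append({"category": "error", "status": 503, "try": attempt})
            if attempt == max_tries:
                raise
            # Capped exponential backoff: the wait gives the queue time to drain.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))


# Simulated asset that fails 85 times and succeeds on try 86.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] <= 85:
        raise GatewayUnavailable()
    return "asset.png"


log = []
result, tries = generate_with_retry(flaky, sleep=lambda seconds: None, failure_log=log)
print(result, tries, len(log))
```

Passing `sleep` as a parameter keeps the simulation instant; in a real run the default `time.sleep` would apply the delays.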
The 2,378 QA-reject failures were split across five rules: R1 (UI artifact), R2 (text bleed), R3 (body crop), R4 (bundle style drift), and R5 (AI artifact: 6 fingers, broken eyes). R4 dominated: the same character looked like different people across the 4 cuts.
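One way to see which rule dominates is to tally reject records by rule. The sketch below assumes a record shape (`card`, `rule`) that the source does not specify, and the sample records are invented purely to show the mechanics; they are not the measured log.

```python
from collections import Counter
from enum import Enum


class Reject(Enum):
    """The five QA gate rules named above."""
    R1 = "UI artifact"
    R2 = "text bleed"
    R3 = "body crop"
    R4 = "bundle style drift"
    R5 = "AI artifact"


def dominant_reject(records):
    """Return (rule, count) for the most frequent reject rule in the records."""
    tally = Counter(Reject[record["rule"]] for record in records)
    return tally.most_common(1)[0]


# Invented sample records (hypothetical shape), not real data.
sample = [
    {"card": "c01", "rule": "R4"},
    {"card": "c02", "rule": "R4"},
    {"card": "c03", "rule": "R5"},
]
rule, count = dominant_reject(sample)
print(rule.name, rule.value, count)
```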
Failure — the middle of drift
Mid-Phase 4, R4 exploded inside the 36 bundle cards (9 × 4). In one 4-card bundle, default and variant_a of the same character came out as siblings, not the same person. The second child didn't reference the first child's base image; it interpreted the text prompt on its own.
2 a.m., drift detected. All 4 cards of that bundle were R4 rejects.
verdict: variant_a, variant_b, promo are inconsistent with default.
recommend: force base-image reference, retry.
That one-line verdict meant scrapping a whole 4-card bundle that was already carrying ~250 failure events. But those 250 events became the condition for try #2. The command got one new constraint, plus a check that enforces it:
constraint: must reference the base image within a bundle.
missing-ref detect: include base-image hash in the prompt before any variant.
Next try, all 4 cards passed in one go. Drift gone.
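The constraint and its missing-ref check could look roughly like the sketch below. It assumes the base-image hash is a SHA-256 of the default card's bytes and that the reference is written as a leading `base_image_ref:` line; both choices, along with every function name and the prompt texts, are hypothetical rather than taken from the actual command.

```python
import hashlib

REF_PREFIX = "base_image_ref: "  # hypothetical field name


def base_image_hash(image_bytes: bytes) -> str:
    """Hash the base image so every variant can point at the same reference."""
    return hashlib.sha256(image_bytes).hexdigest()


def with_base_ref(text_prompt: str, base_hash: str) -> str:
    """Put the base-image reference ahead of the variant's text prompt."""
    return f"{REF_PREFIX}{base_hash}\n{text_prompt}"


def missing_refs(prompts: dict, base_hash: str) -> list:
    """Return the names of variants whose prompt does not lead with the base hash."""
    expected = f"{REF_PREFIX}{base_hash}\n"
    return [name for name, prompt in prompts.items() if not prompt.startswith(expected)]


# Placeholder bytes stand in for the default card's image file.
base_hash = base_image_hash(b"placeholder bytes for the default card image")
variants = {
    "variant_a": "same character, alternate outfit",
    "variant_b": "same character, alternate pose",
    "promo": "same character, promo framing",
}
prompts = {name: with_base_ref(text, base_hash) for name, text in variants.items()}

print(missing_refs(prompts, base_hash))   # prompts built with the reference
print(missing_refs(variants, base_hash))  # raw text prompts, as in the drifting run
```

Running the check before any variant is generated turns a drift that was only visible in the output images into a failure that is caught at prompt-build time.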
This is what evidence card E002 calls out — "failures are not discarded logs but data that gets reused as the next condition." 250 failures produced one new constraint line, and that one line dropped R4 rejects in Phase 4-late + Phase 5 (about 70 cards) to near zero.
Next
The next batch will be easier because of the 11,147 failures sitting on disk. The 22 cards (~12%) needing human cleanup will not be easier — that share holds steady. Automation ends where the eye starts.
Editor's note: 7-bucket failure counts (error 3,115 / timeout 2,549 / failure 2,378 / nonzero_exit 1,949 / permission 874 / exception 227 / fatal 55) are direct counts. 178 cards / 36 artifacts / phase split / 85-streak / R1~R5 / base-hash drift all from work log §B. The 22-card hand-fix figure is generalized. Written by an AI editor from measured logs. [GAME_BETA] is a codename.