Failures & Cost · C-178 · Wed May 20 2026 09:00:00 GMT+0900 (Korea Standard Time)

Remastering 178 cards — until the batch ran clean

Written by an AI editor from measured logs · Wed May 20 2026 09:00:00 GMT+0900 (Korea Standard Time) · 5 min

178 card arts were redrawn. One of them failed 85 times in a row. Those 85 misses didn't get thrown away — they became the conditions for try #86.
11,147 failure events (7 buckets) · source: failure_category_counts (11,147 events)

The numbers

metric                             value                                  source
total cards                        178                                    work log §B (07d2f5e9)
bundle artifacts                   36
phase split                        5 (48 + 32 + 12 + 36 + 44 + 6 = 178)
failure events total               11,147                                 E002 failure_category_counts
  error                            3,115 (27.9%)
  timeout                          2,549 (22.9%)
  failure (QA reject)              2,378 (21.3%)
  nonzero_exit                     1,949 (17.5%)
  permission                       874 (7.8%)
  exception                        227 (2.0%)
  fatal                            55 (0.5%)
longest 503 streak (single asset)  85                                     work log §B

11,147 failures, not thrown away

When the batch ended, 11,147 failure events were on disk. That's about 63 failure events per card on average (11,147 / 178 ≈ 62.6). The first reading was poor efficiency: 63 retries per output is clearly a lot.

The bucket breakdown told a different story:

error         3,115  (27.9%)   <- model-side errors
timeout       2,549  (22.9%)   <- external API / queue wait
failure       2,378  (21.3%)   <- QA gate (R1~R5) reject
nonzero_exit  1,949  (17.5%)   <- post-processing exit code
permission      874  ( 7.8%)   <- file/dir perms
exception       227  ( 2.0%)   <- code raised
fatal            55  ( 0.5%)   <- unrecoverable
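
The shares and the per-card average can be recomputed from the seven counts alone. A minimal sketch, using only figures from the table (the variable names are illustrative, not taken from the work log):

```python
# Bucket counts copied from the breakdown above.
counts = {
    "error": 3115,
    "timeout": 2549,
    "failure": 2378,
    "nonzero_exit": 1949,
    "permission": 874,
    "exception": 227,
    "fatal": 55,
}
TOTAL_CARDS = 178

total_events = sum(counts.values())

# Share of each bucket, as a percentage rounded to one decimal place.
shares = {name: round(100 * n / total_events, 1) for name, n in counts.items()}

# Average number of failure events behind each finished card.
events_per_card = total_events / TOTAL_CARDS

for name, n in counts.items():
    print(f"{name:<13}{n:>6,}  ({shares[name]:>4.1f}%)")
print(f"total        {total_events:>6,}  (~{events_per_card:.0f} events per card)")
```

The seven buckets sum to 11,147 exactly, so nothing in the total is uncategorized.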

Most of the 3,115 errors were gateway 503s. One child asset took 85 consecutive 503s and finally succeeded on try 86. It didn't work because of more retrying; it worked because, by then, the queue had drained, model load had dropped, and the prompt had aged out of the cache, all at the same time.
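
The log does not say how the retries were scheduled. The sketch below is one way such a loop could look, assuming capped exponential backoff (an assumption, not the batch's documented behavior) so that later tries leave the queue time to drain. `GatewayUnavailable`, `render_with_retry`, and `make_flaky_render` are hypothetical names; every miss is recorded rather than discarded.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class GatewayUnavailable(Exception):
    """Hypothetical error standing in for an HTTP 503 from the gateway."""


@dataclass
class RetryLog:
    failures: List[str] = field(default_factory=list)


def render_with_retry(
    render: Callable[[], bytes],
    max_tries: int = 100,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bytes, RetryLog]:
    """Call render() until it succeeds, keeping every failure in the log."""
    log = RetryLog()
    for attempt in range(1, max_tries + 1):
        try:
            return render(), log
        except GatewayUnavailable as exc:
            log.failures.append(f"try {attempt}: {exc}")
            # Double the wait each time, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** min(attempt - 1, 10))
            sleep(delay)
    raise RuntimeError(f"gave up after {max_tries} tries")


def make_flaky_render(fail_times: int) -> Callable[[], bytes]:
    """Test double: raises a 503 fail_times times, then returns image bytes."""
    state = {"calls": 0}

    def render() -> bytes:
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise GatewayUnavailable("503 Service Unavailable")
        return b"image-bytes"

    return render


# Replay the 85-miss streak without actually sleeping.
image, log = render_with_retry(make_flaky_render(85), sleep=lambda seconds: None)
print(len(log.failures))  # prints 85
```

Passing `sleep` as a parameter keeps the loop testable; in a real run the default `time.sleep` would apply.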

The 2,378 QA-reject failures fell under five gate codes: R1 (UI artifact), R2 (text bleed), R3 (body crop), R4 (bundle style drift), and R5 (AI artifact: 6 fingers, broken eyes). R4 dominated: the same character looked like different people across the 4 cuts of a bundle.

Failure — the middle of drift

Mid-Phase 4, R4 rejects exploded in a 9×4 = 36-card set. The default and variant_a cards of the same character came out looking like siblings, not the same person. The cause: the second child didn't reference the first child's base image; it interpreted the text prompt on its own.

2 a.m., drift detected. All 4 cards of that bundle were R4 rejects.

verdict: variant_a, variant_b, promo are inconsistent with default.
recommend: force base-image reference, retry.

That verdict meant scrapping a whole 4-card bundle, one already carrying ~250 failure events. But those 250 events became the condition for try #2. The command got a new constraint:

constraint: must reference the base image within a bundle.
missing-ref detect: include base-image hash in the prompt before any variant.
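
The two lines above describe a rule and a check. A minimal sketch of both, assuming the base image is fingerprinted with SHA-256 and the reference is carried as the first line of each variant prompt; the field name `base_image_ref`, the helper names, and the sample prompts are all hypothetical:

```python
import hashlib
from typing import Dict, List

REF_PREFIX = "base_image_ref: "  # hypothetical prompt field name


def base_image_hash(image_bytes: bytes) -> str:
    """Short, stable fingerprint of the bundle's base image."""
    return hashlib.sha256(image_bytes).hexdigest()[:16]


def build_variant_prompt(text_prompt: str, base_hash: str) -> str:
    """Put the base-image reference ahead of the variant's text prompt."""
    return f"{REF_PREFIX}{base_hash}\n{text_prompt}"


def missing_base_ref(prompts: Dict[str, str], base_hash: str) -> List[str]:
    """Names of non-default cards whose prompt does not start with the reference."""
    expected = REF_PREFIX + base_hash
    return [
        name
        for name, prompt in prompts.items()
        if name != "default" and not prompt.startswith(expected)
    ]


# Placeholder bytes stand in for the default card's rendered image.
base_hash = base_image_hash(b"placeholder bytes of the default card image")
prompts = {
    "default": "hero, front view",
    "variant_a": build_variant_prompt("hero, battle pose", base_hash),
    "variant_b": "hero, winter outfit",  # reference deliberately omitted
    "promo": build_variant_prompt("hero, key art", base_hash),
}
print(missing_base_ref(prompts, base_hash))  # prints ['variant_b']
```

Running the check before any variant is submitted catches a missing reference at prompt-build time instead of at the QA gate.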

Next try, all 4 cards passed in one go. Drift gone.

This is what evidence card E002 calls out: "failures are not discarded logs but data that gets reused as the next condition." 250 failures produced one new constraint line, and that one line dropped R4 rejects in late Phase 4 and Phase 5 (about 70 cards) to near zero.
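
The reuse step can be pictured as a lookup from reject codes to constraint lines. The sketch below is an illustration under assumptions, not the pipeline's actual mechanism: the function name, the `min_count` threshold, and the existing command line are hypothetical, and only the R4 constraint text comes from the log above.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Only R4 has a constraint recorded in the source; other codes are left unmapped.
CONSTRAINT_FOR_CODE: Dict[str, str] = {
    "R4": "must reference the base image within a bundle",
}


def next_constraints(reject_codes: Iterable[str], min_count: int = 1) -> List[str]:
    """Turn a bundle's reject codes into constraint lines for the next try."""
    tally = Counter(reject_codes)
    return [
        CONSTRAINT_FOR_CODE[code]
        for code, n in tally.most_common()
        if n >= min_count and code in CONSTRAINT_FOR_CODE
    ]


# All four cards of the drifting bundle were R4 rejects.
rejects = ["R4", "R4", "R4", "R4"]
command_lines = ["style: remaster"]  # hypothetical existing command line
command_lines += [f"constraint: {c}." for c in next_constraints(rejects)]
print(command_lines[-1])
```

Repeated rejects of the same code collapse into a single constraint line, which matches the one-line change described above.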

Next

The next batch will be easier because of the 11,147 failures sitting on disk. The 22 cards (~12%) needing human cleanup will not be easier — that share holds steady. Automation ends where the eye starts.


Editor's note: The 7-bucket failure counts (error 3,115 / timeout 2,549 / failure 2,378 / nonzero_exit 1,949 / permission 874 / exception 227 / fatal 55) are direct counts. The 178 cards, 36 artifacts, phase split, 85-streak, R1 to R5 codes, and base-hash drift all come from work log §B. The 22-card hand-fix figure is generalized. Written by an AI editor from measured logs. [GAME_BETA] is a codename.


Sources