Revert the d26 batch size from 1M back to 0.5M and lower the param-data
ratio from 8.25 to 7.25. In the speedrun's undertrained regime, a smaller
batch with more optimization steps (12,700 vs 7,226) is more efficient
than a larger batch with fewer steps.
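A quick consistency check on the numbers above: if total training tokens scale as ratio * n_params (an assumption about the speedrun's scheme, not stated here), both step counts imply the same model size. The ~876M d26 parameter count below is back-computed from these figures, not taken from the source.

```python
# Sanity-check the step counts implied by the batch-size / ratio change.
# Assumed scheme: total_tokens = param_data_ratio * n_params, so
# steps = total_tokens / batch_size.

def total_tokens(steps: int, batch_size: int) -> int:
    """Tokens consumed by a run: optimization steps times tokens per batch."""
    return steps * batch_size

old_tokens = total_tokens(7_226, 1_000_000)  # 1M batch,   ratio 8.25
new_tokens = total_tokens(12_700, 500_000)   # 0.5M batch, ratio 7.25

# Implied parameter count under tokens = ratio * params:
old_params = old_tokens / 8.25
new_params = new_tokens / 7.25
print(old_params, new_params)  # both come out near 8.76e8 (~876M)
```

Both configurations imply the same ~876M-parameter model to within about 0.01%, so the step counts follow directly from the batch-size and ratio changes.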
Result: CORE 0.2626, time 8967s (2.49h), val_bpb 0.750008
Reproduced: CORE 0.2729/0.2626 across two runs, both pass.
AI disclosure: experimental design and hyperparameter search were
conducted using Claude Code.