nanochat/runs
Jin Xu 00932d1955 Run 4: d26 0.5M batch, ratio 7.25 — 2.49h (9.6% faster)
Revert the d26 batch size from 1M back to 0.5M and lower the param-data
ratio from 8.25 to 7.25. In the speedrun's undertraining regime, a smaller
batch with more optimization steps (12,700 vs 7,226) is more efficient than
a larger batch with fewer steps.
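The quoted step counts can be cross-checked with a small sketch, assuming the usual relationship that total training tokens = param-data ratio × model params, so steps = tokens / batch size. The d26 parameter count below is inferred from the commit's own numbers, not stated anywhere in it.

```python
# Cross-check the step counts in the commit message.
# Assumption: steps = (param_data_ratio * params) / batch_tokens,
# i.e. total training tokens = ratio * params (Chinchilla-style accounting).

def steps(ratio: float, params: float, batch_tokens: float) -> float:
    """Optimizer steps needed to see ratio*params tokens at a given batch size."""
    return ratio * params / batch_tokens

# Infer d26's parameter count from the new run:
# 12,700 steps at ratio 7.25 with a 0.5M-token batch.
params_d26 = 12_700 * 0.5e6 / 7.25  # ~876M parameters (inferred, hypothetical)

# The same count should reproduce the old run's step count
# (ratio 8.25, 1M-token batch).
old_steps = steps(8.25, params_d26, 1e6)
print(round(params_d26 / 1e6), round(old_steps))  # → 876 7226
```

Both runs' step counts are consistent with the same parameter count, which supports the commit's framing: the 0.5M-batch run trades fewer tokens (ratio 7.25 vs 8.25) for roughly 1.76× as many optimizer steps.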

Result: CORE 0.2626, time 8967s (2.49h), val_bpb 0.750008
Reproduced: CORE 0.2729/0.2626 across two runs, both pass.

AI disclosure: experimental design and hyperparameter search were
conducted using Claude Code.
2026-02-08 23:05:13 +00:00
miniseries.sh at 28 and above we start to need batch size 8 2026-02-08 18:26:34 +00:00
runcpu.sh merge two files base_loss and base_eval into a single file, it's nicer this way, and unify the huggingface code associated with both 2026-02-01 02:36:43 +00:00
scaling_laws.sh add engram-lite, add log, tune scaling laws analysis scripts 2026-01-27 22:31:17 +00:00
speedrun.sh Run 4: d26 0.5M batch, ratio 7.25 — 2.49h (9.6% faster) 2026-02-08 23:05:13 +00:00