Mirror of https://github.com/karpathy/nanochat.git, synced 2026-04-05 15:15:48 +00:00
Revert d26 batch size from 1M to 0.5M and lower the param-data ratio from 8.25 to 7.25. In the speedrun's undertraining regime, a smaller batch with more optimization steps (12,700 vs 7,226) is more efficient than a larger batch with fewer steps. Result: CORE 0.2626, time 8967s (2.49h), val_bpb 0.750008. Reproduced: CORE 0.2729/0.2626 across two runs; both pass. AI disclosure: experimental design and hyperparameter search were conducted using Claude Code.
Files:

- estimate_gpt3_core.ipynb
- gen_synthetic_data.py
- generate_logo.html
- LEADERBOARD.md
- LOG.md
- nanochat.png
- repackage_data_reference.py
- scaling_analysis.ipynb
- scaling_laws_jan26.png
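The step counts in the commit message above follow from simple token accounting: total training tokens = param-data ratio × parameter count, and steps = total tokens / batch size. A minimal sketch of that arithmetic, assuming a d26 parameter count of roughly 876M (back-derived from the commit's numbers, not stated in it):

```python
def num_steps(n_params: float, param_data_ratio: float, batch_size_tokens: float) -> int:
    """Optimization steps = total training tokens / tokens per batch."""
    total_tokens = param_data_ratio * n_params
    return round(total_tokens / batch_size_tokens)

# Assumed d26 model size, inferred so both settings reproduce the commit's step counts.
N_PARAMS = 876e6

old = num_steps(N_PARAMS, 8.25, 1_000_000)  # → 7227, matching the commit's ~7,226 steps
new = num_steps(N_PARAMS, 7.25, 500_000)    # → 12702, matching the commit's ~12,700 steps
```

Note that lowering the ratio from 8.25 to 7.25 slightly *reduces* total tokens, so the step count nearly doubles rather than exactly doubling when the batch is halved.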