Mirror of https://github.com/karpathy/nanochat.git, synced 2026-05-08 00:39:50 +00:00.
d22 + 6000 iter + bs=1M + warmdown=0.85 + muonclip τ=100

- CORE 0.2646 in 88.2 min (matches Run 6 quality, 10.9% faster wall-clock)
- val_bpb 0.7241

Applied alone at d22, warmdown=0.85 and muonclip each regress; applied together, they synergize. MuonClip is the only code addition: 66 LOC across optim.py, gpt.py, and base_train.py. It defaults to OFF, which keeps Run 6 behavior bit-identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Name |
|---|
| estimate_gpt3_core.ipynb |
| gen_synthetic_data.py |
| generate_logo.html |
| LEADERBOARD_SUBMISSION.md |
| LEADERBOARD.md |
| LOG.md |
| nanochat.png |
| repackage_data_reference.py |
| scaling_analysis.ipynb |
| scaling_laws_jan26.png |