nanochat/dev
gio 889e588883 add LEADERBOARD_SUBMISSION.md (Run 7 candidate)
d22 + 6000 iter + bs=1M + warmdown=0.85 + muonclip τ=100
- CORE 0.2646 in 88.2 min (matches Run 6 quality, 10.9% faster wall-clock)
- val_bpb 0.7241
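
The warmdown=0.85 setting above can be sketched as a learning-rate schedule. This is a minimal illustration only: the interpretation of 0.85 as "hold the LR flat, then decay linearly over the final 15% of training" is an assumption, and the function name and signature are hypothetical, not taken from nanochat's actual code.

```python
def lr_multiplier(step: int, num_steps: int, warmdown_start: float = 0.85) -> float:
    """Sketch of a flat-then-linear-warmdown LR schedule (assumed semantics).

    Returns 1.0 until warmdown_start * num_steps, then decays linearly to 0
    at the end of training. Warmup is omitted for brevity.
    """
    t = step / num_steps
    if t < warmdown_start:
        return 1.0
    # linear decay from 1.0 at warmdown_start down to 0.0 at t = 1.0
    return (1.0 - t) / (1.0 - warmdown_start)
```

For the 6000-iteration run above, this would hold the LR constant through step 5100 and ramp it to zero over the last 900 steps.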

Both warmdown=0.85 and muonclip individually regress at d22; together they
synergize. MuonClip is the only code addition (66 LOC across optim.py +
gpt.py + base_train.py); with the default OFF, Run 6 behavior is preserved
bit-for-bit.
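
The muonclip τ=100 knob refers to QK-clipping: after an optimizer step, if the largest attention logit exceeds τ, the query and key weights are rescaled so the logits are capped. Below is a minimal single-head NumPy sketch of that idea, assuming the clip factor is split evenly between W_q and W_k; it is not the 66-LOC implementation referenced above (which would operate per-head inside the optimizer), and the function name and shapes are illustrative.

```python
import numpy as np

def muonclip_qk(W_q, W_k, X, tau=100.0):
    """QK-clip sketch (single head, assumed form, not nanochat's code).

    Computes the max attention logit for activations X; if it exceeds tau,
    scales W_q and W_k each by sqrt(tau / max_logit), so the recomputed
    logits are capped at tau.
    """
    d = W_q.shape[1]                     # head dimension
    Q, K = X @ W_q, X @ W_k              # (seq, d) projections
    s_max = ((Q @ K.T) / np.sqrt(d)).max()
    if s_max > tau:
        gamma = tau / s_max
        W_q *= np.sqrt(gamma)            # split the shrink evenly between Q and K
        W_k *= np.sqrt(gamma)
    return W_q, W_k, s_max
```

Splitting the rescale as sqrt(gamma) on each side keeps the Q·K product, and hence the logits, at exactly τ for the offending batch while perturbing each weight matrix as little as possible.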

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 21:36:00 -05:00
estimate_gpt3_core.ipynb add notebook on deriving the CORE estimates for the GPT-3 miniseries. 2026-01-05 18:40:28 +00:00
gen_synthetic_data.py tune the synthetic data generation script. delete the king andrej stuff lol. also, upgrade to gemini 3 2026-02-02 01:45:59 +00:00
generate_logo.html initial commit 2025-10-13 06:49:24 -07:00
LEADERBOARD_SUBMISSION.md add LEADERBOARD_SUBMISSION.md (Run 7 candidate) 2026-04-26 21:36:00 -05:00
LEADERBOARD.md ~1.5h :-) 2026-03-15 22:29:27 +01:00
LOG.md bunch of ideas tried from openai/parameter-golf, all negative results for nanochat 2026-03-24 22:13:13 +00:00
nanochat.png Update logo 2025-10-14 14:19:44 -04:00
repackage_data_reference.py document the legacy fineweb100b dataset and the new climbmix400b dataset 2026-03-03 17:24:31 +00:00
scaling_analysis.ipynb fix scaling laws scripts after the bigram embeddings were removed 2026-03-17 16:55:56 +00:00
scaling_laws_jan26.png nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtraining is not yet fully erased across the board, but good enough for step 1 2026-01-31 19:12:25 +00:00