nanochat/dev
Jin Xu 00932d1955 Run 4: d26 0.5M batch, ratio 7.25 — 2.49h (9.6% faster)
Revert the d26 batch size from 1M back to 0.5M tokens and lower the param-data ratio from
8.25 to 7.25. In the speedrun's undertraining regime, a smaller batch with
more optimization steps (12,700 vs 7,226) is more efficient than a larger
batch with fewer steps.

Result: CORE 0.2626, time 8967s (2.49h), val_bpb 0.750008
Reproduced: CORE 0.2729/0.2626 across two runs, both pass.

AI disclosure: experimental design and hyperparameter search were
conducted using Claude Code.
2026-02-08 23:05:13 +00:00
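
A rough sanity check of the quoted step counts (a minimal sketch; the ~0.87B parameter figure for d26 is inferred from the numbers above and is not stated in the commit). It assumes total training tokens = ratio * params and steps = tokens / batch size:

    # Back-of-the-envelope check of the step counts quoted in the commit message.
    params = 0.876e9  # assumed d26 parameter count (hypothetical, inferred from the step counts)

    def num_steps(ratio, batch_size_tokens):
        # Optimization steps needed to consume ratio * params tokens at the given batch size.
        return round(ratio * params / batch_size_tokens)

    print(num_steps(7.25, 0.5e6))  # ~12,700 steps with 0.5M-token batches (this run)
    print(num_steps(8.25, 1.0e6))  # ~7,226 steps with 1M-token batches (the previous run)

Under these assumptions, the change nearly doubles the number of optimizer updates while slightly reducing total tokens (roughly 6.35B vs 7.23B), which is consistent with the quoted 9.6% wall-clock saving at a comparable CORE score.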
estimate_gpt3_core.ipynb add notebook on deriving the CORE estimates for the GPT-3 miniseries. 2026-01-05 18:40:28 +00:00
gen_synthetic_data.py tune the synthetic data generation script. delete the king andrej stuff lol. also, upgrade to gemini 3 2026-02-02 01:45:59 +00:00
generate_logo.html initial commit 2025-10-13 06:49:24 -07:00
LEADERBOARD.md Run 4: d26 0.5M batch, ratio 7.25 — 2.49h (9.6% faster) 2026-02-08 23:05:13 +00:00
LOG.md briefly mention batch ramp experimentation too, too weak to merge in my few attempts 2026-02-05 22:21:03 +00:00
nanochat.png Update logo 2025-10-14 14:19:44 -04:00
repackage_data_reference.py initial commit 2025-10-13 06:49:24 -07:00
scaling_analysis.ipynb add engram-lite, add log, tune scaling laws analysis scripts 2026-01-27 22:31:17 +00:00
scaling_laws_jan26.png nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtraining is not yet fully erased across the board, but good enough for step 1 2026-01-31 19:12:25 +00:00