Revert the d26 batch size from 1M back to 0.5M and lower the param-data
ratio from 8.25 to 7.25. In the speedrun's undertrained regime, a smaller
batch with more optimization steps (12,700 vs 7,226) is more efficient
than a larger batch with fewer steps.
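A quick consistency check on the numbers above: if total training tokens scale as ratio * n_params (an assumption about the speedrun's scheme, not stated here), both step counts imply the same model size. The ~876M d26 parameter count below is back-computed from these figures, not taken from the source.

```python
# Sanity-check the step counts implied by the batch-size / ratio change.
# Assumed scheme: total_tokens = param_data_ratio * n_params, so
# steps = total_tokens / batch_size.

def total_tokens(steps: int, batch_size: int) -> int:
    """Tokens consumed by a run: optimization steps times tokens per batch."""
    return steps * batch_size

old_tokens = total_tokens(7_226, 1_000_000)  # 1M batch,   ratio 8.25
new_tokens = total_tokens(12_700, 500_000)   # 0.5M batch, ratio 7.25

# Implied parameter count under tokens = ratio * params:
old_params = old_tokens / 8.25
new_params = new_tokens / 7.25
print(old_params, new_params)  # both come out near 8.76e8 (~876M)
```

Both configurations imply the same ~876M-parameter model to within about 0.01%, so the step counts follow directly from the batch-size and ratio changes.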
Result: CORE 0.2626, time 8967s (2.49h), val_bpb 0.750008
Reproduced: CORE 0.2729/0.2626 across two runs, both pass.
AI disclosure: experimental design and hyperparameter search were
conducted using Claude Code.