nanochat/dev
Kaiyue Wen ee04406ebb Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements
Major changes:
- Add custom FP8 training module (replaces torchao dependency)
- Implement auto-calculated optimal batch sizes (1M for d26)
- Add hyperball data scaling
- Restore and tune momentum schedule (settled on 0.95)
- Add matrix warmup ratio and norm_lr parameters
- Improve weight decay scaling (Tepoch-based theory)
- Update d26 configuration and scaling laws
- Clarify MFU labeling as bf16_mfu (see the MFU sketch below the commit message)
- Update leaderboard and documentation

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
2026-02-12 16:15:15 -08:00
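The bf16_mfu relabeling noted in the commit above is worth a one-line illustration. Below is a minimal sketch, assuming the usual MFU convention: achieved model FLOPs/s divided by the cluster's dense bf16 peak, so the reported number stays on the same scale even when some matmuls actually run in FP8. The H100 peak value, the 6·N FLOPs-per-token rule of thumb, and the function itself are illustrative assumptions, not code taken from the repo.

```python
# Illustrative sketch (not from nanochat): "bf16 MFU" computed against the
# accelerator's dense bf16 peak, regardless of the precision the kernels use.

H100_BF16_PEAK_FLOPS = 989e12  # approx. dense bf16 FLOP/s for an H100 SXM

def bf16_mfu(tokens_per_sec: float, flops_per_token: float,
             num_gpus: int, peak_flops: float = H100_BF16_PEAK_FLOPS) -> float:
    """Model FLOPs utilization measured against the cluster's bf16 peak."""
    achieved = tokens_per_sec * flops_per_token   # model FLOPs/s actually delivered
    return achieved / (num_gpus * peak_flops)     # fraction of theoretical bf16 peak

# Example: ~1B-param model (~6e9 training FLOPs/token via the 6*N rule),
# 4e5 tokens/s on 8 GPUs -> roughly 0.30 bf16 MFU.
print(bf16_mfu(4e5, 6e9, 8))
```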
| File | Last commit message | Date |
|------|---------------------|------|
| estimate_gpt3_core.ipynb | add notebook on deriving the CORE estimates for the GPT-3 miniseries. | 2026-01-05 18:40:28 +00:00 |
| gen_synthetic_data.py | tune the synthetic data generation script. delete the king andrej stuff lol. also, upgrade to gemini 3 | 2026-02-02 01:45:59 +00:00 |
| generate_logo.html | initial commit | 2025-10-13 06:49:24 -07:00 |
| LEADERBOARD.md | Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements | 2026-02-12 16:15:15 -08:00 |
| LOG.md | Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements | 2026-02-12 16:15:15 -08:00 |
| nanochat.png | Update logo | 2025-10-14 14:19:44 -04:00 |
| repackage_data_reference.py | initial commit | 2025-10-13 06:49:24 -07:00 |
| scaling_analysis.ipynb | add engram-lite, add log, tune scaling laws analysis scripts | 2026-01-27 22:31:17 +00:00 |
| scaling_laws_jan26.png | nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtraining is not yet fully properly erased across the board, but good enough for step 1 | 2026-01-31 19:12:25 +00:00 |