nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-06-18 03:59:09 +00:00


Kaiyue Wen ee04406ebb Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements

Major changes:
- Add custom FP8 training module (replaces torchao dependency)
- Implement auto-calculated optimal batch sizes (1M for d26)
- Add hyperball data scaling
- Restore and tune momentum schedule (settled on 0.95)
- Add matrix warmup ratio and norm_lr parameters
- Improve weight decay scaling (Tepoch-based theory)
- Update d26 configuration and scaling laws
- Clarify MFU labeling as bf16_mfu
- Update leaderboard and documentation

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
2026-02-12 16:15:15 -08:00
base_eval.py	small touchups to the eval script, re-order items etc, cosmetic	2026-02-03 21:03:42 +00:00
base_train.py	Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements	2026-02-12 16:15:15 -08:00
chat_cli.py	remove leftover mid references (#491)	2026-02-02 08:33:46 -08:00
chat_eval.py	remove leftover mid references (#491)	2026-02-02 08:33:46 -08:00
chat_rl.py	remove leftover mid references (#491)	2026-02-02 08:33:46 -08:00
chat_sft.py	fix bug in chat_sft, the attention window must be preserved sigh	2026-02-01 20:58:44 +00:00
chat_web.py	remove leftover mid references (#491)	2026-02-02 08:33:46 -08:00
tok_eval.py	initial commit	2025-10-13 06:49:24 -07:00
tok_train.py	quick fix to not OOM main speedrun script	2026-01-26 22:31:42 +00:00