nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-06-16 02:59:10 +00:00

Kaiyue Wen ee04406ebb	2026-02-12 16:15:15 -08:00

Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements

Major changes:
- Add custom FP8 training module (replaces torchao dependency)
- Implement auto-calculated optimal batch sizes (1M for d26)
- Add hyperball data scaling
- Restore and tune momentum schedule (settled on 0.95)
- Add matrix warmup ratio and norm_lr parameters
- Improve weight decay scaling (Tepoch-based theory)
- Update d26 configuration and scaling laws
- Clarify MFU labeling as bf16_mfu
- Update leaderboard and documentation

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>

miniseries.sh	Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements	2026-02-12 16:15:15 -08:00
quickrun_muonh.sh	Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements	2026-02-12 16:15:15 -08:00
runcpu.sh	merge two files base_loss and base_eval into a single file, it's nicer this way, and unify the huggingface code associated with both	2026-02-01 02:36:43 +00:00
scaling_laws_muonh.sh	Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements	2026-02-12 16:15:15 -08:00
scaling_laws.sh	add engram-lite, add log, tune scaling laws analysis scripts	2026-01-27 22:31:17 +00:00
speedrun.sh	Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements	2026-02-12 16:15:15 -08:00