Author: Kaiyue Wen
Commit: ee04406ebb
Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements
Major changes:
- Add custom FP8 training module (replaces torchao dependency; see FP8 sketch below)
- Implement auto-calculated optimal batch sizes (1M for d26)
- Add hyperball data scaling
- Restore and tune momentum schedule (settled on 0.95)
- Add matrix warmup ratio and norm_lr parameters (see optimizer sketch below)
- Improve weight decay scaling (Tepoch-based theory)
- Update d26 configuration and scaling laws
- Clarify MFU labeling as bf16_mfu
- Update leaderboard and documentation
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Date: 2026-02-12 16:15:15 -08:00
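The custom FP8 training module replaces the torchao dependency. As a rough, hypothetical sketch of the underlying idea (not the repository's actual implementation), per-tensor FP8 (e4m3) quantization in plain PyTorch could look like the following; `quantize_fp8`, `dequantize_fp8`, and the scaling scheme are illustrative assumptions.

```python
import torch

# Minimal sketch of per-tensor FP8 (e4m3) quantize/dequantize in plain PyTorch,
# illustrating the kind of casting a custom FP8 module can do without torchao.
# All names and the scaling scheme here are hypothetical.

FP8_E4M3_MAX = 448.0  # largest representable magnitude in float8_e4m3fn


def quantize_fp8(x: torch.Tensor):
    """Scale a bf16/fp32 tensor into the e4m3 range and cast it to FP8."""
    amax = x.abs().max().clamp(min=1e-12)
    scale = FP8_E4M3_MAX / amax
    x_fp8 = (x * scale).to(torch.float8_e4m3fn)
    return x_fp8, scale


def dequantize_fp8(x_fp8: torch.Tensor, scale: torch.Tensor, dtype=torch.bfloat16):
    """Cast back to a higher-precision dtype and undo the scaling."""
    return x_fp8.to(dtype) / scale


if __name__ == "__main__":
    w = torch.randn(1024, 1024, dtype=torch.bfloat16)
    w_fp8, s = quantize_fp8(w)
    w_rt = dequantize_fp8(w_fp8, s)
    print("max abs round-trip error:", (w - w_rt).abs().max().item())
```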
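The optimizer-tuning items (momentum settled at 0.95, a matrix warmup ratio, a separate norm_lr) point at per-parameter-group hyperparameters. The sketch below shows one plausible way such knobs could be wired into PyTorch param groups; only the 0.95 momentum and the names norm_lr / matrix warmup ratio come from the commit message. The branch's actual Muon-style optimizer is not reproduced here, and all other names and values are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical wiring of the tuned hyperparameters into PyTorch param groups:
# matrices get the base lr with a warmup, 1-D params (norms, biases) get norm_lr.
# Values other than momentum=0.95 are placeholders, not the repo's settings.

def build_param_groups(model: nn.Module, base_lr: float = 3e-4, norm_lr: float = 1e-3):
    matrix_params, norm_params = [], []
    for _, p in model.named_parameters():
        if p.ndim >= 2:      # weight matrices (linear / embedding)
            matrix_params.append(p)
        else:                # norms, biases, other 1-D parameters
            norm_params.append(p)
    return [
        {"params": matrix_params, "lr": base_lr},
        {"params": norm_params, "lr": norm_lr},
    ]


def lr_scale(step: int, total_steps: int, matrix_warmup_ratio: float = 0.05) -> float:
    """Linear warmup over a fraction of training, then constant (a sketch)."""
    warmup_steps = max(1, int(matrix_warmup_ratio * total_steps))
    return min(1.0, step / warmup_steps)


model = nn.Sequential(nn.Linear(64, 64), nn.LayerNorm(64))
opt = torch.optim.SGD(build_param_groups(model), lr=3e-4, momentum=0.95)
# per step, e.g.: opt.param_groups[0]["lr"] = 3e-4 * lr_scale(step, total_steps)
```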