nanochat/runs
Kaiyue Wen ee04406ebb Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements
Major changes:
- Add custom FP8 training module (replaces torchao dependency)
- Implement auto-calculated optimal batch sizes (1M for d26)
- Add hyperball data scaling
- Restore and tune momentum schedule (settled on 0.95)
- Add matrix warmup ratio and norm_lr parameters
- Improve weight decay scaling (Tepoch-based theory)
- Update d26 configuration and scaling laws
- Clarify MFU labeling as bf16_mfu
- Update leaderboard and documentation

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
2026-02-12 16:15:15 -08:00
miniseries.sh          Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements  2026-02-12 16:15:15 -08:00
quickrun_muonh.sh      Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements  2026-02-12 16:15:15 -08:00
runcpu.sh              merge two files base_loss and base_eval into a single file, it's nicer this way, and unify the huggingface code associated with both  2026-02-01 02:36:43 +00:00
scaling_laws_muonh.sh  Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements  2026-02-12 16:15:15 -08:00
scaling_laws.sh        add engram-lite, add log, tune scaling laws analysis scripts  2026-01-27 22:31:17 +00:00
speedrun.sh            Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements  2026-02-12 16:15:15 -08:00