nanochat/.gitignore at 330fa1188c4dfd41345307cb90b71c06b4b37dcd

tacit/nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-06-16 11:09:09 +00:00

Kaiyue Wen ee04406ebb Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements

Major changes:
- Add custom FP8 training module (replaces torchao dependency)
- Implement auto-calculated optimal batch sizes (1M for d26)
- Add hyperball data scaling
- Restore and tune momentum schedule (settled on 0.95)
- Add matrix warmup ratio and norm_lr parameters
- Improve weight decay scaling (Tepoch-based theory)
- Update d26 configuration and scaling laws
- Clarify MFU labeling as bf16_mfu
- Update leaderboard and documentation

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>

2026-02-12 16:15:15 -08:00

15 lines

115 B

Plaintext

.venv/
__pycache__/
*.pyc
dev-ignore/
report.md
eval_bundle/

# Secrets
.env

# Local setup
cache
CLAUDE.md
wandb/