Mirror of https://github.com/karpathy/nanochat.git (synced 2026-03-14 17:03:13 +00:00)
Major changes:
- Add custom FP8 training module (replaces torchao dependency)
- Implement auto-calculated optimal batch sizes (1M for d26)
- Add hyperball data scaling
- Restore and tune momentum schedule (settled on 0.95)
- Add matrix warmup ratio and norm_lr parameters
- Improve weight decay scaling (Tepoch-based theory)
- Update d26 configuration and scaling laws
- Clarify MFU labeling as bf16_mfu
- Update leaderboard and documentation

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
15 lines
115 B
Plaintext
.venv/
__pycache__/
*.pyc
dev-ignore/
report.md
eval_bundle/

# Secrets
.env

# Local setup
cache
CLAUDE.md
wandb/