Mirror of https://github.com/karpathy/nanochat.git
Synced 2026-04-01 21:25:21 +00:00
- Fix output logit hook in coord check to apply muP scaling (base/width)
- Replace config mutation side effect with assertion in setup_optimizer
- Set mup_base_width at GPTConfig construction in base_train.py
- Remove dead code (_transfer_check_output_mult)
- Tune base LRs to center optimal multiplier near 1.0 (0.12, 6.0, 0.12)
- Use log scale on all loss plots for better low-loss detail
- Add automated muP tests (coord check + transfer check)
- Update muP_changes.md verification commands

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

| File |
|---|
| test_attention_fallback.py |
| test_engine.py |
| test_mup.py |