nanochat/scripts
Amrit Bulusu 641e8a6dd3 muP implementation: coord check, transfer check, and code quality fixes
- Fix output logit hook in coord check to apply muP scaling (base/width)
- Replace config mutation side effect with assertion in setup_optimizer
- Set mup_base_width at GPTConfig construction in base_train.py
- Remove dead code (_transfer_check_output_mult)
- Tune base LRs to center optimal multiplier near 1.0 (0.12, 6.0, 0.12)
- Use log scale on all loss plots for better low-loss detail
- Add automated muP tests (coord check + transfer check)
- Update muP_changes.md verification commands

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 16:28:50 -04:00
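The first bullet of the commit message says the coord check's output-logit hook now applies the muP scaling (base/width). Below is a minimal sketch of what that factor does. It assumes logits are multiplied by base_width / width. The helper names (mup_output_multiplier, scaled_logit, coord_check) are hypothetical and are not taken from mup_coord_check.py.

The sketch uses a fully aligned hidden/weight pair as a stand-in for the correlated state reached after training. In that state the raw dot product grows linearly with width, and the base/width factor holds the logit constant.

```python
# Hypothetical helpers illustrating muP output scaling; not nanochat's actual API.

def mup_output_multiplier(base_width: int, width: int) -> float:
    """Scale factor applied to output logits: base_width / width (assumed form)."""
    if base_width <= 0 or width <= 0:
        raise ValueError("widths must be positive")
    return base_width / width


def scaled_logit(hidden: list[float], weights: list[float], base_width: int) -> float:
    """Dot product of hidden state and output weights, times the muP multiplier."""
    if len(hidden) != len(weights):
        raise ValueError("hidden and weights must have the same width")
    width = len(hidden)
    raw = sum(h * w for h, w in zip(hidden, weights))
    return raw * mup_output_multiplier(base_width, width)


def coord_check(widths: list[int], base_width: int) -> dict[int, tuple[float, float]]:
    """Toy coordinate check: compare raw vs. scaled logits across widths.

    Assumes every hidden coordinate is 1.0 and every weight is 0.5, a fully
    aligned pair whose raw dot product grows linearly with width.
    """
    results = {}
    for width in widths:
        hidden = [1.0] * width
        weights = [0.5] * width
        raw = sum(h * w for h, w in zip(hidden, weights))
        results[width] = (raw, scaled_logit(hidden, weights, base_width))
    return results


if __name__ == "__main__":
    for width, (raw, scaled) in coord_check([256, 512, 1024], base_width=256).items():
        print(f"width={width:5d}  raw={raw:8.1f}  scaled={scaled:6.1f}")
```

With base_width=256, the raw logits are 128.0, 256.0 and 512.0, while the scaled logit stays at 128.0 for every width. A real coord check would instead record activation magnitudes from an actual model over a few training steps at each width.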
| File | Last commit | Date |
| --- | --- | --- |
| base_eval.py | delete autocast, an unnecessary thorn in my side, manage dtypes directly | 2026-03-04 23:55:30 +00:00 |
| base_train.py | muP implementation: coord check, transfer check, and code quality fixes | 2026-03-14 16:28:50 -04:00 |
| chat_cli.py | delete autocast, an unnecessary thorn in my side, manage dtypes directly | 2026-03-04 23:55:30 +00:00 |
| chat_eval.py | delete autocast, an unnecessary thorn in my side, manage dtypes directly | 2026-03-04 23:55:30 +00:00 |
| chat_rl.py | delete autocast, an unnecessary thorn in my side, manage dtypes directly | 2026-03-04 23:55:30 +00:00 |
| chat_sft.py | delete autocast, an unnecessary thorn in my side, manage dtypes directly | 2026-03-04 23:55:30 +00:00 |
| chat_web.py | delete autocast, an unnecessary thorn in my side, manage dtypes directly | 2026-03-04 23:55:30 +00:00 |
| mup_coord_check.py | muP implementation: coord check, transfer check, and code quality fixes | 2026-03-14 16:28:50 -04:00 |
| mup_transfer_check.py | muP implementation: coord check, transfer check, and code quality fixes | 2026-03-14 16:28:50 -04:00 |
| tok_eval.py | initial commit | 2025-10-13 06:49:24 -07:00 |
| tok_train.py | quick fix to not OOM main speedrun script | 2026-01-26 22:31:42 +00:00 |