Mirror of https://github.com/karpathy/nanochat.git (synced 2026-03-21 12:23:13 +00:00)
- Introduced the `MOE_DEBUG_INTERVAL` parameter in `runmps.sh` to control how often debug logging runs during training.
- Enhanced `base_train.py` to log gradients of routed and shared weights at the specified interval, aiding in monitoring model performance.
- Updated `gpt.py` to adjust router bias calculations, improving load balancing among experts.
- Added unit tests in `test_moe.py` to validate the behavior of the MoE implementation and ensure correctness of gradient calculations.
| File |
|---|
| base_eval.py |
| base_loss.py |
| base_train.py |
| chat_cli.py |
| chat_eval.py |
| chat_rl.py |
| chat_sft.py |
| chat_web.py |
| mid_train.py |
| tok_eval.py |
| tok_train.py |