Mirror of https://github.com/karpathy/nanochat.git
Synced 2026-03-21 04:13:21 +00:00
- Introduced parameters for Mixture of Experts (MoE) in `runmps.sh`, `base_train.py`, and `gpt.py`, allowing dynamic configuration of experts during training.
- Enhanced `gpt.py` with new classes `MoEFeedForward` and `ExpertFFN` to implement MoE functionality in the model architecture.
- Updated `configurator.py` to handle type conversions for the new MoE parameters.
- Improved logging in `base_train.py` to include MoE-related metrics and configurations during training.
- Added assertions and derived defaults for MoE parameters to ensure valid configurations.
- Implemented methods to estimate and log FLOPs for both the dense and the MoE-active configurations during training.
- Enhanced gradient handling in `muon.py` to accommodate the potential absence of gradients for unused experts.
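For context, a mixture-of-experts feed-forward layer of the kind the `MoEFeedForward` class above implements can be sketched roughly as follows. This is a minimal NumPy illustration, not nanochat's actual implementation; the function name, the top-k routing, and the gate renormalization are assumptions for the sake of the example:

```python
import numpy as np

def moe_feedforward(x, w_router, experts, top_k=2):
    """Hypothetical MoE FFN sketch: route each token to its top_k
    experts, then mix the expert outputs with renormalized gates.

    x:        (n_tokens, d_model) token activations
    w_router: (d_model, n_experts) router projection
    experts:  list of (w_in, w_out) pairs, each a 2-layer ReLU FFN
    """
    logits = x @ w_router                                  # (n_tokens, n_experts)
    # softmax over experts, per token (numerically stabilized)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(probs[t])[-top_k:]                # indices of top_k experts
        gate = probs[t, top] / probs[t, top].sum()         # renormalize gates to sum to 1
        for g, e in zip(gate, top):
            w_in, w_out = experts[e]
            h = np.maximum(x[t] @ w_in, 0.0)               # expert hidden layer (ReLU)
            out[t] += g * (h @ w_out)                      # gate-weighted expert output
    return out
```

Note that only `top_k` of the `n_experts` expert FFNs run per token, which is also why the "enhanced gradient handling in `muon.py`" above matters: experts that receive no tokens in a step contribute no gradient.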
Files:

- base_eval.py
- base_loss.py
- base_train.py
- chat_cli.py
- chat_eval.py
- chat_rl.py
- chat_sft.py
- chat_web.py
- mid_train.py
- tok_eval.py
- tok_train.py