nanochat/scripts
William Thurston 25d2573f47 Add MoE configuration and implementation in training scripts and model architecture
- Introduced parameters for Mixture of Experts (MoE) in `runmps.sh`, `base_train.py`, and `gpt.py`, allowing for dynamic configuration of experts during training.
- Enhanced `gpt.py` with new classes `MoEFeedForward` and `ExpertFFN` to implement MoE functionality in the model architecture.
- Updated `configurator.py` to handle type conversions for new MoE parameters.
- Improved logging in `base_train.py` to include MoE-related metrics and configurations during training.
- Added assertions and derived defaults for MoE parameters to ensure valid configurations.
- Implemented methods to estimate and log FLOPs for both dense and MoE active configurations during training.
- Enhanced gradient handling in `muon.py` to accommodate potential absence of gradients for unused experts.
2025-11-11 19:58:38 -08:00
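
The commit message mentions estimating FLOPs for both dense and MoE-active configurations. A minimal sketch of that idea follows. It is not the code in `base_train.py`: the function names, the bias-free layer shapes, and the rule of thumb of about 6 FLOPs per matmul parameter per token are all assumptions made for illustration. Embedding and attention-score FLOPs are ignored.

```python
# Hypothetical sketch: rough per-token training FLOPs, dense vs. MoE.
# Assumptions (not taken from the repo):
#   - attention has 4 * d * d weights per layer (q, k, v, out), no biases
#   - each FFN/expert has two matrices: d x hidden and hidden x d
#   - the router is a single d x num_experts linear layer
#   - training costs ~6 FLOPs per matmul parameter per token (forward + backward)

FLOPS_PER_PARAM = 6


def matmul_params(n_layer, n_embd, ffn_hidden, experts_counted, num_experts):
    """Count matmul weights, including `experts_counted` expert FFNs per layer."""
    attn = 4 * n_embd * n_embd
    ffn = 2 * n_embd * ffn_hidden
    # A router is only needed when there is more than one expert to choose from.
    router = n_embd * num_experts if num_experts > 1 else 0
    return n_layer * (attn + experts_counted * ffn + router)


def flops_per_token(n_layer, n_embd, ffn_hidden, num_experts=1, top_k=1):
    """Return dense, MoE-active, and MoE-total FLOPs-per-token estimates."""
    assert 1 <= top_k <= num_experts, "top_k must be in [1, num_experts]"
    dense = matmul_params(n_layer, n_embd, ffn_hidden, 1, 1)
    active = matmul_params(n_layer, n_embd, ffn_hidden, top_k, num_experts)
    total = matmul_params(n_layer, n_embd, ffn_hidden, num_experts, num_experts)
    return {
        "dense": FLOPS_PER_PARAM * dense,
        "moe_active": FLOPS_PER_PARAM * active,  # only top_k experts run per token
        "moe_total": FLOPS_PER_PARAM * total,    # as if every expert ran
    }


if __name__ == "__main__":
    est = flops_per_token(n_layer=2, n_embd=8, ffn_hidden=32, num_experts=4, top_k=2)
    for name, value in est.items():
        print(f"{name}: {value}")
```

The "active" figure is the one that tracks per-step compute, since each token only passes through `top_k` experts. The "total" figure tracks parameter count instead.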
base_eval.py Enhance model tagging support in training and evaluation scripts 2025-11-10 19:45:02 -08:00
base_loss.py Add scripts for running evaluations and training with W&B integration 2025-11-05 11:49:50 -08:00
base_train.py Add MoE configuration and implementation in training scripts and model architecture 2025-11-11 19:58:38 -08:00
chat_cli.py upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming 2025-10-20 10:15:17 -07:00
chat_eval.py Add scripts for running evaluations and training with W&B integration 2025-11-05 11:49:50 -08:00
chat_rl.py initial commit 2025-10-13 06:49:24 -07:00
chat_sft.py Enhance model tagging support in training and evaluation scripts 2025-11-10 19:45:02 -08:00
chat_web.py upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming 2025-10-20 10:15:17 -07:00
mid_train.py Enhance model tagging support in training and evaluation scripts 2025-11-10 19:45:02 -08:00
tok_eval.py initial commit 2025-10-13 06:49:24 -07:00
tok_train.py initial commit 2025-10-13 06:49:24 -07:00