nanochat/scripts
William Thurston 76227f70d3 Add MOE debug interval and logging for gradient statistics
- Introduced a `MOE_DEBUG_INTERVAL` parameter in `runmps.sh` to control how often debug logging runs during training.
- Enhanced `base_train.py` to log gradient statistics for the routed and shared expert weights at the specified interval, to help monitor training.
- Updated `gpt.py` to adjust the router bias calculation, improving load balancing among experts.
- Added unit tests in `test_moe.py` to validate the MoE implementation and the correctness of its gradient calculations.
2025-11-13 16:22:20 -08:00
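The gradient-statistics logging described in this commit can be sketched as follows. This is a minimal illustration, not nanochat's actual implementation: the default interval value, the `moe_grad_stats` helper, and the parameter-name filters (`"router"`, `"expert"`, `"shared"`) are all assumptions for the example.

```python
import torch
import torch.nn as nn

# Hypothetical default; the real value is set via MOE_DEBUG_INTERVAL in runmps.sh.
MOE_DEBUG_INTERVAL = 100

def moe_grad_stats(model: nn.Module, step: int, interval: int = MOE_DEBUG_INTERVAL):
    """Collect gradient norms for MoE-related parameters every `interval` steps.

    The substring filters below are illustrative assumptions, not the
    module names nanochat actually uses.
    """
    if interval <= 0 or step % interval != 0:
        return {}
    stats = {}
    for name, p in model.named_parameters():
        if p.grad is not None and any(k in name for k in ("router", "expert", "shared")):
            stats[name] = p.grad.norm().item()
    return stats

# Usage: a toy "expert" layer standing in for a real MoE block.
expert = nn.Sequential()
expert.add_module("expert_fc", nn.Linear(4, 4))
expert(torch.randn(2, 4)).sum().backward()
print(moe_grad_stats(expert, step=100))  # stats dict at a logging step
print(moe_grad_stats(expert, step=101))  # empty dict off-interval
```

In a training loop this would be called once per step after `backward()` and before `optimizer.step()`, so the logged norms reflect the raw (unclipped) gradients.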
base_eval.py Enhance model tagging support in training and evaluation scripts 2025-11-10 19:45:02 -08:00
base_loss.py Add scripts for running evaluations and training with W&B integration 2025-11-05 11:49:50 -08:00
base_train.py Add MOE debug interval and logging for gradient statistics 2025-11-13 16:22:20 -08:00
chat_cli.py Upgrade all other files to support cpu/mps as well as cuda; various minor changes, e.g. renaming max_iterations to num_iterations in the sft script for naming consistency 2025-10-20 10:15:17 -07:00
chat_eval.py Add scripts for running evaluations and training with W&B integration 2025-11-05 11:49:50 -08:00
chat_rl.py initial commit 2025-10-13 06:49:24 -07:00
chat_sft.py Enhance model tagging support in training and evaluation scripts 2025-11-10 19:45:02 -08:00
chat_web.py Upgrade all other files to support cpu/mps as well as cuda; various minor changes, e.g. renaming max_iterations to num_iterations in the sft script for naming consistency 2025-10-20 10:15:17 -07:00
mid_train.py Enhance model tagging support in training and evaluation scripts 2025-11-10 19:45:02 -08:00
tok_eval.py initial commit 2025-10-13 06:49:24 -07:00
tok_train.py initial commit 2025-10-13 06:49:24 -07:00