Mirror of https://github.com/karpathy/nanochat.git (synced 2026-03-20 20:03:19 +00:00)
- Introduced parameters for Mixture of Experts (MoE) in `runmps.sh`, `base_train.py`, and `gpt.py`, allowing experts to be configured dynamically during training.
- Enhanced `gpt.py` with new classes `MoEFeedForward` and `ExpertFFN` to implement MoE in the model architecture (a rough sketch of such classes follows below).
- Updated `configurator.py` to handle type conversions for the new MoE parameters.
- Improved logging in `base_train.py` to include MoE-related metrics and configurations during training.
- Added assertions and derived defaults for the MoE parameters to ensure valid configurations.
- Implemented methods to estimate and log FLOPs for both the dense and the MoE-active configurations during training (see the FLOPs sketch below).
- Enhanced gradient handling in `muon.py` to accommodate the potential absence of gradients for unused experts (see the optimizer sketch below).
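The class bodies of `MoEFeedForward` and `ExpertFFN` are not shown here. As a rough illustration only, a minimal sketch of what such a pair of classes could look like, assuming a learned linear router with top-k token-choice routing; the expert count, `top_k`, and FFN width multiplier are hypothetical parameters, not values taken from the commit:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertFFN(nn.Module):
    """A single expert: a plain two-layer MLP block (hypothetical layout)."""

    def __init__(self, n_embd: int, ffn_mult: int = 4):
        super().__init__()
        self.fc = nn.Linear(n_embd, ffn_mult * n_embd, bias=False)
        self.proj = nn.Linear(ffn_mult * n_embd, n_embd, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(F.gelu(self.fc(x)))


class MoEFeedForward(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs."""

    def __init__(self, n_embd: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        assert top_k <= n_experts, "top_k cannot exceed the number of experts"
        self.router = nn.Linear(n_embd, n_experts, bias=False)
        self.experts = nn.ModuleList([ExpertFFN(n_embd) for _ in range(n_experts)])
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        flat = x.reshape(-1, C)                           # (B*T, C)
        logits = self.router(flat)                        # (B*T, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)    # per-token expert choice
        weights = F.softmax(weights, dim=-1)              # mixing weights over chosen experts
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            token_ids, slot_ids = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue                                  # expert routed no tokens this step
            out[token_ids] += weights[token_ids, slot_ids].unsqueeze(-1) * expert(flat[token_ids])
        return out.view(B, T, C)
```

Note that an expert which appears in no token's top-k selection contributes nothing to the loss for that step, so its parameters can end up with `grad is None` after backward; that is the situation the `muon.py` change appears to address.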
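The commit also mentions estimating and logging FLOPs for both the dense and the MoE-active configurations. The repository's actual accounting is not shown here; a minimal stand-in, assuming only the feed-forward matmuls are counted at 2 FLOPs per multiply-accumulate, might look like:

```python
def ffn_flops_per_token(n_embd: int, n_layer: int, ffn_mult: int = 4,
                        n_experts: int = 1, top_k: int = 1) -> dict[str, int]:
    """Back-of-the-envelope FLOPs for the feed-forward path only (hypothetical helper).

    The "dense" figure counts every expert as if all of them ran on every token,
    while the "active" figure counts only the top_k experts a token is routed to.
    Attention FLOPs are deliberately left out of this sketch.
    """
    per_expert = 2 * (2 * n_embd * ffn_mult * n_embd)  # fc + proj matmuls
    return {
        "dense_flops": n_layer * n_experts * per_expert,
        "active_flops": n_layer * top_k * per_expert,
    }


# Example: an 8-expert, top-2 model spends ~4x fewer FFN FLOPs per token
# than its dense-equivalent that runs all experts.
print(ffn_flops_per_token(n_embd=768, n_layer=12, n_experts=8, top_k=2))
```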
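The `muon.py` change presumably guards against parameters whose gradients are absent because their expert received no tokens in a step. A minimal, generic sketch of that guard (a plain SGD-style stand-in, not the Muon update itself):

```python
import torch


def step_with_optional_grads(params: list[torch.nn.Parameter], lr: float = 0.01) -> None:
    """Parameter update that tolerates missing gradients (hypothetical illustration).

    Experts that routed zero tokens in a step leave p.grad as None, so the
    update skips them instead of failing on a missing tensor.
    """
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue  # unused expert this step: nothing to update
            p.add_(p.grad, alpha=-lr)
```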
Files:

- gen_synthetic_data.py
- generate_logo.html
- nanochat.png
- repackage_data_reference.py
- runcpu.sh
- runmps_evals.sh
- runmps.sh