nanochat/scripts
William Thurston 8a6d34daf7 Add kv_head_mult parameter for training and evaluation scripts
- Introduced `kv_head_mult` to control the number of query heads sharing a key/value head in `base_train.py`, `mid_train.py`, and `runmps.sh`.
- Updated logging to include a global tokens-per-second metric during training.
- Added assertions to ensure `kv_head_mult` is valid and properly integrated into model calculations.
2025-11-09 14:23:45 -08:00
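As a rough illustration of the commit notes above, the sketch below shows one way a `kv_head_mult` setting could map to a key/value head count, plus a global tokens-per-second figure. The function names `kv_heads_for` and `global_tokens_per_second`, and their parameters, are hypothetical and are not taken from `base_train.py` or `mid_train.py`; the scripts may compute these values differently.

```python
def kv_heads_for(num_heads: int, kv_head_mult: int) -> int:
    """Number of key/value heads when `kv_head_mult` query heads share one KV head.

    Hypothetical helper: kv_head_mult == 1 gives one KV head per query head,
    and kv_head_mult == num_heads gives a single shared KV head.
    """
    # Validity checks in the spirit of the assertions the commit mentions.
    assert kv_head_mult >= 1, "kv_head_mult must be a positive integer"
    assert num_heads % kv_head_mult == 0, "num_heads must be divisible by kv_head_mult"
    return num_heads // kv_head_mult


def global_tokens_per_second(tokens_per_step: int, step_seconds: float) -> float:
    """Tokens processed per second across all ranks for one optimizer step.

    Assumes `tokens_per_step` already counts tokens summed over every device.
    """
    assert step_seconds > 0, "step duration must be positive"
    return tokens_per_step / step_seconds


if __name__ == "__main__":
    print(kv_heads_for(num_heads=12, kv_head_mult=3))
    print(global_tokens_per_second(tokens_per_step=524288, step_seconds=2.0))
```

Under these assumptions, 12 query heads with `kv_head_mult=3` would use 4 key/value heads, and a value that does not divide the head count evenly fails the assertion before any model is built.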
base_eval.py Add scripts for running evaluations and training with W&B integration 2025-11-05 11:49:50 -08:00
base_loss.py Add scripts for running evaluations and training with W&B integration 2025-11-05 11:49:50 -08:00
base_train.py Add kv_head_mult parameter for training and evaluation scripts 2025-11-09 14:23:45 -08:00
chat_cli.py upgrading all other files to be able to use cpu/mps as well as cuda. various other minor changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming 2025-10-20 10:15:17 -07:00
chat_eval.py Add scripts for running evaluations and training with W&B integration 2025-11-05 11:49:50 -08:00
chat_rl.py initial commit 2025-10-13 06:49:24 -07:00
chat_sft.py Add scripts for running evaluations and training with W&B integration 2025-11-05 11:49:50 -08:00
chat_web.py upgrading all other files to be able to use cpu/mps as well as cuda. various other minor changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming 2025-10-20 10:15:17 -07:00
mid_train.py Add kv_head_mult parameter for training and evaluation scripts 2025-11-09 14:23:45 -08:00
tok_eval.py initial commit 2025-10-13 06:49:24 -07:00
tok_train.py initial commit 2025-10-13 06:49:24 -07:00