10 Commits

Author SHA1 Message Date
William Thurston
9550053cc1 Enhance model tagging support in training and evaluation scripts
- Added model tagging functionality to `runmps.sh`, allowing for dynamic model tagging based on the W&B run name.
- Updated `base_train.py`, `mid_train.py`, and `chat_sft.py` to utilize model tags for checkpoint management.
- Enhanced `base_eval.py` to accept model tags for loading models during evaluation.
- Improved handling of model tags to ensure proper checkpoint directory naming and logging.
2025-11-10 19:45:02 -08:00
William Thurston
8a6d34daf7 Add kv_head_mult parameter for training and evaluation scripts
- Introduced `kv_head_mult` to control the number of query heads sharing a key/value head in `base_train.py`, `mid_train.py`, and `runmps.sh`.
- Updated logging to include global tokens-per-second metrics during training.
- Added assertions to ensure `kv_head_mult` is valid and properly integrated into model calculations.
2025-11-09 14:23:45 -08:00
William Thurston
b1d49aade5 Add scripts for running evaluations and training with W&B integration
- Added `dev/runmps_evals.sh` for evaluating checkpoints and logging results to W&B.
- Introduced `dev/runmps.sh` for orchestrating training stages with W&B support.
- Updated `.gitignore` to include `wandb/` and `.runmps_wandb_ids`.
- Changed permissions on `dev/runcpu.sh` to add the executable flag.
- Enhanced existing scripts to log metrics to W&B during training and evaluation processes.
2025-11-05 11:49:50 -08:00
Luke Stanley
901b075605 Fix GPU-less CPU use on Linux with specific Torch indexes
2025-10-21 23:14:16 +00:00
Andrej Karpathy
94ee507054 quick fix base eval due to fewshot requirement
2025-10-21 17:56:08 +00:00
Andrej Karpathy
5bdc99abfb merge and resolve conflict
2025-10-21 17:19:10 +00:00
Andrej Karpathy
fe5aed940b add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully it's ok
2025-10-21 15:04:58 +00:00
karpathy
2e9669e03a upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming
2025-10-20 10:15:17 -07:00
karpathy
a53833d04f add nanochat logo png
2025-10-13 06:59:59 -07:00
karpathy
3a5e0bc50b initial commit
2025-10-13 06:49:24 -07:00
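Commit 8a6d34daf7 describes `kv_head_mult` as the number of query heads that share one key/value head, which is the grouped-query attention idea. The sketch below shows that head arithmetic only. It assumes the key/value head count is `n_head // kv_head_mult` and that consecutive query heads are grouped together (the layout produced by repeat-interleaving KV heads). The helper names `num_kv_heads` and `kv_head_for_query` are hypothetical and are not taken from `base_train.py` or `mid_train.py`.

```python
def num_kv_heads(n_head: int, kv_head_mult: int) -> int:
    """Hypothetical helper: number of key/value heads when every
    kv_head_mult query heads share a single key/value head."""
    # Assumed validity checks, similar in spirit to the assertions the commit mentions.
    assert kv_head_mult >= 1, "kv_head_mult must be a positive integer"
    assert n_head % kv_head_mult == 0, "n_head must be divisible by kv_head_mult"
    return n_head // kv_head_mult


def kv_head_for_query(q_head: int, kv_head_mult: int) -> int:
    """Hypothetical helper: index of the key/value head read by query head
    q_head, assuming consecutive query heads form one group."""
    return q_head // kv_head_mult


if __name__ == "__main__":
    n_head, kv_head_mult = 12, 4
    print(num_kv_heads(n_head, kv_head_mult))  # 3
    print([kv_head_for_query(q, kv_head_mult) for q in range(n_head)])
    # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

Under these assumptions, `kv_head_mult = 1` gives ordinary multi-head attention, and `kv_head_mult = n_head` collapses to a single shared key/value head (multi-query attention). Intermediate values shrink the KV cache by a factor of `kv_head_mult`.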