Mirror of https://github.com/karpathy/nanochat.git, synced 2026-03-21 12:23:13 +00:00
- Introduced `kv_head_mult` to control the number of query heads sharing a key/value head in `base_train.py`, `mid_train.py`, and `runmps.sh`.
- Updated logging to include global tokens-per-second metrics during training.
- Added assertions to ensure `kv_head_mult` is valid and properly integrated into model calculations.

| | | |
|---|---|---|
| gen_synthetic_data.py | ||
| generate_logo.html | ||
| nanochat.png | ||
| repackage_data_reference.py | ||
| runcpu.sh | ||
| runmps_evals.sh | ||
| runmps.sh | ||
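
The commit message above describes `kv_head_mult` as the number of query heads that share one key/value head. A minimal sketch of that relationship follows, assuming the usual grouped-query-attention convention with contiguous grouping. The helper names `kv_heads_for` and `query_to_kv_head` and the parameter `num_heads` are hypothetical and are not taken from the repository; the checks only illustrate the kind of validity assertion the commit mentions.

```python
def kv_heads_for(num_heads: int, kv_head_mult: int) -> int:
    """Return the number of key/value heads when `kv_head_mult` query heads share each one.

    Hypothetical helper for illustration; not the repository's code.
    """
    # Assumed validity checks, in the spirit of the assertions the commit describes.
    assert kv_head_mult >= 1, "kv_head_mult must be a positive integer"
    assert num_heads % kv_head_mult == 0, "num_heads must be divisible by kv_head_mult"
    return num_heads // kv_head_mult


def query_to_kv_head(query_head: int, kv_head_mult: int) -> int:
    """Map a query-head index to the key/value head it shares (assumes contiguous groups)."""
    return query_head // kv_head_mult


if __name__ == "__main__":
    # 8 query heads with 4 queries per key/value head gives 2 key/value heads.
    print(kv_heads_for(8, 4))
    # Query heads 0-3 share key/value head 0; heads 4-7 share key/value head 1.
    print([query_to_kv_head(i, 4) for i in range(8)])
```

Under this convention, `kv_head_mult = 1` corresponds to standard multi-head attention (one key/value head per query head), and `kv_head_mult = num_heads` corresponds to multi-query attention (a single shared key/value head).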