Kaiyue Wen
|
ee04406ebb
|
Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements
Major changes:
- Add custom FP8 training module (replaces torchao dependency)
- Implement auto-calculated optimal batch sizes (1M for d26)
- Add hyperball data scaling
- Restore and tune momentum schedule (settled on 0.95)
- Add matrix warmup ratio and norm_lr parameters
- Improve weight decay scaling (Tepoch-based theory)
- Update d26 configuration and scaling laws
- Clarify MFU labeling as bf16_mfu
- Update leaderboard and documentation
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
|
2026-02-12 16:15:15 -08:00 |
|
Andrej Karpathy
|
64a651a63c
|
include .claude is ok
|
2026-01-29 00:35:02 +00:00 |
|
Andrej Karpathy
|
f5a0ea4d3f
|
take out these gitignore dirs
|
2026-01-08 18:18:42 +00:00 |
|
Andrej Karpathy
|
ccf4b7f9bf
|
nudge hyperparameters of the base script with the results of the sweeps and miniseries. vocab size down to 32K. D:N ratio from 20 to 8. add miniseries script
|
2026-01-07 22:11:59 +00:00 |
|
Andrej Karpathy
|
ed2082fbc4
|
sane secrets management
|
2026-01-04 19:29:22 +00:00 |
|
Andrej Karpathy
|
aa42f40e66
|
delete the inline rustbpe project. it was ugly to have a project within project and rustbpe is now nicely a separate repo on my github karpathy/rustbpe and it's on pypi etc., so we just add it as a dependency to uv. i think it is appropriate that this is a separate repo because 1) it doesn't have too many knobs, other than the ones that are exposed - the regex pattern and vocab size and 2) all of its complexity is not algorithmic (it's equivalent to minbpe), instead it is efficiency-related, so it is ok to hide relatively speaking
|
2026-01-03 23:55:28 +00:00 |
|
Luke Stanley
|
760af62e11
|
Git ignore eval_bundle
|
2025-10-21 23:14:34 +00:00 |
|
Andrej Karpathy
|
fe5aed940b
|
add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully it's ok
|
2025-10-21 15:04:58 +00:00 |
|
karpathy
|
3a5e0bc50b
|
initial commit
|
2025-10-13 06:49:24 -07:00 |
|