Commit Graph

18 Commits

Author SHA1 Message Date
Kaiyue Wen
116900ac16 muonh 2026-02-12 17:51:36 -08:00
Kaiyue Wen
5a965c1383 Remove runs/scaling_laws_muonh.sh
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
2026-02-12 17:09:19 -08:00
Kaiyue Wen
ee04406ebb Merge muonh-dev and master: FP8 training, optimizer tuning, and scaling improvements
Major changes:
- Add custom FP8 training module (replaces torchao dependency)
- Implement auto-calculated optimal batch sizes (1M for d26)
- Add hyperball data scaling
- Restore and tune momentum schedule (settled on 0.95)
- Add matrix warmup ratio and norm_lr parameters
- Improve weight decay scaling (Tepoch-based theory)
- Update d26 configuration and scaling laws
- Clarify MFU labeling as bf16_mfu
- Update leaderboard and documentation

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
2026-02-12 16:15:15 -08:00
dangxingyu
924489f582 Update quickrun defaults 2026-02-03 20:46:20 -05:00
dangxingyu
e7ee891c3b Update quickrun script 2026-02-03 20:43:43 -05:00
dangxingyu
a611a85e35 Rename quickrun script 2026-02-03 20:29:55 -05:00
dangxingyu
4686cb9509 Update quickrun wandb mode 2026-02-03 20:26:11 -05:00
dangxingyu
77de3297ea Update warmdown and rename quickrun 2026-02-03 20:25:16 -05:00
dangxingyu
e28d4ead22 Add muonh model and quickrun 2026-02-03 20:14:51 -05:00
Andrej Karpathy
b19b4f3e49 fix bug in speedrun script, batch size that doesn't OOM on 8XH100 for d24 is 16 2026-02-02 15:50:14 +00:00
Andrej Karpathy
0307997f9b merge two files base_loss and base_eval into a single file, it's nicer this way, and unify the huggingface code associated with both 2026-02-01 02:36:43 +00:00
Andrej Karpathy
1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtrianing is not yet fully properly erased across the board, but good enough for step 1 2026-01-31 19:12:25 +00:00
Andrej Karpathy
02baa15405 i am feeling in a delete mood today. i need to delete a lot of code. there is too much code and surface area and complexity. ew 2026-01-30 17:08:53 +00:00
Andrej Karpathy
067daa7758 small fix cpu script ty PR #474 2026-01-30 02:11:25 +00:00
Andrej Karpathy
c88bbf8133 Merge branch 'engram' 2026-01-27 22:33:16 +00:00
Andrej Karpathy
c8d93beed2 add engram-lite, add log, tune scaling laws analysis scripts 2026-01-27 22:31:17 +00:00
Andrej Karpathy
8630d32be4 quick fix to not OOM main speedrun script 2026-01-26 22:31:42 +00:00
Andrej Karpathy
63bb5831e2 something i've wanted to do for a while - move all .sh runs to their own directory so they don't pollute root dir 2026-01-18 15:27:41 +00:00