Jin Xu
00932d1955
Run 4: d26 0.5M batch, ratio 7.25 — 2.49h (9.6% faster)
Revert d26 batch size from 1M to 0.5M and lower param-data ratio from
8.25 to 7.25. In the speedrun's undertraining regime, smaller batch with
more optimization steps (12,700 vs 7,226) is more efficient than larger
batch with fewer steps.
Result: CORE 0.2626, time 8967s (2.49h), val_bpb 0.750008
Reproduced: CORE 0.2729/0.2626 across two runs, both pass.
AI disclosure: experimental design and hyperparameter search were
conducted using Claude Code.
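
The step counts quoted above can be cross-checked against the change in param-data ratio. The following is a minimal sketch, not code from the repository. It assumes "0.5M" and "1M" mean 2**19 and 2**20 tokens per step, and all names in it are hypothetical. Because only the 2x relationship between the two batch sizes matters, the check still holds if the exact batch sizes differ.

```python
# Sanity check of the figures quoted in the commit message above.
# Assumption (not stated in the source): batch sizes are powers of two,
# so "0.5M" = 2**19 tokens and "1M" = 2**20 tokens per optimization step.
BATCH_SMALL = 2**19
BATCH_LARGE = 2**20
STEPS_SMALL = 12_700   # steps at 0.5M batch, ratio 7.25
STEPS_LARGE = 7_226    # steps at 1M batch, ratio 8.25


def total_tokens(steps: int, batch_tokens: int) -> int:
    """Total training tokens = optimization steps * tokens per step."""
    return steps * batch_tokens


tokens_small = total_tokens(STEPS_SMALL, BATCH_SMALL)
tokens_large = total_tokens(STEPS_LARGE, BATCH_LARGE)

# For a fixed model, training tokens scale with the param-data ratio,
# so the two ratios below should agree if the quoted numbers are consistent.
token_ratio = tokens_small / tokens_large
param_data_ratio = 7.25 / 8.25

# Wall-clock time quoted in the result line, converted to hours.
hours = 8967 / 3600

print(f"token ratio:      {token_ratio:.4f}")
print(f"param-data ratio: {param_data_ratio:.4f}")
print(f"training time:    {hours:.2f} h")
```

Both ratios come out near 0.879, so the smaller-batch run sees about 12% fewer tokens while taking about 76% more optimization steps, which is the trade-off the message describes.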
2026-02-08 23:05:13 +00:00
Andrej Karpathy
1ec0a34779
at 28 and above we start to need batch size 8
2026-02-08 18:26:34 +00:00
Andrej Karpathy
ff46300720
tune miniseries just a bit, fairly cosmetic, keep to even depths where the math works out nicely in model sizing
2026-02-08 17:54:12 +00:00
Andrej Karpathy
685271dc8d
new optimal ratio for d26 training
2026-02-06 19:21:27 +00:00
Andrej Karpathy
542beb0c8c
bump speedrun to be the up to date leaderboard run
2026-02-04 02:12:04 +00:00
Andrej Karpathy
b19b4f3e49
fix bug in speedrun script, batch size that doesn't OOM on 8XH100 for d24 is 16
2026-02-02 15:50:14 +00:00
Andrej Karpathy
0307997f9b
merge two files base_loss and base_eval into a single file, it's nicer this way, and unify the huggingface code associated with both
2026-02-01 02:36:43 +00:00
Andrej Karpathy
1ddaad1c1c
nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtraining is not yet fully properly erased across the board, but good enough for step 1
2026-01-31 19:12:25 +00:00
Andrej Karpathy
02baa15405
i am feeling in a delete mood today. i need to delete a lot of code. there is too much code and surface area and complexity. ew
2026-01-30 17:08:53 +00:00
Andrej Karpathy
067daa7758
small fix cpu script ty PR #474
2026-01-30 02:11:25 +00:00
Andrej Karpathy
c88bbf8133
Merge branch 'engram'
2026-01-27 22:33:16 +00:00
Andrej Karpathy
c8d93beed2
add engram-lite, add log, tune scaling laws analysis scripts
2026-01-27 22:31:17 +00:00
Andrej Karpathy
8630d32be4
quick fix to not OOM main speedrun script
2026-01-26 22:31:42 +00:00
Andrej Karpathy
63bb5831e2
something i've wanted to do for a while - move all .sh runs to their own directory so they don't pollute root dir
2026-01-18 15:27:41 +00:00