Matt Van Horn
861414d500
add explicit --model-tag to run scripts
Without --model-tag, chat_sft/chat_cli/chat_web/base_eval can pick the
wrong model when multiple models exist in the cache. Add explicit
--model-tag=d6 (runcpu) and --model-tag=d24 (speedrun) matching the
depth used in each script's base_train call.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:52:48 -07:00
Andrej Karpathy
324e69c45d
big, breaking change but large upside: swap previous FineWeb-EDU dataset for NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training a GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
2026-03-04 19:47:12 +00:00
Andrej Karpathy
bb5137860e
fix comment
2026-02-18 23:26:22 +00:00
Andrej Karpathy
1ec0a34779
at 28 and above we start to need batch size 8
2026-02-08 18:26:34 +00:00
Andrej Karpathy
ff46300720
tune miniseries just a bit, fairly cosmetic, keep to even depths where the math works out nicely in model sizing
2026-02-08 17:54:12 +00:00
Andrej Karpathy
685271dc8d
new optimal ratio for d26 training
2026-02-06 19:21:27 +00:00
Andrej Karpathy
542beb0c8c
bump speedrun to be the up to date leaderboard run
2026-02-04 02:12:04 +00:00
Andrej Karpathy
b19b4f3e49
fix bug in speedrun script, batch size that doesn't OOM on 8XH100 for d24 is 16
2026-02-02 15:50:14 +00:00
Andrej Karpathy
0307997f9b
merge two files base_loss and base_eval into a single file, it's nicer this way, and unify the huggingface code associated with both
2026-02-01 02:36:43 +00:00
Andrej Karpathy
1ddaad1c1c
nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtraining is not yet fully properly erased across the board, but good enough for step 1
2026-01-31 19:12:25 +00:00
Andrej Karpathy
02baa15405
i am feeling in a delete mood today. i need to delete a lot of code. there is too much code and surface area and complexity. ew
2026-01-30 17:08:53 +00:00
Andrej Karpathy
067daa7758
small fix cpu script ty PR #474
2026-01-30 02:11:25 +00:00
Andrej Karpathy
c88bbf8133
Merge branch 'engram'
2026-01-27 22:33:16 +00:00
Andrej Karpathy
c8d93beed2
add engram-lite, add log, tune scaling laws analysis scripts
2026-01-27 22:31:17 +00:00
Andrej Karpathy
8630d32be4
quick fix to not OOM main speedrun script
2026-01-26 22:31:42 +00:00
Andrej Karpathy
63bb5831e2
something i've wanted to do for a while - move all .sh runs to their own directory so they don't pollute root dir
2026-01-18 15:27:41 +00:00