Commit Graph

16 Commits

Author SHA1 Message Date
geopti
16755495bc fix(miniseries): extract tokens_trained from log instead of hardcoding batch size
Same bug as scaling_laws.sh: TOKENS_TRAINED was computed as NUM_ITERS * 524288,
hardcoding the default total batch size. When base_train auto-computes a different
batch size, the value is wrong. Fix by reading "Total number of training tokens:"
directly from the training log.
2026-02-28 20:43:34 +00:00
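The fix described in the commit above can be sketched in shell. The log line `Total number of training tokens:` is quoted from the commit message; the sample numbers, temp-file setup, and variable names are made up for illustration (GNU grep is assumed for `-P`):

```shell
# Hypothetical sketch: read the actual token count from the training log
# instead of computing NUM_ITERS * 524288 with a hardcoded batch size.
LOG=$(mktemp)
printf 'step 100 | loss 2.31\nTotal number of training tokens: 1843200000\n' > "$LOG"

# Old (buggy) approach: assumes the default total batch size of 524288,
# which base_train.py may override when it auto-computes the batch size.
NUM_ITERS=3750
TOKENS_HARDCODED=$((NUM_ITERS * 524288))

# Fixed approach: take the value base_train.py itself printed to the log.
TOKENS_TRAINED=$(grep -oP 'Total number of training tokens:\s*\K[0-9]+' "$LOG")

echo "$TOKENS_TRAINED"
rm -f "$LOG"
```

Here the hardcoded value (1,966,080,000) silently disagrees with the logged one, which is exactly the failure mode the commit describes.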
geopti
fb2be07e17 fix: correct CSV extraction in scaling_laws.sh
Two bugs caused all parameter columns and tokens_trained to be silently
empty/wrong in the results CSV:

1. Parameter grep patterns did not account for the padded key format.
   base_train.py prints parameters as `{key:24s}: {value:,}`, e.g.
   `wte                     : 33,554,432`, so patterns like `grep "wte:"`
   never matched. Fixed by using `grep -P "wte\s+:"` to handle the spaces.

2. tokens_trained was hardcoded as `NUM_ITERS * 524288`, but the batch
   size is auto-computed by base_train.py and may differ from 524288
   depending on the FLOPs budget and model size. Fixed by extracting the
   actual value from the log line "Total number of training tokens: X".
2026-02-28 16:37:04 +00:00
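Bug 1 above can be sketched as follows. The sample log line mirrors the `{key:24s}: {value:,}` format quoted in the commit, and `grep -P 'wte\s+:'` is the fix it names; the awk step that strips commas from the value is an illustrative addition, not taken from the script (GNU grep is assumed for `-P`):

```shell
# Hypothetical sketch of the padded-key parsing fix.
LOG=$(mktemp)
printf 'wte                     : 33,554,432\n' > "$LOG"

# Buggy pattern: "wte:" never matches because the key is padded to 24
# characters before the colon. This prints a match count of 0.
grep -c "wte:" "$LOG" || true

# Fixed pattern: allow whitespace between key and colon, then strip
# spaces and commas from the value field.
WTE=$(grep -P 'wte\s+:' "$LOG" | awk -F: '{gsub(/[ ,]/, "", $2); print $2}')
echo "$WTE"
rm -f "$LOG"
```

The same padded-key pattern would apply to every parameter column the commit says came out empty, not just `wte`.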
Andrej Karpathy
bb5137860e fix comment 2026-02-18 23:26:22 +00:00
Andrej Karpathy
1ec0a34779 at 28 and above we start to need batch size 8 2026-02-08 18:26:34 +00:00
Andrej Karpathy
ff46300720 tune miniseries just a bit, fairly cosmetic, keep to even depths where the math works out nicely in model sizing 2026-02-08 17:54:12 +00:00
Andrej Karpathy
685271dc8d new optimal ratio for d26 training 2026-02-06 19:21:27 +00:00
Andrej Karpathy
542beb0c8c bump speedrun to be the up to date leaderboard run 2026-02-04 02:12:04 +00:00
Andrej Karpathy
b19b4f3e49 fix bug in speedrun script, batch size that doesn't OOM on 8XH100 for d24 is 16 2026-02-02 15:50:14 +00:00
Andrej Karpathy
0307997f9b merge two files base_loss and base_eval into a single file, it's nicer this way, and unify the huggingface code associated with both 2026-02-01 02:36:43 +00:00
Andrej Karpathy
1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtraining is not yet fully properly erased across the board, but good enough for step 1 2026-01-31 19:12:25 +00:00
Andrej Karpathy
02baa15405 i am feeling in a delete mood today. i need to delete a lot of code. there is too much code and surface area and complexity. ew 2026-01-30 17:08:53 +00:00
Andrej Karpathy
067daa7758 small fix cpu script ty PR #474 2026-01-30 02:11:25 +00:00
Andrej Karpathy
c88bbf8133 Merge branch 'engram' 2026-01-27 22:33:16 +00:00
Andrej Karpathy
c8d93beed2 add engram-lite, add log, tune scaling laws analysis scripts 2026-01-27 22:31:17 +00:00
Andrej Karpathy
8630d32be4 quick fix to not OOM main speedrun script 2026-01-26 22:31:42 +00:00
Andrej Karpathy
63bb5831e2 something i've wanted to do for a while - move all .sh runs to their own directory so they don't pollute root dir 2026-01-18 15:27:41 +00:00