Commit Graph

16 Commits

Author SHA1 Message Date
geopti
16755495bc fix(miniseries): extract tokens_trained from log instead of hardcoding batch size
Same bug as scaling_laws.sh: TOKENS_TRAINED was computed as NUM_ITERS * 524288,
hardcoding the default total batch size. When base_train auto-computes a different
batch size, the value is wrong. Fix by reading "Total number of training tokens:"
directly from the training log.
2026-02-28 20:43:34 +00:00
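The fix described in the commit above can be sketched in shell. The log line `Total number of training tokens:` is quoted from the commit message; the sample numbers, temp-file setup, and variable names are made up for illustration (GNU grep is assumed for `-P`):

```shell
# Hypothetical sketch: read the actual token count from the training log
# instead of computing NUM_ITERS * 524288 with a hardcoded batch size.
LOG=$(mktemp)
printf 'step 100 | loss 2.31\nTotal number of training tokens: 1843200000\n' > "$LOG"

# Old (buggy) approach: assumes the default total batch size of 524288,
# which base_train.py may override when it auto-computes the batch size.
NUM_ITERS=3750
TOKENS_HARDCODED=$((NUM_ITERS * 524288))

# Fixed approach: take the value base_train.py itself printed to the log.
TOKENS_TRAINED=$(grep -oP 'Total number of training tokens:\s*\K[0-9]+' "$LOG")

echo "$TOKENS_TRAINED"
rm -f "$LOG"
```

Here the hardcoded value (1,966,080,000) silently disagrees with the logged one, which is exactly the failure mode the commit describes.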
geopti
fb2be07e17 fix: correct CSV extraction in scaling_laws.sh
Two bugs caused all parameter columns and tokens_trained to be silently
empty/wrong in the results CSV:

1. Parameter grep patterns did not account for the padded key format.
   base_train.py prints parameters as `{key:24s}: {value:,}`, e.g.
   `wte                     : 33,554,432`, so patterns like `grep "wte:"`
   never matched. Fixed by using `grep -P "wte\s+:"` to handle the spaces.

2. tokens_trained was hardcoded as `NUM_ITERS * 524288`, but the batch
   size is auto-computed by base_train.py and may differ from 524288
   depending on the FLOPs budget and model size. Fixed by extracting the
   actual value from the log line "Total number of training tokens: X".
2026-02-28 16:37:04 +00:00
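Bug 1 above can be sketched as follows. The sample log line mirrors the `{key:24s}: {value:,}` format quoted in the commit, and `grep -P 'wte\s+:'` is the fix it names; the awk step that strips commas from the value is an illustrative addition, not taken from the script (GNU grep is assumed for `-P`):

```shell
# Hypothetical sketch of the padded-key parsing fix.
LOG=$(mktemp)
printf 'wte                     : 33,554,432\n' > "$LOG"

# Buggy pattern: "wte:" never matches because the key is padded to 24
# characters before the colon. This prints a match count of 0.
grep -c "wte:" "$LOG" || true

# Fixed pattern: allow whitespace between key and colon, then strip
# spaces and commas from the value field.
WTE=$(grep -P 'wte\s+:' "$LOG" | awk -F: '{gsub(/[ ,]/, "", $2); print $2}')
echo "$WTE"
rm -f "$LOG"
```

The same padded-key pattern would apply to every parameter column the commit says came out empty, not just `wte`.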
Andrej Karpathy
bb5137860e fix comment 2026-02-18 23:26:22 +00:00
Andrej Karpathy
1ec0a34779 at 28 and above we start to need batch size 8 2026-02-08 18:26:34 +00:00
Andrej Karpathy
ff46300720 tune miniseries just a bit, fairly cosmetic, keep to even depths where the math works out nicely in model sizing 2026-02-08 17:54:12 +00:00
Andrej Karpathy
685271dc8d new optimal ratio for d26 training 2026-02-06 19:21:27 +00:00
Andrej Karpathy
542beb0c8c bump speedrun to be the up to date leaderboard run 2026-02-04 02:12:04 +00:00
Andrej Karpathy
b19b4f3e49 fix bug in speedrun script, batch size that doesn't OOM on 8XH100 for d24 is 16 2026-02-02 15:50:14 +00:00
Andrej Karpathy
0307997f9b merge two files base_loss and base_eval into a single file, it's nicer this way, and unify the huggingface code associated with both 2026-02-01 02:36:43 +00:00
Andrej Karpathy
1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtraining is not yet fully properly erased across the board, but good enough for step 1 2026-01-31 19:12:25 +00:00
Andrej Karpathy
02baa15405 i am feeling in a delete mood today. i need to delete a lot of code. there is too much code and surface area and complexity. ew 2026-01-30 17:08:53 +00:00
Andrej Karpathy
067daa7758 small fix cpu script ty PR #474 2026-01-30 02:11:25 +00:00
Andrej Karpathy
c88bbf8133 Merge branch 'engram' 2026-01-27 22:33:16 +00:00
Andrej Karpathy
c8d93beed2 add engram-lite, add log, tune scaling laws analysis scripts 2026-01-27 22:31:17 +00:00
Andrej Karpathy
8630d32be4 quick fix to not OOM main speedrun script 2026-01-26 22:31:42 +00:00
Andrej Karpathy
63bb5831e2 something i've wanted to do for a while - move all .sh runs to their own directory so they don't pollute root dir 2026-01-18 15:27:41 +00:00