nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-06-18 03:59:09 +00:00

geopti fb2be07e17	fix: correct CSV extraction in scaling_laws.sh	2026-02-28 16:37:04 +00:00

Two bugs caused all parameter columns and tokens_trained to be silently empty or wrong in the results CSV:

1. Parameter grep patterns did not account for the padded key format. base_train.py prints parameters as `{key:24s}: {value:,}`, e.g. `wte : 33,554,432`, so patterns like `grep "wte:"` never matched. Fixed by using `grep -P "wte\s+:"` to handle the spaces.
2. tokens_trained was hardcoded as `NUM_ITERS * 524288`, but the batch size is auto-computed by base_train.py and may differ from 524288 depending on the FLOPs budget and model size. Fixed by extracting the actual value from the log line "Total number of training tokens: X".
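
The two fixes described in this commit can be illustrated with a small, self-contained sketch. It is not the actual contents of scaling_laws.sh: the log file is a made-up sample built to mimic the padded `{key:24s}: {value:,}` format, the token count is a hypothetical value, and the variable names (`LOG`, `OLD_WTE`, `WTE`, `TOKENS`) are assumptions chosen for illustration. `grep -P` assumes GNU grep.

```shell
#!/usr/bin/env bash
# Sketch only: the log contents below are a made-up sample, not real training output.
LOG="$(mktemp)"

# Mimic the padded key format `{key:24s}: {value:,}` described in the commit.
printf '%-24s: %s\n' "wte" "33,554,432" > "$LOG"
# Hypothetical token count, used purely for illustration.
echo "Total number of training tokens: 1,234,567,890" >> "$LOG"

# Old pattern: never matches, because padding sits between the key and the colon.
OLD_WTE="$(grep "wte:" "$LOG" | grep -oE '[0-9,]+$' | tr -d ',')"

# Fixed pattern: \s+ absorbs the padding (requires GNU grep for -P).
WTE="$(grep -P "wte\s+:" "$LOG" | head -n 1 | grep -oE '[0-9,]+$' | tr -d ',')"

# Read the token count from the log instead of computing NUM_ITERS * 524288.
TOKENS="$(grep "Total number of training tokens:" "$LOG" | tail -n 1 | sed 's/.*: *//' | tr -d ',')"

echo "old=${OLD_WTE:-<empty>} wte=$WTE tokens=$TOKENS"
rm -f "$LOG"
```

On the sample log, the old pattern yields an empty value while the fixed pattern recovers the parameter count with its thousands separators stripped, which is the form a CSV column would typically need.
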
miniseries.sh	at 28 and above we start to need batch size 8	2026-02-08 18:26:34 +00:00
runcpu.sh	merge two files base_loss and base_eval into a single file, it's nicer this way, and unify the huggingface code associated with both	2026-02-01 02:36:43 +00:00
scaling_laws.sh	fix: correct CSV extraction in scaling_laws.sh	2026-02-28 16:37:04 +00:00
speedrun.sh	fix comment	2026-02-18 23:26:22 +00:00