nanochat/runs
2026-03-04 19:47:12 +00:00
miniseries.sh at 28 and above we start to need batch size 8 2026-02-08 18:26:34 +00:00
runcpu.sh
scaling_laws.sh
speedrun.sh big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise 2026-03-04 19:47:12 +00:00