tacit/nanochat
Mirror of https://github.com/karpathy/nanochat.git, synced 2026-03-08 02:10:31 +00:00.
nanochat/runs (branch: master)
Latest commit: 324e69c45d by Andrej Karpathy (2026-03-04 19:47:12 +00:00): big, breaking change but large upside: swap the previous FineWeb-EDU dataset for the NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training a GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise.
miniseries.sh (2026-02-08 18:26:34 +00:00): at 28 and above we start to need batch size 8
runcpu.sh (2026-02-01 02:36:43 +00:00): merge two files, base_loss and base_eval, into a single file; it's nicer this way, and unify the huggingface code associated with both
scaling_laws.sh (2026-01-27 22:31:17 +00:00): add engram-lite, add log, tune scaling laws analysis scripts
speedrun.sh (2026-03-04 19:47:12 +00:00): big, breaking change but large upside: swap the previous FineWeb-EDU dataset for the NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training a GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise.