Commit Graph

12 Commits

Author SHA1 Message Date
Daniel Aioanei
b510e6648e Download the minimum number of parquet shards to train the tokenizer reproducibly 2026-01-09 22:37:53 +01:00
Andrej Karpathy
ccf4b7f9bf nudge hyperparameters of the base script with the results of the sweeps and miniseries. vocab size down to 32K. D:N ratio from 20 to 8. add miniseries script 2026-01-07 22:11:59 +00:00
Andrej Karpathy
aa42f40e66 delete the inline rustbpe project. it was ugly to have a project within project and rustbpe is now nicely a separate repo on my github karpathy/rustbpe and it's on pypi etc., so we just add it as a dependency to uv. i think it is appropriate that this is a separate repo because 1) it doesn't have too many knobs, other than the ones that are exposed - the regex pattern and vocab size and 2) all of its complexity is not algorithmic (it's equivalent to minbpe), instead it is efficiency-related, so it is ok to hide relatively speaking 2026-01-03 23:55:28 +00:00
svlandeg
f1683c5b16 set nproc_per_node as var in speedrun and run1000 scripts 2025-11-04 21:36:10 +01:00
Jing Zhang
ba4f40bf58 Update run1000.sh to add missing --run=$WANDB_RUN 2025-11-01 21:27:00 -07:00
Andrej Karpathy
cf587acb1a move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts 2025-11-01 16:04:38 +00:00
Andrej Karpathy
48387cd895 also bump run1000.sh to new uv sync 2025-10-22 16:08:31 +00:00
Andrej Karpathy
5bdc99abfb merge and resolve conflict 2025-10-21 17:19:10 +00:00
Andrej Karpathy
fe5aed940b add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully it's ok 2025-10-21 15:04:58 +00:00
karpathy
2e9669e03a upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming 2025-10-20 10:15:17 -07:00
Tancrède Lepoint
b1443dc98c export NANOCHAT_BASE_DIR so child processes get it too 2025-10-19 14:05:40 -04:00
Andrej Karpathy
fae3aca951 add script to train a 000 version of nanochat. currently it's a bit more like 00 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now 2025-10-15 20:32:22 +00:00