Fix args in readme (#438)

* fix commands in readme, using new arg format

* fix typo

* add required -i flag to chat_eval example runs
Sofie Van Landeghem 2026-01-16 01:26:38 +01:00 committed by GitHub
parent bdcc030ffa
commit d4ea28d4e2
3 changed files with 5 additions and 5 deletions

@@ -82,10 +82,10 @@ That said, to give a sense, the example changes needed for the [speedrun.sh](spe
python -m nanochat.dataset -n 450 &
...
# use --depth to increase model size. to not oom, halve device batch size 32 -> 16:
-torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=26 --device_batch_size=16
+torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=26 --device-batch-size=16
...
# make sure to use the same later during midtraining:
-torchrun --standalone --nproc_per_node=8 -m scripts.mid_train -- --device_batch_size=16
+torchrun --standalone --nproc_per_node=8 -m scripts.mid_train -- --device-batch-size=16
```
That's it! The biggest thing to pay attention to is making sure you have enough data shards to train on (the code will loop and do more epochs over the same training set otherwise, decreasing learning speed a bit), and managing your memory/VRAM, primarily by decreasing the `device_batch_size` until things fit (the scripts automatically compensate by increasing the number of gradient accumulation loops, simply turning parallel compute to sequential compute).
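
The compensation described above can be sketched in a few lines. This is only an illustration, not the repository's code: the function name `grad_accum_steps`, the total batch of 524,288 tokens, the sequence length of 2048, and the 8-GPU world size are all assumptions chosen so the arithmetic is easy to follow.

```python
def grad_accum_steps(total_batch_tokens: int, device_batch_size: int,
                     seq_len: int, world_size: int) -> int:
    """Micro-steps needed per optimizer step to reach the target batch size.

    Hypothetical helper: all names and defaults are illustrative assumptions.
    """
    # Tokens processed by all GPUs together in one forward/backward pass.
    tokens_per_micro_step = device_batch_size * seq_len * world_size
    if total_batch_tokens % tokens_per_micro_step != 0:
        raise ValueError("total batch must be a multiple of the per-step tokens")
    return total_batch_tokens // tokens_per_micro_step


# Assumed values for illustration only.
TOTAL_BATCH_TOKENS = 524_288
SEQ_LEN = 2048
WORLD_SIZE = 8

for device_batch_size in (32, 16, 8):
    steps = grad_accum_steps(TOTAL_BATCH_TOKENS, device_batch_size, SEQ_LEN, WORLD_SIZE)
    print(f"device batch size {device_batch_size}: {steps} accumulation step(s)")
```

Under these assumed numbers, halving the device batch size doubles the number of accumulation steps, so the effective batch per optimizer step stays the same while peak memory per GPU drops.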

@@ -4,8 +4,8 @@ All the generic code lives here, and all the evaluation-specific
code lives in nanochat directory and is imported from here.
Example runs:
-python -m scripts.chat_eval -a ARC-Easy
-torchrun --nproc_per_node=8 -m scripts.chat_eval -- -a ARC-Easy
+python -m scripts.chat_eval -i mid -a ARC-Easy
+torchrun --nproc_per_node=8 -m scripts.chat_eval -- -i mid -a ARC-Easy
"""
import argparse
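
For readers unfamiliar with the flag conventions this commit touches, here is a minimal `argparse` sketch. It is not the actual parser in `scripts.chat_eval`: the long option names `--source` and `--task-name` are assumptions made for illustration. It shows two standard `argparse` behaviors: a hyphenated flag such as `--device-batch-size` is exposed as the attribute `device_batch_size`, and omitting a flag declared with `required=True` makes the parser exit with status 2.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the option names are illustrative assumptions.
    parser = argparse.ArgumentParser(description="illustrative sketch")
    parser.add_argument("-i", "--source", required=True,
                        help="which checkpoint to evaluate (assumed meaning)")
    parser.add_argument("-a", "--task-name", default=None,
                        help="task to evaluate, e.g. ARC-Easy")
    parser.add_argument("--device-batch-size", type=int, default=32)
    return parser


# Hyphens in long option names become underscores in attribute names.
args = build_parser().parse_args(
    ["-i", "mid", "-a", "ARC-Easy", "--device-batch-size=16"]
)
print(args.source, args.task_name, args.device_batch_size)

# Leaving out a required flag makes argparse print usage and exit with code 2.
try:
    build_parser().parse_args(["-a", "ARC-Easy"])
except SystemExit as exc:
    print("missing -i exits with code", exc.code)
```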

@@ -25,7 +25,7 @@ class CustomJSON(Task):
print("-" * 80)
print(f"Warning: File {filepath} does not exist")
print("HINT (Oct 21 2025)")
-print("If you recently did a git pull and suddely see this, it might be due to the new addition of identity conversations")
+print("If you recently did a git pull and suddenly see this, it might be due to the new addition of identity conversations")
print("See this discussion for more details: https://github.com/karpathy/nanochat/discussions/139")
print("Quick fix: simply run the following command to download the file and you're done:")
print(f"curl -L -o {filepath} https://karpathy-public.s3.us-west-2.amazonaws.com/identity_conversations.jsonl")