Merge bc78a7175e into 0aaca56805

2026-06-15 10:39:08 +00:00 · 2026-04-16 20:39:52 -07:00 · 2026-04-16 20:39:52 -07:00 · 9457e1805c
commit 9457e1805c
parent 0aaca56805 bc78a7175e
4 changed files with 8 additions and 7 deletions
--- a/README.md
+++ b/README.md
@@ -51,7 +51,7 @@ The most fun you can have is to train your own GPT-2 and talk to it. The entire
 bash runs/speedrun.sh
 ```

-You may wish to do so in a screen session as this will take ~3 hours to run. Once it's done, you can talk to it via the ChatGPT-like web UI. Make sure again that your local uv virtual environment is active (run `source .venv/bin/activate`), and serve it:
+You may wish to do so in a screen session as this will take ~1.5 hours to run. Once it's done, you can talk to it via the ChatGPT-like web UI. Make sure again that your local uv virtual environment is active (run `source .venv/bin/activate`), and serve it:

 ```bash
 python -m scripts.chat_web
@@ -190,7 +190,7 @@ I've published a number of guides that might contain helpful information, most r

 ## Contributing

-The goal of nanochat is to improve the state of the art in micro models that are accessible to work with end to end on budgets of < $1000 dollars. Accessibility is about overall cost but also about cognitive complexity - nanochat is not an exhaustively configurable LLM "framework"; there are no giant configuration objects, model factories, or if-then-else monsters in the code base. It is a single, cohesive, minimal, readable, hackable, maximally-forkable "strong baseline" codebase designed to run start to end and produce a ChatGPT model you can talk to. Currently, the most interesting part personally is speeding up the latency to GPT-2 (i.e. getting a CORE score above 0.256525). Currently this takes ~3 hours, but by improving the pretraining stage we can improve this further.
+The goal of nanochat is to improve the state of the art in micro models that are accessible to work with end to end on budgets of < $1000 dollars. Accessibility is about overall cost but also about cognitive complexity - nanochat is not an exhaustively configurable LLM "framework"; there are no giant configuration objects, model factories, or if-then-else monsters in the code base. It is a single, cohesive, minimal, readable, hackable, maximally-forkable "strong baseline" codebase designed to run start to end and produce a ChatGPT model you can talk to. Currently, the most interesting part personally is speeding up the latency to GPT-2 (i.e. getting a CORE score above 0.256525). Currently this takes ~1.5 hours (down from 3h), but by improving the pretraining stage we can improve this further.

 Current AI policy: disclosure. When submitting a PR, please declare any parts that had substantial LLM contribution and that you have not written or that you do not fully understand.

--- a/runs/runcpu.sh
+++ b/runs/runcpu.sh
@@ -26,7 +26,7 @@ python -m nanochat.dataset -n 8
 python -m scripts.tok_train --max-chars=2000000000
 python -m scripts.tok_eval

-# train a small 4 layer model
+# train a small 6 layer model
 # I tuned this run to complete in about 30 minutes on my MacBook Pro M3 Max.
 # To get better results, try increasing num_iterations, or get other ideas from your favorite LLM.
 python -m scripts.base_train \
--- a/runs/speedrun.sh
+++ b/runs/speedrun.sh
@@ -1,11 +1,11 @@
 #!/bin/bash

 # This script is configured to train your own GPT-2 grade LLM (pretraining + finetuning)
-# It is designed to run on a blank 8XH100 GPU node and takes approximately 3 hours to complete.
+# It is designed to run on a blank 8XH100 GPU node and takes approximately 1.5 hours to complete.

 # 1) Example launch (simplest):
 # bash runs/speedrun.sh
-# 2) Example launch in a screen session (because the run takes ~3 hours):
+# 2) Example launch in a screen session (because the run takes ~1.5 hours):
 # screen -L -Logfile runs/speedrun.log -S speedrun bash runs/speedrun.sh
 # 3) Example launch with wandb logging, but see below for setting up wandb first:
 # WANDB_RUN=speedrun screen -L -Logfile runs/speedrun.log -S speedrun bash runs/speedrun.sh
--- a/scripts/chat_eval.py
+++ b/scripts/chat_eval.py
@@ -4,8 +4,8 @@ All the generic code lives here, and all the evaluation-specific
 code lives in nanochat directory and is imported from here.

 Example runs:
-python -m scripts.chat_eval -a ARC-Easy
-torchrun --nproc_per_node=8 -m scripts.chat_eval -- -a ARC-Easy
+python -m scripts.chat_eval -i sft -a ARC-Easy
+torchrun --nproc_per_node=8 -m scripts.chat_eval -- -i sft -a ARC-Easy
 """

 import argparse
@@ -241,6 +241,7 @@ if __name__ == "__main__":
            centered_acc = (acc - baseline_acc) / (1.0 - baseline_acc)
            centered_mean += centered_acc
        chatcore_metric = centered_mean / len(results)
+        print0(f"CORE score: {100 * chatcore_metric:.2f}%")
        chatcore_metric_dict = {"ChatCORE metric": chatcore_metric}
    get_report().log(section="Chat evaluation " + args.source, data=[
        vars(args), # CLI args
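In the last hunk, the ChatCORE metric centers each task's accuracy against a chance-level baseline before averaging. The newly added `print0` line then reports that mean as a percentage. Below is a minimal, self-contained sketch of that arithmetic. It assumes results and baselines are plain dicts keyed by task name. The helpers `centered_accuracy` and `chatcore`, the task names, and all numbers are hypothetical illustrations, not nanochat's own functions or measured results.

```python
# Hypothetical sketch of the centered-accuracy averaging used for ChatCORE.
# Task names, accuracies and baselines below are made up for illustration.

def centered_accuracy(acc: float, baseline_acc: float) -> float:
    """Rescale accuracy so chance level maps to 0.0 and a perfect score to 1.0."""
    return (acc - baseline_acc) / (1.0 - baseline_acc)


def chatcore(results: dict, baselines: dict) -> float:
    """Average the centered accuracies over every task in `results`."""
    centered_mean = 0.0
    for task_name, acc in results.items():
        centered_mean += centered_accuracy(acc, baselines[task_name])
    return centered_mean / len(results)


if __name__ == "__main__":
    # Assumed values: a 4-way multiple-choice task (chance = 0.25)
    # and an open-ended task (chance = 0.0).
    results = {"ARC-Easy": 0.625, "GSM8K": 0.10}
    baselines = {"ARC-Easy": 0.25, "GSM8K": 0.0}
    metric = chatcore(results, baselines)
    print(f"ChatCORE metric: {100 * metric:.2f}%")  # prints "ChatCORE metric: 30.00%"
```

With these assumed numbers, ARC-Easy centers to (0.625 - 0.25) / 0.75 = 0.5 and GSM8K to 0.10. Their mean of 0.3 prints as 30.00%. Centering keeps a task with a high chance baseline, such as multiple choice, from inflating the average relative to open-ended tasks.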