From da137191be40c413335f81b018e4f2b1ddfc68dd Mon Sep 17 00:00:00 2001
From: svlandeg <svlandeg@github.com>
Date: Mon, 30 Mar 2026 19:39:44 +0200
Subject: [PATCH 1/4] update speedrun time estimates from ~3h to ~1.5h

---
 README.md        | 4 ++--
 runs/speedrun.sh | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 483f3e38..610140cd 100644
--- a/README.md
+++ b/README.md
@@ -51,7 +51,7 @@ The most fun you can have is to train your own GPT-2 and talk to it. The entire
 bash runs/speedrun.sh
 ```
 
-You may wish to do so in a screen session as this will take ~3 hours to run. Once it's done, you can talk to it via the ChatGPT-like web UI. Make sure again that your local uv virtual environment is active (run `source .venv/bin/activate`), and serve it:
+You may wish to do so in a screen session as this will take ~1.5 hours to run. Once it's done, you can talk to it via the ChatGPT-like web UI. Make sure again that your local uv virtual environment is active (run `source .venv/bin/activate`), and serve it:
 
 ```bash
 python -m scripts.chat_web
@@ -190,7 +190,7 @@ I've published a number of guides that might contain helpful information, most r
 
 ## Contributing
 
-The goal of nanochat is to improve the state of the art in micro models that are accessible to work with end to end on budgets of < $1000 dollars. Accessibility is about overall cost but also about cognitive complexity - nanochat is not an exhaustively configurable LLM "framework"; there are no giant configuration objects, model factories, or if-then-else monsters in the code base. It is a single, cohesive, minimal, readable, hackable, maximally-forkable "strong baseline" codebase designed to run start to end and produce a ChatGPT model you can talk to. Currently, the most interesting part personally is speeding up the latency to GPT-2 (i.e. getting a CORE score above 0.256525). Currently this takes ~3 hours, but by improving the pretraining stage we can improve this further.
+The goal of nanochat is to improve the state of the art in micro models that are accessible to work with end to end on budgets of < $1000. Accessibility is about overall cost but also about cognitive complexity - nanochat is not an exhaustively configurable LLM "framework"; there are no giant configuration objects, model factories, or if-then-else monsters in the code base. It is a single, cohesive, minimal, readable, hackable, maximally-forkable "strong baseline" codebase designed to run start to end and produce a ChatGPT model you can talk to. Currently, the most interesting part personally is reducing the time to GPT-2 (i.e. getting a CORE score above 0.256525). This currently takes ~1.5 hours (down from ~3 hours), but by improving the pretraining stage we can reduce this further.
 
 Current AI policy: disclosure. When submitting a PR, please declare any parts that had substantial LLM contribution and that you have not written or that you do not fully understand.
 
diff --git a/runs/speedrun.sh b/runs/speedrun.sh
index 48fcc68a..8c8cad29 100644
--- a/runs/speedrun.sh
+++ b/runs/speedrun.sh
@@ -1,11 +1,11 @@
 #!/bin/bash
 
 # This script is configured to train your own GPT-2 grade LLM (pretraining + finetuning)
-# It is designed to run on a blank 8XH100 GPU node and takes approximately 3 hours to complete.
+# It is designed to run on a blank 8XH100 GPU node and takes approximately 1.5 hours to complete.
 
 # 1) Example launch (simplest):
 # bash runs/speedrun.sh
-# 2) Example launch in a screen session (because the run takes ~3 hours):
+# 2) Example launch in a screen session (because the run takes ~1.5 hours):
 # screen -L -Logfile runs/speedrun.log -S speedrun bash runs/speedrun.sh
 # 3) Example launch with wandb logging, but see below for setting up wandb first:
 # WANDB_RUN=speedrun screen -L -Logfile runs/speedrun.log -S speedrun bash runs/speedrun.sh

From c2082b39528a0f22d9178c78ccca4cd47030bf42 Mon Sep 17 00:00:00 2001
From: svlandeg <svlandeg@github.com>
Date: Tue, 31 Mar 2026 10:55:22 +0200
Subject: [PATCH 2/4] print ChatCORE metric in chat_eval

---
 scripts/chat_eval.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/chat_eval.py b/scripts/chat_eval.py
index 858d4c29..b5bbd11a 100644
--- a/scripts/chat_eval.py
+++ b/scripts/chat_eval.py
@@ -241,6 +241,7 @@ if __name__ == "__main__":
             centered_acc = (acc - baseline_acc) / (1.0 - baseline_acc)
             centered_mean += centered_acc
         chatcore_metric = centered_mean / len(results)
+        print0(f"ChatCORE metric: {100 * chatcore_metric:.2f}%")
         chatcore_metric_dict = {"ChatCORE metric": chatcore_metric}
     get_report().log(section="Chat evaluation " + args.source, data=[
         vars(args), # CLI args

From b4e67636adf5577a4b93254b9a7df68289948c3a Mon Sep 17 00:00:00 2001
From: svlandeg <svlandeg@github.com>
Date: Tue, 31 Mar 2026 10:59:47 +0200
Subject: [PATCH 3/4] fix layer count in runcpu.sh comment

---
 runs/runcpu.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/runs/runcpu.sh b/runs/runcpu.sh
index 853fa1f3..bf6bab33 100755
--- a/runs/runcpu.sh
+++ b/runs/runcpu.sh
@@ -26,7 +26,7 @@ python -m nanochat.dataset -n 8
 python -m scripts.tok_train --max-chars=2000000000
 python -m scripts.tok_eval
 
-# train a small 4 layer model
+# train a small 6 layer model
 # I tuned this run to complete in about 30 minutes on my MacBook Pro M3 Max.
 # To get better results, try increasing num_iterations, or get other ideas from your favorite LLM.
 python -m scripts.base_train \

From bc78a7175e9717ddf99be30a2bda6f2660991c3d Mon Sep 17 00:00:00 2001
From: svlandeg <svlandeg@github.com>
Date: Mon, 13 Apr 2026 16:36:01 +0200
Subject: [PATCH 4/4] add required -i to chat_eval example runs

---
 scripts/chat_eval.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/chat_eval.py b/scripts/chat_eval.py
index b5bbd11a..7ea3e4ae 100644
--- a/scripts/chat_eval.py
+++ b/scripts/chat_eval.py
@@ -4,8 +4,8 @@ All the generic code lives here, and all the evaluation-specific
 code lives in nanochat directory and is imported from here.
 
 Example runs:
-python -m scripts.chat_eval -a ARC-Easy
-torchrun --nproc_per_node=8 -m scripts.chat_eval -- -a ARC-Easy
+python -m scripts.chat_eval -i sft -a ARC-Easy
+torchrun --nproc_per_node=8 -m scripts.chat_eval -- -i sft -a ARC-Easy
 """
 
 import argparse