From da137191be40c413335f81b018e4f2b1ddfc68dd Mon Sep 17 00:00:00 2001
From: svlandeg
Date: Mon, 30 Mar 2026 19:39:44 +0200
Subject: [PATCH 1/4] update numbers from 3h to 1.5h

---
 README.md        | 4 ++--
 runs/speedrun.sh | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 483f3e38..610140cd 100644
--- a/README.md
+++ b/README.md
@@ -51,7 +51,7 @@ The most fun you can have is to train your own GPT-2 and talk to it. The entire
 bash runs/speedrun.sh
 ```
 
-You may wish to do so in a screen session as this will take ~3 hours to run. Once it's done, you can talk to it via the ChatGPT-like web UI. Make sure again that your local uv virtual environment is active (run `source .venv/bin/activate`), and serve it:
+You may wish to do so in a screen session as this will take ~1.5 hours to run. Once it's done, you can talk to it via the ChatGPT-like web UI. Make sure again that your local uv virtual environment is active (run `source .venv/bin/activate`), and serve it:
 
 ```bash
 python -m scripts.chat_web
@@ -190,7 +190,7 @@ I've published a number of guides that might contain helpful information, most r
 
 ## Contributing
 
-The goal of nanochat is to improve the state of the art in micro models that are accessible to work with end to end on budgets of < $1000 dollars. Accessibility is about overall cost but also about cognitive complexity - nanochat is not an exhaustively configurable LLM "framework"; there are no giant configuration objects, model factories, or if-then-else monsters in the code base. It is a single, cohesive, minimal, readable, hackable, maximally-forkable "strong baseline" codebase designed to run start to end and produce a ChatGPT model you can talk to. Currently, the most interesting part personally is speeding up the latency to GPT-2 (i.e. getting a CORE score above 0.256525). Currently this takes ~3 hours, but by improving the pretraining stage we can improve this further.
+The goal of nanochat is to improve the state of the art in micro models that are accessible to work with end to end on budgets of < $1000 dollars. Accessibility is about overall cost but also about cognitive complexity - nanochat is not an exhaustively configurable LLM "framework"; there are no giant configuration objects, model factories, or if-then-else monsters in the code base. It is a single, cohesive, minimal, readable, hackable, maximally-forkable "strong baseline" codebase designed to run start to end and produce a ChatGPT model you can talk to. Currently, the most interesting part personally is speeding up the latency to GPT-2 (i.e. getting a CORE score above 0.256525). Currently this takes ~1.5 hours (down from 3h), but by improving the pretraining stage we can improve this further.
 
 Current AI policy: disclosure. When submitting a PR, please declare any parts that had substantial LLM contribution and that you have not written or that you do not fully understand.
 
diff --git a/runs/speedrun.sh b/runs/speedrun.sh
index 48fcc68a..8c8cad29 100644
--- a/runs/speedrun.sh
+++ b/runs/speedrun.sh
@@ -1,11 +1,11 @@
 #!/bin/bash
 
 # This script is configured to train your own GPT-2 grade LLM (pretraining + finetuning)
-# It is designed to run on a blank 8XH100 GPU node and takes approximately 3 hours to complete.
+# It is designed to run on a blank 8XH100 GPU node and takes approximately 1.5 hours to complete.
 
 # 1) Example launch (simplest):
 # bash runs/speedrun.sh
-# 2) Example launch in a screen session (because the run takes ~3 hours):
+# 2) Example launch in a screen session (because the run takes ~1.5 hours):
 # screen -L -Logfile runs/speedrun.log -S speedrun bash runs/speedrun.sh
 # 3) Example launch with wandb logging, but see below for setting up wandb first:
 # WANDB_RUN=speedrun screen -L -Logfile runs/speedrun.log -S speedrun bash runs/speedrun.sh

From c2082b39528a0f22d9178c78ccca4cd47030bf42 Mon Sep 17 00:00:00 2001
From: svlandeg
Date: Tue, 31 Mar 2026 10:55:22 +0200
Subject: [PATCH 2/4] print CORE score in chat_eval

---
 scripts/chat_eval.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/chat_eval.py b/scripts/chat_eval.py
index 858d4c29..b5bbd11a 100644
--- a/scripts/chat_eval.py
+++ b/scripts/chat_eval.py
@@ -241,6 +241,7 @@ if __name__ == "__main__":
             centered_acc = (acc - baseline_acc) / (1.0 - baseline_acc)
             centered_mean += centered_acc
         chatcore_metric = centered_mean / len(results)
+        print0(f"CORE score: {100 * chatcore_metric:.2f}%")
         chatcore_metric_dict = {"ChatCORE metric": chatcore_metric}
     get_report().log(section="Chat evaluation " + args.source, data=[
         vars(args), # CLI args

From b4e67636adf5577a4b93254b9a7df68289948c3a Mon Sep 17 00:00:00 2001
From: svlandeg
Date: Tue, 31 Mar 2026 10:59:47 +0200
Subject: [PATCH 3/4] fix layer comment

---
 runs/runcpu.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/runs/runcpu.sh b/runs/runcpu.sh
index 853fa1f3..bf6bab33 100755
--- a/runs/runcpu.sh
+++ b/runs/runcpu.sh
@@ -26,7 +26,7 @@ python -m nanochat.dataset -n 8
 python -m scripts.tok_train --max-chars=2000000000
 python -m scripts.tok_eval
 
-# train a small 4 layer model
+# train a small 6 layer model
 # I tuned this run to complete in about 30 minutes on my MacBook Pro M3 Max.
 # To get better results, try increasing num_iterations, or get other ideas from your favorite LLM.
 python -m scripts.base_train \

From bc78a7175e9717ddf99be30a2bda6f2660991c3d Mon Sep 17 00:00:00 2001
From: svlandeg
Date: Mon, 13 Apr 2026 16:36:01 +0200
Subject: [PATCH 4/4] add required -i to example chat_eval scripts

---
 scripts/chat_eval.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/chat_eval.py b/scripts/chat_eval.py
index b5bbd11a..7ea3e4ae 100644
--- a/scripts/chat_eval.py
+++ b/scripts/chat_eval.py
@@ -4,8 +4,8 @@ All the generic code lives here, and all the evaluation-specific
 code lives in nanochat directory and is imported from here.
 
 Example runs:
-python -m scripts.chat_eval -a ARC-Easy
-torchrun --nproc_per_node=8 -m scripts.chat_eval -- -a ARC-Easy
+python -m scripts.chat_eval -i sft -a ARC-Easy
+torchrun --nproc_per_node=8 -m scripts.chat_eval -- -i sft -a ARC-Easy
 """
 
 import argparse
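
Note on PATCH 2/4: the hunk shows only the tail of the metric computation, so here is a minimal Python sketch of how the printed value could be derived. It assumes `results` maps task names to accuracies and that a hypothetical `baseline_accuracies` dict supplies each task's random-guess accuracy (defaulting to 0.0 when absent). The function name `chatcore`, the task names, and the numbers are illustrative and are not taken from the patch.

```python
def chatcore(results, baseline_accuracies):
    """Mean centered accuracy: 0 at the random baseline, 1 at perfect accuracy."""
    centered_mean = 0.0
    for task_name, acc in results.items():
        # Assumption: tasks without a listed baseline are treated as baseline 0.0
        baseline_acc = baseline_accuracies.get(task_name, 0.0)
        # Same centering formula as in the hunk
        centered_mean += (acc - baseline_acc) / (1.0 - baseline_acc)
    return centered_mean / len(results)


# Hypothetical accuracies, for illustration only
results = {"task_mc4": 0.625, "task_open": 0.25}
baselines = {"task_mc4": 0.25}  # e.g. a 4-way multiple-choice task

score = chatcore(results, baselines)
line = f"CORE score: {100 * score:.2f}%"  # same format string as the added print0 call
print(line)
```

With these made-up numbers the centered accuracies are 0.5 and 0.25, so the sketch prints `CORE score: 37.50%`. A task sitting exactly at its random baseline contributes 0 to the mean, and a perfectly solved task contributes 1.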