Mirror of https://github.com/karpathy/nanochat.git, synced 2026-06-15 10:39:08 +00:00
Merge bc78a7175e into 0aaca56805
Commit 9457e1805c

@@ -51,7 +51,7 @@ The most fun you can have is to train your own GPT-2 and talk to it. The entire
 bash runs/speedrun.sh
 ```

-You may wish to do so in a screen session as this will take ~3 hours to run. Once it's done, you can talk to it via the ChatGPT-like web UI. Make sure again that your local uv virtual environment is active (run `source .venv/bin/activate`), and serve it:
+You may wish to do so in a screen session as this will take ~1.5 hours to run. Once it's done, you can talk to it via the ChatGPT-like web UI. Make sure again that your local uv virtual environment is active (run `source .venv/bin/activate`), and serve it:

 ```bash
 python -m scripts.chat_web
@@ -190,7 +190,7 @@ I've published a number of guides that might contain helpful information, most r

 ## Contributing

-The goal of nanochat is to improve the state of the art in micro models that are accessible to work with end to end on budgets of < $1000 dollars. Accessibility is about overall cost but also about cognitive complexity - nanochat is not an exhaustively configurable LLM "framework"; there are no giant configuration objects, model factories, or if-then-else monsters in the code base. It is a single, cohesive, minimal, readable, hackable, maximally-forkable "strong baseline" codebase designed to run start to end and produce a ChatGPT model you can talk to. Currently, the most interesting part personally is speeding up the latency to GPT-2 (i.e. getting a CORE score above 0.256525). Currently this takes ~3 hours, but by improving the pretraining stage we can improve this further.
+The goal of nanochat is to improve the state of the art in micro models that are accessible to work with end to end on budgets of < $1000 dollars. Accessibility is about overall cost but also about cognitive complexity - nanochat is not an exhaustively configurable LLM "framework"; there are no giant configuration objects, model factories, or if-then-else monsters in the code base. It is a single, cohesive, minimal, readable, hackable, maximally-forkable "strong baseline" codebase designed to run start to end and produce a ChatGPT model you can talk to. Currently, the most interesting part personally is speeding up the latency to GPT-2 (i.e. getting a CORE score above 0.256525). Currently this takes ~1.5 hours (down from 3h), but by improving the pretraining stage we can improve this further.

 Current AI policy: disclosure. When submitting a PR, please declare any parts that had substantial LLM contribution and that you have not written or that you do not fully understand.

@@ -26,7 +26,7 @@ python -m nanochat.dataset -n 8
 python -m scripts.tok_train --max-chars=2000000000
 python -m scripts.tok_eval

-# train a small 4 layer model
+# train a small 6 layer model
 # I tuned this run to complete in about 30 minutes on my MacBook Pro M3 Max.
 # To get better results, try increasing num_iterations, or get other ideas from your favorite LLM.
 python -m scripts.base_train \

@@ -1,11 +1,11 @@
 #!/bin/bash

 # This script is configured to train your own GPT-2 grade LLM (pretraining + finetuning)
-# It is designed to run on a blank 8XH100 GPU node and takes approximately 3 hours to complete.
+# It is designed to run on a blank 8XH100 GPU node and takes approximately 1.5 hours to complete.

 # 1) Example launch (simplest):
 # bash runs/speedrun.sh
-# 2) Example launch in a screen session (because the run takes ~3 hours):
+# 2) Example launch in a screen session (because the run takes ~1.5 hours):
 # screen -L -Logfile runs/speedrun.log -S speedrun bash runs/speedrun.sh
 # 3) Example launch with wandb logging, but see below for setting up wandb first:
 # WANDB_RUN=speedrun screen -L -Logfile runs/speedrun.log -S speedrun bash runs/speedrun.sh

@@ -4,8 +4,8 @@ All the generic code lives here, and all the evaluation-specific
 code lives in nanochat directory and is imported from here.

 Example runs:
-python -m scripts.chat_eval -a ARC-Easy
-torchrun --nproc_per_node=8 -m scripts.chat_eval -- -a ARC-Easy
+python -m scripts.chat_eval -i sft -a ARC-Easy
+torchrun --nproc_per_node=8 -m scripts.chat_eval -- -i sft -a ARC-Easy
 """

 import argparse

@@ -241,6 +241,7 @@ if __name__ == "__main__":
centered_acc = (acc - baseline_acc) / (1.0 - baseline_acc)
centered_mean += centered_acc
chatcore_metric = centered_mean / len(results)
print0(f"CORE score: {100 * chatcore_metric:.2f}%")
chatcore_metric_dict = {"ChatCORE metric": chatcore_metric}
get_report().log(section="Chat evaluation " + args.source, data=[
vars(args), # CLI args
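
The centering arithmetic in the last hunk can be sketched in isolation. This is a minimal, self-contained illustration under stated assumptions, not the code from `scripts.chat_eval`: the function name `chatcore`, the two dictionaries, the task names, and all accuracy and baseline values are hypothetical. It also assumes that `baseline_acc` is the accuracy of random guessing on a task, which the hunk itself does not state.

```python
def chatcore(results, baselines):
    """Mean of baseline-centered accuracies.

    results:   task name -> raw accuracy in [0, 1]
    baselines: task name -> assumed random-guessing accuracy for that task
    The function name and both dicts are hypothetical, for illustration only.
    """
    centered_mean = 0.0
    for task, acc in results.items():
        baseline_acc = baselines[task]
        # Rescale so the baseline maps to 0 and a perfect score maps to 1.
        centered_acc = (acc - baseline_acc) / (1.0 - baseline_acc)
        centered_mean += centered_acc
    return centered_mean / len(results)


# Made-up numbers: a 4-way multiple-choice task and an open-ended task.
results = {"multiple_choice_4way": 0.625, "open_ended": 0.5}
baselines = {"multiple_choice_4way": 0.25, "open_ended": 0.0}
score = chatcore(results, baselines)
print(f"{100 * score:.2f}%")  # prints 50.00%
```

With these numbers, 62.5% raw accuracy on the 4-way task sits halfway between the 25% baseline and a perfect score, so both tasks contribute a centered accuracy of 0.5. A task scored exactly at its baseline would contribute 0, and one below it would contribute a negative value.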