Mirror of https://github.com/karpathy/nanochat.git (synced 2026-02-10 12:39:50 +00:00)

Merge b0ab30563e into 230d6cf6c6

commit 486038d4dd
@@ -10,7 +10,7 @@ For questions about the repo, I recommend either using [DeepWiki](https://deepwi
 ## Updates

 - (Jan 31 2026) Major revamp of all scripts/README ongoing, deleting midtraining stage, might be a bit messy briefly...
-- (Jan 30 2026) With all the latest improvements we're able to train GPT-2 grade LLM in about $73. The [runs/speedrun.sh](runs/speedrun.sh) script will become the refernece way to train GPT-2 grade model and talk to it.
+- (Jan 30 2026) With all the latest improvements we're able to train GPT-2 grade LLM in about $73. The [runs/speedrun.sh](runs/speedrun.sh) script will become the reference way to train GPT-2 grade model and talk to it.

 ## Leaderboard

@@ -50,13 +50,13 @@ The GPT-2 CORE score (i.e. the target to beat) is 0.256525. So we see that this

 ### Reproduce and talk to GPT-2

-The most fun you can have is to train your own GPT-2 and talk to it. The entire pipeline to do so is contained in the single file [runs/speedrun.sh](runs/speedrun.sh), which is designed to be run on an 8XH100 GPU node. Currently, at ~$24/hour for these nodes, pretraining GPT-2 grade model takes approximately 3 hours and will set you back about $75. Boot up a new 8XH100 GPU box from your favorite provider (e.g. I use and like [Lambda](https://lambda.ai/service/gpu-cloud)), and kick off the training script:
+The most fun you can have is to train your own GPT-2 and talk to it. The entire pipeline to do so is contained in the single file [runs/speedrun.sh](runs/speedrun.sh), which is designed to be run on an 8XH100 GPU node. Currently, at ~$24/hour for these nodes, pretraining a GPT-2 grade model takes approximately 3 hours and will set you back about $75. Boot up a new 8XH100 GPU box from your favorite provider (e.g. I use and like [Lambda](https://lambda.ai/service/gpu-cloud)), and kick off the training script:

 ```bash
 bash runs/speedrun.sh
 ```

-You mish to do so in a screen session as this will take ~3 hours to run. Once it's done, you can talk to it via the ChatGPT-like web UI. Make sure again that your local uv virtual environment is active (run `source .venv/bin/activate`), and serve it:
+You may wish to do so in a screen session as this will take ~3 hours to run. Once it's done, you can talk to it via the ChatGPT-like web UI. Make sure again that your local uv virtual environment is active (run `source .venv/bin/activate`), and serve it:

 ```bash
 python -m scripts.chat_web
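Tying this hunk's two commands together: since the run takes ~3 hours, the screen session the text suggests might look like the following minimal sketch (the session name `speedrun` is an arbitrary choice, not something the repo specifies):

```bash
# Kick off the ~3 hour speedrun in a detached screen session so a dropped
# SSH connection doesn't kill training (session name is arbitrary)
screen -dmS speedrun bash runs/speedrun.sh

# Reattach to watch progress at any time; detach again with Ctrl-a d
screen -r speedrun

# Once training finishes: activate the local uv venv and serve the web UI
source .venv/bin/activate
python -m scripts.chat_web
```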
@@ -97,7 +97,7 @@ The overall approach is to treat the depth of the model as the single dial of co

 ## Running on CPU / MPS

-The script [runs/runcpu.sh](runs/runcpu.sh) shows a very simple example of running on CPU or Apple Silicon. It dramatically shrinks the LLM tha tis being trained to make things fit into a reasonable time interval of a few ten minutes of training. You will not get strong results in this way.
+The script [runs/runcpu.sh](runs/runcpu.sh) shows a very simple example of running on CPU or Apple Silicon. It dramatically shrinks the LLM that is being trained to make things fit into a reasonable time interval of a few ten minutes of training. You will not get strong results in this way.

 ## Guides

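The CPU / MPS path follows the same single-script pattern; a minimal sketch, assuming the same uv virtual environment setup as the speedrun flow:

```bash
# Activate the local uv virtual environment (as in the speedrun flow)
source .venv/bin/activate

# Train the heavily shrunk model on CPU or Apple Silicon; expect tens of
# minutes of training and, per the caveat above, weak results
bash runs/runcpu.sh
```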