mirror of
https://github.com/karpathy/nanochat.git
synced 2026-06-15 10:39:08 +00:00
update numbers from 3h to 1.5h
This commit is contained in:
parent
a445144d39
commit
da137191be
|
|
@ -51,7 +51,7 @@ The most fun you can have is to train your own GPT-2 and talk to it. The entire
|
|||
bash runs/speedrun.sh
|
||||
```
|
||||
|
||||
You may wish to do so in a screen session as this will take ~3 hours to run. Once it's done, you can talk to it via the ChatGPT-like web UI. Make sure again that your local uv virtual environment is active (run `source .venv/bin/activate`), and serve it:
|
||||
You may wish to do so in a screen session as this will take ~1.5 hours to run. Once it's done, you can talk to it via the ChatGPT-like web UI. Make sure again that your local uv virtual environment is active (run `source .venv/bin/activate`), and serve it:
|
||||
|
||||
```bash
|
||||
python -m scripts.chat_web
|
||||
|
|
@ -190,7 +190,7 @@ I've published a number of guides that might contain helpful information, most r
|
|||
|
||||
## Contributing
|
||||
|
||||
The goal of nanochat is to improve the state of the art in micro models that are accessible to work with end to end on budgets of < $1000 dollars. Accessibility is about overall cost but also about cognitive complexity - nanochat is not an exhaustively configurable LLM "framework"; there are no giant configuration objects, model factories, or if-then-else monsters in the code base. It is a single, cohesive, minimal, readable, hackable, maximally-forkable "strong baseline" codebase designed to run start to end and produce a ChatGPT model you can talk to. Currently, the most interesting part personally is speeding up the latency to GPT-2 (i.e. getting a CORE score above 0.256525). Currently this takes ~3 hours, but by improving the pretraining stage we can improve this further.
|
||||
The goal of nanochat is to improve the state of the art in micro models that are accessible to work with end to end on budgets of < $1000 dollars. Accessibility is about overall cost but also about cognitive complexity - nanochat is not an exhaustively configurable LLM "framework"; there are no giant configuration objects, model factories, or if-then-else monsters in the code base. It is a single, cohesive, minimal, readable, hackable, maximally-forkable "strong baseline" codebase designed to run start to end and produce a ChatGPT model you can talk to. Currently, the most interesting part personally is speeding up the latency to GPT-2 (i.e. getting a CORE score above 0.256525). Currently this takes ~1.5 hours (down from 3h), but by improving the pretraining stage we can improve this further.
|
||||
|
||||
Current AI policy: disclosure. When submitting a PR, please declare any parts that had substantial LLM contribution and that you have not written or that you do not fully understand.
|
||||
|
||||
|
|
|
|||
|
|
@ -1,11 +1,11 @@
|
|||
#!/bin/bash
|
||||
|
||||
# This script is configured to train your own GPT-2 grade LLM (pretraining + finetuning)
|
||||
# It is designed to run on a blank 8XH100 GPU node and takes approximately 3 hours to complete.
|
||||
# It is designed to run on a blank 8XH100 GPU node and takes approximately 1.5 hours to complete.
|
||||
|
||||
# 1) Example launch (simplest):
|
||||
# bash runs/speedrun.sh
|
||||
# 2) Example launch in a screen session (because the run takes ~3 hours):
|
||||
# 2) Example launch in a screen session (because the run takes ~1.5 hours):
|
||||
# screen -L -Logfile runs/speedrun.log -S speedrun bash runs/speedrun.sh
|
||||
# 3) Example launch with wandb logging, but see below for setting up wandb first:
|
||||
# WANDB_RUN=speedrun screen -L -Logfile runs/speedrun.log -S speedrun bash runs/speedrun.sh
|
||||
|
|
|
|||
Loading…
Reference in New Issue
Block a user