diff --git a/README.md b/README.md
index 483f3e38..610140cd 100644
--- a/README.md
+++ b/README.md
@@ -51,7 +51,7 @@ The most fun you can have is to train your own GPT-2 and talk to it. The entire
 bash runs/speedrun.sh
 ```
 
-You may wish to do so in a screen session as this will take ~3 hours to run. Once it's done, you can talk to it via the ChatGPT-like web UI. Make sure again that your local uv virtual environment is active (run `source .venv/bin/activate`), and serve it:
+You may wish to do so in a screen session as this will take ~1.5 hours to run. Once it's done, you can talk to it via the ChatGPT-like web UI. Make sure again that your local uv virtual environment is active (run `source .venv/bin/activate`), and serve it:
 
 ```bash
 python -m scripts.chat_web
@@ -190,7 +190,7 @@ I've published a number of guides that might contain helpful information, most r
 
 ## Contributing
 
-The goal of nanochat is to improve the state of the art in micro models that are accessible to work with end to end on budgets of < $1000 dollars. Accessibility is about overall cost but also about cognitive complexity - nanochat is not an exhaustively configurable LLM "framework"; there are no giant configuration objects, model factories, or if-then-else monsters in the code base. It is a single, cohesive, minimal, readable, hackable, maximally-forkable "strong baseline" codebase designed to run start to end and produce a ChatGPT model you can talk to. Currently, the most interesting part personally is speeding up the latency to GPT-2 (i.e. getting a CORE score above 0.256525). Currently this takes ~3 hours, but by improving the pretraining stage we can improve this further.
+The goal of nanochat is to improve the state of the art in micro models that are accessible to work with end to end on budgets of < $1000 dollars. Accessibility is about overall cost but also about cognitive complexity - nanochat is not an exhaustively configurable LLM "framework"; there are no giant configuration objects, model factories, or if-then-else monsters in the code base. It is a single, cohesive, minimal, readable, hackable, maximally-forkable "strong baseline" codebase designed to run start to end and produce a ChatGPT model you can talk to. Currently, the most interesting part personally is speeding up the latency to GPT-2 (i.e. getting a CORE score above 0.256525). Currently this takes ~1.5 hours (down from 3h), but by improving the pretraining stage we can improve this further.
 
 Current AI policy: disclosure. When submitting a PR, please declare any parts that had substantial LLM contribution and that you have not written or that you do not fully understand.
 
diff --git a/runs/speedrun.sh b/runs/speedrun.sh
index 48fcc68a..8c8cad29 100644
--- a/runs/speedrun.sh
+++ b/runs/speedrun.sh
@@ -1,11 +1,11 @@
 #!/bin/bash
 
 # This script is configured to train your own GPT-2 grade LLM (pretraining + finetuning)
-# It is designed to run on a blank 8XH100 GPU node and takes approximately 3 hours to complete.
+# It is designed to run on a blank 8XH100 GPU node and takes approximately 1.5 hours to complete.
 
 # 1) Example launch (simplest):
 # bash runs/speedrun.sh
-# 2) Example launch in a screen session (because the run takes ~3 hours):
+# 2) Example launch in a screen session (because the run takes ~1.5 hours):
 # screen -L -Logfile runs/speedrun.log -S speedrun bash runs/speedrun.sh
 # 3) Example launch with wandb logging, but see below for setting up wandb first:
 # WANDB_RUN=speedrun screen -L -Logfile runs/speedrun.log -S speedrun bash runs/speedrun.sh