mirror of
https://github.com/karpathy/nanochat.git
synced 2026-03-23 13:23:23 +00:00
2.9 KiB
2.9 KiB
CSC490 Part 2 — Ablation Study Setup & Run Guide
Environment Setup
# Install uv if needed
command -v uv &> /dev/null || curl -LsSf https://astral.sh/uv/install.sh | sh
# Install all dependencies (including modal as dev dep)
uv sync --dev
# Activate the venv
source .venv/bin/activate
# Build the rustbpe tokenizer (requires Rust)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "$HOME/.cargo/env"
uv run maturin develop --release --manifest-path rustbpe/Cargo.toml
# If you have conda activated, unset it first:
# unset CONDA_PREFIX && uv run maturin develop --release --manifest-path rustbpe/Cargo.toml
Modal Setup (one-time)
# Authenticate with Modal (creates ~/.modal.toml)
uv run modal setup
# Create the secret with your API keys
uv run modal secret create nanochat-secrets \
WANDB_API_KEY=<your_wandb_key> \
HF_TOKEN=hf_<your_hf_token>
Running the Ablation Study
Full pipeline (first time — downloads data, trains tokenizer, runs all 3 ablations)
# --detach keeps the pipeline alive even if you close your terminal
uv run modal run --detach nanochat_modal.py::main
This runs all 5 stages server-side on Modal:
- Download 12 FineWeb-EDU shards (~20 min, CPU)
- Train BPE tokenizer (~5 min, A10G)
picochat-baseline— relu², RoPE 10K (~51 min, A10G)picochat-swiglu— SwiGLU, RoPE 10K (~55 min, A10G)picochat-mtp— relu², MTP 1-step, w=0.3 (~66 min, A10G)
Monitor progress: https://wandb.ai/yoyoliuuu/nanochat
Re-run individual ablations (data + tokenizer already on volume)
uv run modal run nanochat_modal.py::run_baseline
uv run modal run nanochat_modal.py::run_swiglu
uv run modal run nanochat_modal.py::run_mtp
uv run modal run nanochat_modal.py::run_rope500k # supplemental ablation
Ablation Configurations
| Run name | mlp_type | rope_base | num_mtp_steps | Role |
|---|---|---|---|---|
| picochat-baseline | relu2 | 10,000 | 0 | Baseline |
| picochat-swiglu | swiglu | 10,000 | 0 | Ablation A |
| picochat-mtp | relu2 | 10,000 | 1 | Ablation B |
| picochat-rope500k | relu2 | 500,000 | 0 | Supplemental |
All runs use: depth=8, n_embd=512, max_seq_len=512, device_batch_size=16, A10G GPU.
Cost Reference (A10G @ ~$1.10/hr)
| Stage | Duration | Cost/run | × 3 seeds |
|---|---|---|---|
| Data + tokenizer | ~25 min | ~$0.11 | one-time |
| picochat-baseline | ~51 min | ~$0.94 | ~$2.82 |
| picochat-swiglu | ~55 min | ~$1.00 | ~$3.00 |
| picochat-mtp | ~66 min | ~$1.21 | ~$3.63 |
| picochat-rope500k | ~51 min | ~$0.94 | ~$2.82 |
| Total (3 seeds each) | ~$12.38 |