nanochat/docs/pre_gpu_runbook.md

Pre-GPU Runbook

This runbook is the minimum operational checklist before spending GPU time.

1. Local Prep

  1. Build the seed tool datasets:

     python -m scripts.build_tool_datasets

  2. Import the starting checkpoint from Hugging Face into native nanochat format:

     python -m scripts.import_hf_checkpoint \
       --repo-id ManmohanSharma/nanochat-d24 \
       --model-tag d24_hf_import

  3. Validate tool tokenization and mock tool execution with local tests:

     python -m pytest tests/test_engine.py tests/test_tools.py -v

  4. Dry-run tool evaluation on CPU:

     python -m scripts.chat_eval \
       -i sft \
       -a ToolJSON \
       --tool-jsonl seed_data/tool_eval_seed.jsonl \
       --device-type cpu \
       -x 3
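
Before the CPU dry-run, it can help to sanity-check that the seed JSONL is structurally valid, since a single malformed line will derail the eval. A minimal sketch, assuming each line of the seed file is one JSON object (the field names in the sample are hypothetical):

```python
import json

def validate_tool_jsonl(lines):
    """Check that every non-blank line parses as a JSON object; return (ok_count, errors)."""
    ok, errors = 0, []
    for i, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            obj = json.loads(line)
            if isinstance(obj, dict):
                ok += 1
            else:
                errors.append(f"line {i}: not a JSON object")
        except json.JSONDecodeError as e:
            errors.append(f"line {i}: {e}")
    return ok, errors

# In-memory sample instead of reading seed_data/tool_eval_seed.jsonl;
# the "tool"/"input" keys are illustrative, not the real schema.
sample = ['{"tool": "calculator", "input": "2+2"}', 'not json']
ok, errors = validate_tool_jsonl(sample)
print(ok, len(errors))  # -> 1 1
```

In practice you would iterate over `open("seed_data/tool_eval_seed.jsonl")` instead of the in-memory sample.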

2. 48-Hour GPU Schedule

  1. Pilot CPT

    • Run a short continuation test from the imported base checkpoint.
    • Confirm the loss is decreasing, checkpoint saving works, and HF sync works.
  2. Full CPT

    • Run the main continuation stage on the ClimbMix backbone.
    • Save staged checkpoints at planned intervals.
  3. SFT

    • Include the local tool SFT JSONL via --extra-train-jsonl.
    • Validate that calculator/web_search traces render correctly.
  4. RL / tool tuning

    • Keep this stage narrow and short.
    • Focus on tool-choice correctness and grounded answers.
  5. Eval

    • Run ARC, MMLU, GSM8K, HumanEval, and ToolJSON checks.
    • Do not ship if tool behavior regresses or citations are missing.
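
The eval gate in step 5 can be reduced to a simple regression check against the previous checkpoint's scores. A sketch under stated assumptions: the benchmark names come from the list above, but the scores and the tolerance are hypothetical placeholders, not project thresholds:

```python
def eval_gate(current, baseline, tolerance=0.01):
    """Return the metrics that regressed by more than `tolerance` versus baseline."""
    return [name for name, score in current.items()
            if score < baseline.get(name, 0.0) - tolerance]

# Hypothetical scores; real values come from the eval runs above.
baseline = {"ARC": 0.52, "MMLU": 0.41, "GSM8K": 0.18, "HumanEval": 0.10, "ToolJSON": 0.90}
current  = {"ARC": 0.53, "MMLU": 0.41, "GSM8K": 0.20, "HumanEval": 0.11, "ToolJSON": 0.85}

regressed = eval_gate(current, baseline)
print(regressed)  # -> ['ToolJSON']  (dropped 0.05, so: do not ship)
```

Any non-empty result maps to the "do not ship" rule above.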

3. Checkpoint Upload Cadence

Upload at every stage boundary and at any explicit resume point:

python -m scripts.hf_sync_checkpoint \
  --repo-id ManmohanSharma/nanochat-d24 \
  --source base \
  --model-tag d24_hf_import \
  --step 0

If a whole checkpoint directory should be mirrored:

python -m scripts.hf_sync_checkpoint \
  --repo-id ManmohanSharma/nanochat-d24 \
  --source base \
  --model-tag d24_hf_import
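
The two sync invocations above differ only in whether --step is passed. A small helper can build the argv for each stage boundary so the cadence stays consistent across the schedule; the flags mirror the commands above, while the helper itself is just an illustrative sketch:

```python
def sync_command(repo_id, source, model_tag, step=None):
    """Build the hf_sync_checkpoint argv; omit `step` to mirror the whole directory."""
    cmd = ["python", "-m", "scripts.hf_sync_checkpoint",
           "--repo-id", repo_id,
           "--source", source,
           "--model-tag", model_tag]
    if step is not None:
        cmd += ["--step", str(step)]
    return cmd

# Single-step upload, matching the first command above
cmd = sync_command("ManmohanSharma/nanochat-d24", "base", "d24_hf_import", step=0)
print(" ".join(cmd))
```

Feeding the result to `subprocess.run(cmd, check=True)` at each stage boundary keeps upload failures from passing silently.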

4. Go / No-Go

Go only if:

  • HF import works.
  • HF sync works.
  • Mock tool execution works.
  • Tool seed datasets are generated.
  • Tool eval runs locally.
  • The search backend plan is explicit: search provider plus Cloudflare fetch/crawl.

No-Go if:

  • Any tokenizer mismatch appears during HF import.
  • Tool blocks fail to render.
  • web_search still has no backend plan beyond fetch-only Cloudflare Browser Rendering.
  • Local tool eval is missing or failing.
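
The checklist above can be collapsed into a single gate so the decision is mechanical rather than judgment-in-the-moment. A minimal sketch; the criterion keys are hypothetical names for the bullets above, not fields any script currently emits:

```python
# Hypothetical keys, one per Go criterion above.
GO_CRITERIA = [
    "hf_import_ok",
    "hf_sync_ok",
    "mock_tools_ok",
    "seed_datasets_built",
    "local_tool_eval_ok",
    "search_backend_plan_explicit",
]

def go_no_go(status):
    """Go only when every criterion is True; otherwise report what is missing."""
    missing = [c for c in GO_CRITERIA if not status.get(c, False)]
    return ("GO", []) if not missing else ("NO-GO", missing)

decision, missing = go_no_go({c: True for c in GO_CRITERIA})
print(decision)  # -> GO
```

A missing key counts as a failure, which matches the spirit of the list: anything unverified is a No-Go.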