nanochat/LOCAL_STATE.md
ademeure 322eb6b86b Add profiling infrastructure (env-var controlled, nsys/ncu/torch profiler)
- base_train.py: CUDA profiler + PyTorch profiler hooks gated by NANOCHAT_PROFILE_* env vars
- profile_step.py: standalone single-step profiler with NVTX ranges and phase selection
- LOCAL_STATE.md: documents local branch/file state before machine teardown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 11:29:04 +00:00


Local State — nanochat (karpathy fork)

Documented 2026-04-09 before machine teardown.

Branch: fa3-flex-sdpa (current)

  • Tracking: fork/fa3-flex-sdpa (ademeure/nanochat) — pushed and up to date
  • 1 commit ahead of upstream master: 3d0dec5 FA3/FlexAttention/SDPA attention + PyTorch 2.11/CUDA 13.0

Branch: pytorch-2.11-cu130

  • Tracking: fork/pytorch-2.11-cu130 — pushed and up to date
  • 2 commits ahead of master

Branch: pytorch-2.11-cu128-test

  • Local-only with no upstream, but 0 commits ahead of master: just a branch pointer with no unique content.

Uncommitted changes (being committed now)

scripts/base_train.py

  • Added env-var-controlled profiling hooks (NANOCHAT_PROFILE_START, NANOCHAT_PROFILE_STOP, NANOCHAT_PROFILE_EXIT, NANOCHAT_TORCH_PROFILE_DIR)
  • CUDA profiler start/stop integration around training steps
  • PyTorch profiler with TensorBoard trace output
  • Early exit after profiling completes
  • This is a work-in-progress profiling integration — functional but may need further tuning
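The env-var gating described above can be sketched as follows. This is a minimal illustration, not the code in base_train.py. It assumes that NANOCHAT_PROFILE_START and NANOCHAT_PROFILE_STOP hold step indices, and that NANOCHAT_PROFILE_EXIT=1 requests an exit once the window closes. The ProfileWindow class and its methods are hypothetical. The profiler calls are passed in as callbacks so the logic can run without a GPU. In a real loop they would be torch.cuda.profiler.start and torch.cuda.profiler.stop, which pair with `nsys profile --capture-range=cudaProfilerApi` so that only the selected steps are captured.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


def _no_op() -> None:
    """Placeholder used when no real profiler callback is supplied."""


@dataclass
class ProfileWindow:
    # Hypothetical semantics (assumption): start profiling at step `start`,
    # stop at step `stop`, and optionally request an exit right after stopping.
    start: Optional[int]
    stop: Optional[int]
    exit_after: bool

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ProfileWindow":
        def as_int(name: str) -> Optional[int]:
            value = env.get(name)
            return int(value) if value not in (None, "") else None

        return cls(
            start=as_int("NANOCHAT_PROFILE_START"),
            stop=as_int("NANOCHAT_PROFILE_STOP"),
            exit_after=env.get("NANOCHAT_PROFILE_EXIT", "0") == "1",
        )

    def on_step(
        self,
        step: int,
        start_fn: Callable[[], None] = _no_op,
        stop_fn: Callable[[], None] = _no_op,
    ) -> bool:
        """Call at the top of each training step; returns True if training should exit."""
        if self.start is not None and step == self.start:
            start_fn()
        if self.stop is not None and step == self.stop:
            stop_fn()
            return self.exit_after
        return False
```

In a training loop this would look like `window = ProfileWindow.from_env()`, followed by `if window.on_step(step, torch.cuda.profiler.start, torch.cuda.profiler.stop): break` inside the step loop. Injecting the callbacks keeps the window logic testable on a CPU-only machine.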

scripts/profile_step.py (new file)

  • Standalone profiling script for a single training step (fwd/bwd/opt)
  • Supports nsys and ncu profiling with NVTX ranges
  • Usage: nsys profile -o out python -m scripts.profile_step --depth 6
  • Supports --phase {all,fwd,bwd,opt} for targeted kernel analysis
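The phase selection with NVTX ranges can be sketched like this. It is an illustration under assumptions, not the contents of profile_step.py. The sketch assumes every phase always gets an NVTX range, so the nsys timeline stays labeled, and that --phase only controls which phases are bracketed by profiler start/stop. ProfilerHooks, phase and run_step are hypothetical names, and the default depth is an assumption. On a GPU the hooks would be torch.cuda.nvtx.range_push, torch.cuda.nvtx.range_pop, torch.cuda.profiler.start and torch.cuda.profiler.stop.

```python
import argparse
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Iterator, Optional, Sequence

PHASES = ("all", "fwd", "bwd", "opt")


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # Mirrors the documented --depth / --phase flags; the default depth is an assumption.
    parser = argparse.ArgumentParser(description="Profile one training step")
    parser.add_argument("--depth", type=int, default=6)
    parser.add_argument("--phase", choices=PHASES, default="all")
    return parser.parse_args(argv)


@dataclass
class ProfilerHooks:
    # Hypothetical hook bundle. On a GPU these would be torch.cuda.nvtx.range_push,
    # torch.cuda.nvtx.range_pop, torch.cuda.profiler.start and torch.cuda.profiler.stop;
    # the no-op defaults let the control flow run on a CPU-only machine.
    range_push: Callable[[str], object] = lambda name: None
    range_pop: Callable[[], object] = lambda: None
    capture_start: Callable[[], object] = lambda: None
    capture_stop: Callable[[], object] = lambda: None


@contextmanager
def phase(name: str, selected: str, hooks: ProfilerHooks) -> Iterator[None]:
    """Always emit an NVTX range for the phase; enable capture only if it is selected."""
    capture = selected in ("all", name)
    hooks.range_push(name)
    if capture:
        hooks.capture_start()
    try:
        yield
    finally:
        if capture:
            hooks.capture_stop()
        hooks.range_pop()


def run_step(fwd, bwd, opt, selected: str, hooks: ProfilerHooks) -> None:
    # fwd/bwd/opt stand in for model(x, y), loss.backward() and optimizer.step().
    with phase("fwd", selected, hooks):
        loss = fwd()
    with phase("bwd", selected, hooks):
        bwd(loss)
    with phase("opt", selected, hooks):
        opt()
```

Keeping NVTX ranges on every phase means an nsys timeline shows where fwd, bwd and opt begin and end regardless of --phase. Restricting start/stop to the selected phase narrows what Nsight Compute profiles when it is launched with `--profile-from-start off`.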

profiles/ (NOT committed — binary nsys artifacts)

  • nsys_d32_full.nsys-rep (1.6M) — nsys trace, depth=32
  • nsys_d32_full.sqlite (2.4M) — exported sqlite
  • nsys_d32_minimal.nsys-rep (1.5M) — minimal nsys trace
  • These are reproducible output artifacts, not committed to git
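The exported .sqlite file can be queried directly to rank kernels by total GPU time. This sketch assumes the schema that recent nsys versions write with `nsys export --type sqlite`: a CUPTI_ACTIVITY_KIND_KERNEL table with `start`, `end` (nanoseconds) and `shortName` columns, where `shortName` is an id into a StringIds table. Table and column names may differ between nsys versions. The top_kernels helper is hypothetical.

```python
import sqlite3
import sys
from contextlib import closing
from typing import List, Tuple

# Assumed nsys export schema: kernel rows reference interned names in StringIds.
QUERY = """
    SELECT s.value AS name,
           COUNT(*) AS launches,
           SUM(k."end" - k."start") / 1e6 AS total_ms
    FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
    JOIN StringIds AS s ON s.id = k.shortName
    GROUP BY s.value
    ORDER BY total_ms DESC
    LIMIT ?
"""


def top_kernels(conn: sqlite3.Connection, limit: int = 10) -> List[Tuple[str, int, float]]:
    """Return (kernel name, launch count, total milliseconds), most expensive first."""
    return conn.execute(QUERY, (limit,)).fetchall()


if __name__ == "__main__":
    # Example: python top_kernels.py profiles/nsys_d32_full.sqlite
    with closing(sqlite3.connect(sys.argv[1])) as conn:
        for name, launches, total_ms in top_kernels(conn):
            print(f"{total_ms:10.3f} ms  {launches:6d}x  {name}")
```

Because the sqlite export can be regenerated from the .nsys-rep file, a query like this is a quick way to compare kernel time across depths without opening the Nsight Systems GUI.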