nanochat/LOCAL_STATE.md
ademeure 322eb6b86b Add profiling infrastructure (env-var controlled, nsys/ncu/torch profiler)
- base_train.py: CUDA profiler + PyTorch profiler hooks gated by NANOCHAT_PROFILE_* env vars
- profile_step.py: standalone single-step profiler with NVTX ranges and phase selection
- LOCAL_STATE.md: documents local branch/file state before machine teardown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 11:29:04 +00:00


# Local State — nanochat (karpathy fork)
Documented 2026-04-09 before machine teardown.
## Branch: fa3-flex-sdpa (current)
- Tracking: `fork/fa3-flex-sdpa` (ademeure/nanochat) — pushed and up to date
- 1 commit ahead of upstream master: `3d0dec5 FA3/FlexAttention/SDPA attention + PyTorch 2.11/CUDA 13.0`
## Branch: pytorch-2.11-cu130
- Tracking: `fork/pytorch-2.11-cu130` — pushed and up to date
- 2 commits ahead of master
## Branch: pytorch-2.11-cu128-test
- **Local-only, no upstream.** It is 0 commits ahead of master, so it is only a branch pointer with no unique content.
## Uncommitted changes (being committed now)
### scripts/base_train.py
- Added env-var-controlled profiling hooks (`NANOCHAT_PROFILE_START`, `NANOCHAT_PROFILE_STOP`, `NANOCHAT_PROFILE_EXIT`, `NANOCHAT_TORCH_PROFILE_DIR`)
- CUDA profiler start/stop integration around training steps
- PyTorch profiler with TensorBoard trace output
- Early exit after profiling completes
- Work in progress: the profiling integration is functional but may need further tuning
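
The env-var gating described above can be sketched roughly as follows. This is a minimal illustration, not the code in `scripts/base_train.py`: it assumes that `NANOCHAT_PROFILE_START`, `NANOCHAT_PROFILE_STOP`, and `NANOCHAT_PROFILE_EXIT` each hold a training-step index and that an unset variable disables the corresponding action. The helper names `read_profile_window` and `profile_actions` are hypothetical.

```python
import os


def read_profile_window(env=os.environ):
    """Parse the profiling env vars into optional integer step indices.

    Assumption: each variable holds a step number; unset or empty means disabled.
    """
    def _int(name):
        value = env.get(name)
        return int(value) if value not in (None, "") else None

    return {
        "start": _int("NANOCHAT_PROFILE_START"),
        "stop": _int("NANOCHAT_PROFILE_STOP"),
        "exit": _int("NANOCHAT_PROFILE_EXIT"),
    }


def profile_actions(step, window):
    """Return the profiling actions to take at the top of training step `step`."""
    actions = []
    if window["start"] is not None and step == window["start"]:
        actions.append("start")  # a real loop would call torch.cuda.profiler.start()
    if window["stop"] is not None and step == window["stop"]:
        actions.append("stop")   # a real loop would call torch.cuda.profiler.stop()
    if window["exit"] is not None and step >= window["exit"]:
        actions.append("exit")   # a real loop would break out of training here
    return actions


if __name__ == "__main__":
    # Simulated environment: profile steps 10 and 11, then exit at step 12.
    window = read_profile_window({
        "NANOCHAT_PROFILE_START": "10",
        "NANOCHAT_PROFILE_STOP": "12",
        "NANOCHAT_PROFILE_EXIT": "12",
    })
    for step in range(20):
        actions = profile_actions(step, window)
        if actions:
            print(step, actions)
        if "exit" in actions:
            break
```

If the hooks do call `torch.cuda.profiler.start()` and `stop()`, running under `nsys profile --capture-range=cudaProfilerApi` should limit the capture to the selected steps, which keeps traces small.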
### scripts/profile_step.py (new file)
- Standalone profiling script for a single training step: forward, backward, and optimizer (fwd/bwd/opt)
- Supports nsys and ncu profiling with NVTX ranges
- Usage: `nsys profile -o out python -m scripts.profile_step --depth 6`
- Supports `--phase {all,fwd,bwd,opt}` for targeted kernel analysis
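
A rough sketch of how phase selection and NVTX ranges might fit together is shown below. It is an assumption-laden illustration rather than the contents of `scripts/profile_step.py`: it assumes all three phases always run and only the selected ones are wrapped in NVTX ranges, the default of 6 for `--depth` is a placeholder, and `selected_phases`, `nvtx_range`, and `run_step` are hypothetical names. NVTX calls are skipped when PyTorch or CUDA is unavailable so the sketch runs anywhere.

```python
import argparse
from contextlib import contextmanager

try:
    import torch
    _HAVE_NVTX = torch.cuda.is_available()
except ImportError:
    # No PyTorch installed: NVTX ranges become no-ops.
    _HAVE_NVTX = False

PHASES = ("fwd", "bwd", "opt")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Profile one training step")
    parser.add_argument("--depth", type=int, default=6)  # default is a placeholder
    parser.add_argument("--phase", choices=("all",) + PHASES, default="all")
    return parser.parse_args(argv)


def selected_phases(phase):
    """Expand 'all' into every phase; otherwise return the single chosen phase."""
    return PHASES if phase == "all" else (phase,)


@contextmanager
def nvtx_range(name):
    """Push an NVTX range around a block so it shows up as a named span in nsys."""
    if _HAVE_NVTX:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if _HAVE_NVTX:
            torch.cuda.nvtx.range_pop()


def run_step(phase, step_fns):
    """Run fwd, bwd, opt in order; annotate only the selected phases."""
    annotated = []
    wanted = selected_phases(phase)
    for name in PHASES:
        if name in wanted:
            with nvtx_range(name):
                step_fns[name]()
            annotated.append(name)
        else:
            step_fns[name]()
    return annotated


if __name__ == "__main__":
    args = parse_args()
    # Placeholder callables stand in for the real forward/backward/optimizer work.
    noop_fns = {name: (lambda: None) for name in PHASES}
    print("annotated phases:", run_step(args.phase, noop_fns))
```

Running the earlier phases even when only `bwd` or `opt` is selected reflects that a backward pass needs a forward pass to have produced a graph; the actual script may handle this differently.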
### profiles/ (NOT committed — binary nsys artifacts)
- `nsys_d32_full.nsys-rep` (1.6M) — nsys trace, depth=32
- `nsys_d32_full.sqlite` (2.4M) — exported sqlite
- `nsys_d32_minimal.nsys-rep` (1.5M) — minimal nsys trace
- These are reproducible output artifacts, not committed to git