Mirror of https://github.com/karpathy/nanochat.git, synced 2026-06-15 10:39:08 +00:00
- base_train.py: CUDA profiler + PyTorch profiler hooks gated by NANOCHAT_PROFILE_* env vars
- profile_step.py: standalone single-step profiler with NVTX ranges and phase selection
- LOCAL_STATE.md: documents local branch/file state before machine teardown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Local State — nanochat (karpathy fork)
Documented 2026-04-09 before machine teardown.
Branch: fa3-flex-sdpa (current)
- Tracking: fork/fa3-flex-sdpa (ademeure/nanochat), pushed and up to date
- 1 commit ahead of upstream master: 3d0dec5 FA3/FlexAttention/SDPA attention + PyTorch 2.11/CUDA 13.0
Branch: pytorch-2.11-cu130
- Tracking: fork/pytorch-2.11-cu130, pushed and up to date
- 2 commits ahead of master
Branch: pytorch-2.11-cu128-test
- Local-only, no upstream, but 0 commits ahead of master: it is just a branch pointer with no unique content.
Uncommitted changes (being committed now)
scripts/base_train.py
- Added env-var-controlled profiling hooks (NANOCHAT_PROFILE_START, NANOCHAT_PROFILE_STOP, NANOCHAT_PROFILE_EXIT, NANOCHAT_TORCH_PROFILE_DIR)
- CUDA profiler start/stop integration around training steps
- PyTorch profiler with TensorBoard trace output
- Early exit after profiling completes
- This is a work-in-progress profiling integration — functional but may need further tuning
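
The hooks above might be wired up roughly as in the sketch below. It is not the code in scripts/base_train.py. It assumes NANOCHAT_PROFILE_START and NANOCHAT_PROFILE_STOP hold step indices and NANOCHAT_PROFILE_EXIT is a 0/1 flag, which this note does not confirm. The ProfileWindow class and its callbacks are hypothetical names, and NANOCHAT_TORCH_PROFILE_DIR is not modeled.

```python
import os


class ProfileWindow:
    """Hypothetical helper: decides per step when to start/stop a profiler."""

    def __init__(self, env=None, start_fn=None, stop_fn=None):
        env = os.environ if env is None else env
        # ASSUMPTION: START/STOP are step indices; -1 means "disabled".
        self.start_step = int(env.get("NANOCHAT_PROFILE_START", "-1"))
        self.stop_step = int(env.get("NANOCHAT_PROFILE_STOP", "-1"))
        # ASSUMPTION: EXIT is "1" to leave the training loop after profiling.
        self.exit_after = env.get("NANOCHAT_PROFILE_EXIT", "0") == "1"
        # On a CUDA machine these could be torch.cuda.profiler.start / .stop.
        self.start_fn = start_fn or (lambda: None)
        self.stop_fn = stop_fn or (lambda: None)
        self.active = False

    def on_step(self, step):
        """Call at the top of each step; returns True if the loop should exit."""
        if self.start_step >= 0 and step == self.start_step and not self.active:
            self.start_fn()
            self.active = True
        if self.active and step == self.stop_step:
            self.stop_fn()
            self.active = False
            return self.exit_after
        return False


def run_demo():
    """Simulate ten steps with a profiling window covering steps 2 and 3."""
    events = []
    window = ProfileWindow(
        env={
            "NANOCHAT_PROFILE_START": "2",
            "NANOCHAT_PROFILE_STOP": "4",
            "NANOCHAT_PROFILE_EXIT": "1",
        },
        start_fn=lambda: events.append("start"),
        stop_fn=lambda: events.append("stop"),
    )
    for step in range(10):
        if window.on_step(step):
            break
        events.append(f"step {step}")
    return events


if __name__ == "__main__":
    print(run_demo())
```

Passing the start and stop functions in as callbacks keeps the step logic testable on a machine without a GPU.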
scripts/profile_step.py (new file)
- Standalone profiling script for a single training step (fwd/bwd/opt)
- Supports nsys and ncu profiling with NVTX ranges
- Usage: nsys profile -o out python -m scripts.profile_step --depth 6
- Supports --phase {all,fwd,bwd,opt} for targeted kernel analysis
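
As a rough illustration of phase selection with NVTX ranges, the sketch below parses the two flags shown in the usage line and wraps only the selected phases in a named range. It is not the contents of scripts/profile_step.py. The helper names (nvtx_range, run_step, parse_args), the default depth, and the choice to still run unselected phases without a range are all assumptions.

```python
import argparse
from contextlib import contextmanager

try:
    import torch
    _HAVE_NVTX = torch.cuda.is_available()
except ImportError:  # allows the sketch to run on machines without PyTorch
    _HAVE_NVTX = False

PHASES = ("fwd", "bwd", "opt")


@contextmanager
def nvtx_range(name):
    """Open an NVTX range when CUDA is available; otherwise do nothing."""
    if _HAVE_NVTX:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if _HAVE_NVTX:
            torch.cuda.nvtx.range_pop()


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--depth", type=int, default=6)  # hypothetical default
    parser.add_argument("--phase", choices=("all",) + PHASES, default="all")
    return parser.parse_args(argv)


def run_step(phase, phase_fns):
    """Run fwd, bwd, opt in order; return the phases that got an NVTX range.

    ASSUMPTION: unselected phases still execute (bwd needs a prior fwd),
    they are simply left without a range.
    """
    annotated = []
    for name in PHASES:
        fn = phase_fns[name]
        if phase in ("all", name):
            with nvtx_range(name):
                fn()
            annotated.append(name)
        else:
            fn()
    return annotated


if __name__ == "__main__":
    args = parse_args()
    stub_fns = {name: (lambda: None) for name in PHASES}  # stand-ins for real work
    print(run_step(args.phase, stub_fns))
```

With ranges named after the phases, a trace can be filtered to a single phase in the nsys timeline.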
profiles/ (NOT committed — binary nsys artifacts)
- nsys_d32_full.nsys-rep (1.6M): nsys trace, depth=32
- nsys_d32_full.sqlite (2.4M): exported sqlite
- nsys_d32_minimal.nsys-rep (1.5M): minimal nsys trace
- These are reproducible output artifacts, not committed to git
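
If the traces are regenerated, the exported sqlite file can be summarized with a short query. The sketch below assumes the export contains a CUPTI_ACTIVITY_KIND_KERNEL table whose shortName column references StringIds. That is the usual layout of Nsight Systems exports, but it can differ between nsys versions. The top_kernels function is a hypothetical helper, not part of the repository.

```python
import sqlite3


def top_kernels(conn, limit=10):
    """Return (kernel name, total ns, launch count) rows, slowest first.

    ASSUMPTION: the schema follows the usual nsys export layout
    (CUPTI_ACTIVITY_KIND_KERNEL joined to StringIds via shortName).
    """
    query = """
        SELECT s.value AS name,
               SUM(k."end" - k."start") AS total_ns,
               COUNT(*) AS launches
        FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
        JOIN StringIds AS s ON s.id = k.shortName
        GROUP BY s.value
        ORDER BY total_ns DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    # Path taken from the listing above; the file is not committed to git.
    with sqlite3.connect("profiles/nsys_d32_full.sqlite") as conn:
        for name, total_ns, launches in top_kernels(conn):
            print(f"{total_ns:>14} ns  {launches:>6}x  {name}")
```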