nanochat/scripts
Abylay Ospan 50b236fbcc Add PyTorch and CUDA memory profiling systems
Capture PyTorch execution traces and CUDA memory snapshots.  Traces
display detailed CPU and CUDA activity, including individual CUDA kernel
calls.

CUDA memory snapshots visualize all memory allocations, helping diagnose
CUDA out-of-memory errors, investigate memory leaks, or understand GPU
memory usage for educational purposes.

Enable profiling with the --enable_profiling=True flag in speedrun.sh.

See PROFILING.md for documentation and example visualizations.
2025-10-18 12:50:19 +00:00
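The commit above describes two artifacts: an execution trace and a CUDA memory snapshot. The sketch below shows how PyTorch can produce both, using its public profiler and the private memory-history API. It is a minimal illustration under stated assumptions, not the code in base_train.py. The helper name `profile_steps`, the dummy `.sum()` loss, and the output file names are all hypothetical.

```python
import torch
from torch.profiler import ProfilerActivity, profile


def profile_steps(model, batch, num_steps=3,
                  trace_path="trace.json",
                  snapshot_path="memory_snapshot.pickle"):
    """Hypothetical helper (not part of nanochat): profile a few training steps.

    Writes a Chrome-trace JSON file of CPU (and, if available, CUDA) activity
    and, on CUDA machines, a memory snapshot pickle.
    """
    use_cuda = torch.cuda.is_available()
    activities = [ProfilerActivity.CPU]
    if use_cuda:
        activities.append(ProfilerActivity.CUDA)
        # Begin recording allocation/free events with stack traces.
        # This is a private PyTorch API (torch >= 2.1) and may change.
        torch.cuda.memory._record_memory_history(max_entries=100_000)

    with profile(activities=activities) as prof:
        for _ in range(num_steps):
            loss = model(batch).sum()  # dummy scalar loss so backward() works
            loss.backward()
            model.zero_grad(set_to_none=True)

    # Open in chrome://tracing or https://ui.perfetto.dev to see individual kernels.
    prof.export_chrome_trace(trace_path)

    if use_cuda:
        # Drag this file onto https://pytorch.org/memory_viz to inspect allocations.
        torch.cuda.memory._dump_snapshot(snapshot_path)
        torch.cuda.memory._record_memory_history(enabled=None)  # stop recording
    return prof
```

For example, `profile_steps(torch.nn.Linear(16, 4), torch.randn(8, 16))` writes `trace.json` to the working directory. Moving the model and batch to `"cuda"` is what makes CUDA kernel calls appear in the trace. Keeping `num_steps` small matters because both the trace and the memory history grow with every recorded step.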
base_eval.py initial commit 2025-10-13 06:49:24 -07:00
base_loss.py initial commit 2025-10-13 06:49:24 -07:00
base_train.py Add PyTorch and CUDA memory profiling systems 2025-10-18 12:50:19 +00:00
chat_cli.py initial commit 2025-10-13 06:49:24 -07:00
chat_eval.py initial commit 2025-10-13 06:49:24 -07:00
chat_rl.py initial commit 2025-10-13 06:49:24 -07:00
chat_sft.py dont evaluate the sampling evals during SFT they are too slow. keep the multiple choice evals. delete unused imports 2025-10-15 16:42:23 +00:00
chat_web.py also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate 2025-10-16 01:28:37 +00:00
mid_train.py fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation 2025-10-15 16:35:04 +00:00
tok_eval.py initial commit 2025-10-13 06:49:24 -07:00
tok_train.py initial commit 2025-10-13 06:49:24 -07:00