nanochat

Mirror of https://github.com/karpathy/nanochat.git, synced 2026-04-05 23:25:35 +00:00


Claude 558e949ddd Add SAE-based interpretability extension for nanochat

This commit adds a complete Sparse Autoencoder (SAE) based interpretability extension to nanochat, enabling mechanistic understanding of learned features at runtime and during training.

## Key Features

- Multiple SAE architectures: TopK, ReLU, and Gated SAEs
- Activation collection: Non-intrusive PyTorch hooks for collecting activations
- Training pipeline: Complete SAE training with dead latent resampling
- Runtime interpretation: Real-time feature tracking during inference
- Feature steering: Modify model behavior by intervening on features
- Neuronpedia integration: Prepare SAEs for upload to Neuronpedia
- Visualization tools: Interactive dashboards for exploring features

## Module Structure

```
sae/
├── __init__.py       # Package exports
├── config.py         # SAE configuration dataclass
├── models.py         # TopK, ReLU, Gated SAE implementations
├── hooks.py          # Activation collection via PyTorch hooks
├── trainer.py        # SAE training loop and evaluation
├── runtime.py        # Real-time interpretation wrapper
├── evaluator.py      # SAE quality metrics
├── feature_viz.py    # Feature visualization tools
└── neuronpedia.py    # Neuronpedia API integration

scripts/
├── sae_train.py      # Train SAEs on nanochat activations
├── sae_eval.py       # Evaluate trained SAEs
└── sae_viz.py        # Visualize SAE features

tests/
└── test_sae.py       # Comprehensive tests for SAE implementation
```

## Usage

```bash
# Train SAE on layer 10
python -m scripts.sae_train --checkpoint models/d20/base_final.pt --layer 10

# Evaluate SAE
python -m scripts.sae_eval --sae_path sae_models/layer_10/best_model.pt

# Visualize features
python -m scripts.sae_viz --sae_path sae_models/layer_10/best_model.pt --all_features
```

## Design Principles

- Modular: SAE functionality is fully optional and doesn't modify core nanochat
- Minimal: ~1,500 lines of clean, hackable code
- Performant: <10% inference overhead with SAEs enabled
- Educational: Designed to be easy to understand and extend

See SAE_README.md for complete documentation and examples.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-10-25 01:22:51 +00:00
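The commit lists TopK, ReLU, and Gated SAEs and a feature-steering mode, but this page does not show the code in `sae/models.py` or `sae/runtime.py`. As a hedged illustration only, here is a minimal pure-Python sketch of a typical TopK SAE forward pass (encode, keep the k largest latents, decode) and of steering modeled as adding a scaled decoder direction to an activation. The function names (`topk_encode`, `decode`, `steer`), the subtraction of `b_dec` before encoding, and this particular steering rule are assumptions, not nanochat's implementation.

```python
# Hypothetical sketch of a TopK SAE forward pass and decoder-direction steering.
# Names and conventions are assumptions, not taken from sae/models.py.
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[row][col]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Return m @ v for a row-major matrix m."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def topk_encode(x: Vector, w_enc: Matrix, b_enc: Vector, b_dec: Vector, k: int) -> Vector:
    """Encode x (d_model) into n_latents values with at most k nonzeros.

    w_enc has shape (n_latents, d_model).
    """
    # Assumed convention: center the input on the decoder bias first.
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    # Keep the k largest pre-activations; clamp them at zero, drop the rest.
    keep = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    z = [0.0] * len(pre)
    for i in keep:
        z[i] = max(pre[i], 0.0)
    return z


def decode(z: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """Reconstruct x_hat = W_dec @ z + b_dec; w_dec has shape (d_model, n_latents)."""
    return [r + b for r, b in zip(matvec(w_dec, z), b_dec)]


def steer(x: Vector, w_dec: Matrix, feature: int, alpha: float) -> Vector:
    """Add alpha times the decoder column of `feature` to the activation x."""
    return [xi + alpha * row[feature] for xi, row in zip(x, w_dec)]


if __name__ == "__main__":
    w_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 latents, d_model = 2
    w_dec = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
    z = topk_encode([2.0, 1.0], w_enc, [0.0] * 3, [0.0, 0.0], k=1)
    print(z)                             # [0.0, 0.0, 3.0]
    print(decode(z, w_dec, [0.0, 0.0]))  # [1.5, 1.5]
```

With `k=1` only the strongest latent (index 2) survives, so the reconstruction is built from a single decoder column. TopK enforces sparsity structurally, whereas a ReLU SAE would typically rely on an L1 penalty during training instead.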
base_eval.py	many small tweaks. base, eval, core work now i think	2025-10-16 15:46:18 -07:00
base_loss.py	many small tweaks. base, eval, core work now i think	2025-10-16 15:46:18 -07:00
base_train.py	move the lr schedule args up in base_train so they are tunable in configurator	2025-10-24 13:27:31 +00:00
chat_cli.py	upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes ,e.g. changing max_iterations to num_iterations in sft script for consistency in naming	2025-10-20 10:15:17 -07:00
chat_eval.py	add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think	2025-10-24 14:02:48 +00:00
chat_rl.py	initial commit	2025-10-13 06:49:24 -07:00
chat_sft.py	add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think	2025-10-24 14:02:48 +00:00
chat_web.py	upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes ,e.g. changing max_iterations to num_iterations in sft script for consistency in naming	2025-10-20 10:15:17 -07:00
mid_train.py	add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think	2025-10-24 14:02:48 +00:00
sae_eval.py	Add SAE-based interpretability extension for nanochat	2025-10-25 01:22:51 +00:00
sae_train.py	Add SAE-based interpretability extension for nanochat	2025-10-25 01:22:51 +00:00
sae_viz.py	Add SAE-based interpretability extension for nanochat	2025-10-25 01:22:51 +00:00
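The commit describes `sae/evaluator.py` as holding "SAE quality metrics" and the trainer as doing "dead latent resampling", but does not say which metrics `sae_eval.py` reports. The sketch below is a hedged, pure-Python illustration of quantities commonly reported for SAEs: reconstruction MSE, fraction of variance unexplained, mean L0 (active latents per example), and detection of dead latents (latents that never fire on a batch). Which of these nanochat actually computes is an assumption; the resampling step itself is not shown.

```python
# Hypothetical sketch of common SAE quality metrics; the metric set is an
# assumption, not taken from sae/evaluator.py.
from typing import List

Batch = List[List[float]]  # one row per example


def mse(xs: Batch, x_hats: Batch) -> float:
    """Mean squared reconstruction error over all elements."""
    errs = [(a - b) ** 2 for x, xh in zip(xs, x_hats) for a, b in zip(x, xh)]
    return sum(errs) / len(errs)


def fraction_variance_unexplained(xs: Batch, x_hats: Batch) -> float:
    """Residual sum of squares divided by total variance around the per-dim mean."""
    n, dim = len(xs), len(xs[0])
    means = [sum(x[j] for x in xs) / n for j in range(dim)]
    residual = sum((a - b) ** 2 for x, xh in zip(xs, x_hats) for a, b in zip(x, xh))
    total = sum((x[j] - means[j]) ** 2 for x in xs for j in range(dim))
    return residual / total


def mean_l0(zs: Batch) -> float:
    """Average number of nonzero latents per example."""
    return sum(sum(1 for v in z if v != 0.0) for z in zs) / len(zs)


def dead_latents(zs: Batch) -> List[int]:
    """Indices of latents that are zero for every example in the batch."""
    return [i for i in range(len(zs[0])) if all(z[i] == 0.0 for z in zs)]


if __name__ == "__main__":
    xs = [[1.0, 2.0], [3.0, 4.0]]
    x_hats = [[1.0, 2.0], [3.0, 3.0]]
    zs = [[0.0, 1.5, 0.0], [0.0, 2.0, 0.5]]
    print(mse(xs, x_hats), fraction_variance_unexplained(xs, x_hats))
    print(mean_l0(zs), dead_latents(zs))
```

In practice dead latents are usually tracked over many batches rather than one, since a latent that is silent on a small batch may still fire elsewhere in the data.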
tok_eval.py	initial commit	2025-10-13 06:49:24 -07:00
tok_train.py	initial commit	2025-10-13 06:49:24 -07:00