nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-06-18 20:19:08 +00:00

Author	SHA1	Message	Date
Caleb DeLeeuw	13b97b6088	Merge branch 'karpathy:master' into claude/nanochat-sae-interpretability-011CUT2TocZpFerXthoW9LMf	2025-10-25 19:11:19 -07:00
Andrej Karpathy	c75fe54aa7	readme tweak, link to new discussion and add file structure	2025-10-25 19:39:16 +00:00
Claude	558e949ddd	Add SAE-based interpretability extension for nanochat	2025-10-25 01:22:51 +00:00

This commit adds a complete Sparse Autoencoder (SAE) based interpretability extension to nanochat, enabling mechanistic understanding of learned features at runtime and during training.

## Key Features

- Multiple SAE architectures: TopK, ReLU, and Gated SAEs
- Activation collection: Non-intrusive PyTorch hooks for collecting activations
- Training pipeline: Complete SAE training with dead latent resampling
- Runtime interpretation: Real-time feature tracking during inference
- Feature steering: Modify model behavior by intervening on features
- Neuronpedia integration: Prepare SAEs for upload to Neuronpedia
- Visualization tools: Interactive dashboards for exploring features

## Module Structure

```
sae/
├── __init__.py       # Package exports
├── config.py         # SAE configuration dataclass
├── models.py         # TopK, ReLU, Gated SAE implementations
├── hooks.py          # Activation collection via PyTorch hooks
├── trainer.py        # SAE training loop and evaluation
├── runtime.py        # Real-time interpretation wrapper
├── evaluator.py      # SAE quality metrics
├── feature_viz.py    # Feature visualization tools
└── neuronpedia.py    # Neuronpedia API integration
scripts/
├── sae_train.py      # Train SAEs on nanochat activations
├── sae_eval.py       # Evaluate trained SAEs
└── sae_viz.py        # Visualize SAE features
tests/
└── test_sae.py       # Comprehensive tests for SAE implementation
```

## Usage

```bash
# Train SAE on layer 10
python -m scripts.sae_train --checkpoint models/d20/base_final.pt --layer 10

# Evaluate SAE
python -m scripts.sae_eval --sae_path sae_models/layer_10/best_model.pt

# Visualize features
python -m scripts.sae_viz --sae_path sae_models/layer_10/best_model.pt --all_features
```

## Design Principles

- Modular: SAE functionality is fully optional and doesn't modify core nanochat
- Minimal: ~1,500 lines of clean, hackable code
- Performant: <10% inference overhead with SAEs enabled
- Educational: Designed to be easy to understand and extend

See SAE_README.md for complete documentation and examples.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Andrej Karpathy	05a051dbe9	fix tokenization bug, there should be no space before first letter. sigh	2025-10-24 15:06:06 +00:00
Andrej Karpathy	8892470f29	add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think	2025-10-24 14:02:48 +00:00
Andrej Karpathy	81597cd616	move the lr schedule args up in base_train so they are tunable in configurator	2025-10-24 13:27:31 +00:00
Andrej Karpathy	cc3636b01c	allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough	2025-10-24 13:27:05 +00:00
Andrej Karpathy	5eeb2b6ef9	experiment: looking to 'hire' a nanochat repo czar to help the repo, mentioning in readme	2025-10-22 16:55:54 +00:00
Andrej Karpathy	2dda5c4c8d	Merge branch 'ulanch-fix/ios-safari-input-overlap'	2025-10-22 16:26:35 +00:00
Andrej Karpathy	80b203ea59	also bump run1000.sh to new uv sync	2025-10-22 16:25:36 +00:00
Luke Stanley	917c858136	Updates lockfile with CPU package support without overwriting other architectures	2025-10-22 16:25:36 +00:00
Luke Stanley	db1d5b595d	Git ignore eval_bundle	2025-10-22 16:25:36 +00:00
Luke Stanley	dd9387b362	Fix GPU-less CPU use on Linux with specific Torch indexes	2025-10-22 16:25:36 +00:00
Luke Stanley	32571664b1	Fix Torch crash caused by pinning on CPU	2025-10-22 16:25:36 +00:00
Andrej Karpathy	51e70f0d3c	Merge branch 'lukestanley-fix-cpu-support-with-extras'	2025-10-22 16:11:15 +00:00
Andrej Karpathy	48387cd895	also bump run1000.sh to new uv sync	2025-10-22 16:08:31 +00:00
ulanch	796f84527f	fix(ui): prevent iOS Safari toolbar from covering input on initial load	2025-10-21 17:34:40 -07:00
Luke Stanley	7a52f9bfbb	Updates lockfile with CPU package support without overwriting other architectures	2025-10-21 23:14:34 +00:00
Luke Stanley	760af62e11	Git ignore eval_bundle	2025-10-21 23:14:34 +00:00
Luke Stanley	901b075605	Fix GPU-less CPU use on Linux with specific Torch indexes	2025-10-21 23:14:16 +00:00
Luke Stanley	defd1246aa	Fix Torch crash caused by pinning on CPU	2025-10-21 20:28:10 +00:00
Andrej	2e938530ce	delete spurious torch.empty allocation in adamw fix: remove unnecessary tensor allocation in DistAdamW optimizer	2025-10-21 11:35:17 -07:00
Andrej Karpathy	a088b7a6ec	use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available	2025-10-21 18:07:33 +00:00
Andrej Karpathy	94ee507054	quick fix base eval due to fewshot requirement	2025-10-21 17:56:08 +00:00
Andrej	33e8a27f91	Merge karpathy/cpu-mps-dev , adding the ability to run on CPU, on MPS, or on CUDA, with autodetect. Gnarly PR, nonzero chance I broke something. add cpu|mps support	2025-10-21 10:26:04 -07:00
Andrej Karpathy	50bea28ef9	also add readme mention of the cpu mps changes	2025-10-21 17:24:48 +00:00
Andrej Karpathy	5bdc99abfb	merge and resolve conflict	2025-10-21 17:19:10 +00:00
Andrej Karpathy	dfcb1c16f1	Merge branch 'master' into cpu-mps-dev	2025-10-21 17:15:53 +00:00
Andrej Karpathy	bb71c64579	fix silly issue in dataloader, this version is much faster and more portable to mps too	2025-10-21 17:12:50 +00:00
karpathy	bb786c5560	i shouldnt have committed the lock file, i missed that. revert to the flagship build which is linux. sorry to pollute the repo history...	2025-10-21 10:07:40 -07:00
Andrej	c9ea7a91e2	Add customization instructions to README Added a section on customization for nanochat.	2025-10-21 08:57:10 -07:00
Andrej Karpathy	03cddd9878	actually let's not brick code on git pull. change error to warning	2025-10-21 15:13:25 +00:00
Andrej Karpathy	fe5aed940b	add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully its ok	2025-10-21 15:04:58 +00:00
karpathy	2e9669e03a	upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes ,e.g. changing max_iterations to num_iterations in sft script for consistency in naming	2025-10-20 10:15:17 -07:00
Andrej	a09ac812ed	toml changes for cpu only install	2025-10-20 07:53:15 -07:00
Sermet Pekin	49cd02f283	fix: remove unnecessary tensor allocation in DistAdamW optimizer	2025-10-20 12:03:26 +03:00
burtenshaw	0abb0fa2e3	add both sides of the source check	2025-10-20 10:44:07 +02:00
burtenshaw	c7ae920a77	add check for linux on cpu	2025-10-20 06:51:52 +02:00
Andrej	0f007889dd	Add MIT License as a file to the project	2025-10-19 17:22:19 -07:00
Andrej	5a879f4947	export NANOCHAT_BASE_DIR so child processes get it too	2025-10-19 17:07:56 -07:00
Andrej Karpathy	c1d2ed1c13	use orig_model in sampling, silly of me to miss this	2025-10-20 00:05:09 +00:00
Andrej Karpathy	2bc521a6de	use orig_model in sampling, silly of me to miss this	2025-10-20 00:04:15 +00:00
Andrej Karpathy	9467d83cf2	fix memory leak bug in rust tokenizer ty @mitsuhiko	2025-10-19 23:54:31 +00:00
Tancrède Lepoint	b1443dc98c	export NANOCHAT_BASE_DIR so child processes get it too	2025-10-19 14:05:40 -04:00
Andrej	cf2baf9933	fix typo Co-authored-by: Tancrède Lepoint <tlepoint@users.noreply.github.com>	2025-10-17 08:35:41 -07:00
karpathy	e4f9b9c64d	revert to previous pyproject.toml	2025-10-17 08:08:16 -07:00
Andrej	e883b1d597	Merge pull request #99 from burtenshaw/cpu-mps-dev-ben Add mps and cpu dependency management	2025-10-17 07:24:38 -07:00
burtenshaw	23b6351c1c	add groups and source selection	2025-10-17 12:20:18 +02:00
karpathy	ae02650afe	update the midtraining script too	2025-10-16 16:33:17 -07:00
karpathy	df600b6ed5	many small tweaks. base, eval, core work now i think	2025-10-16 15:46:18 -07:00
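
The SAE commit listed above (558e949ddd) names a TopK architecture among its SAE variants. As a rough illustration of the idea only, and not the code from that commit, a TopK activation keeps the k largest pre-activations of a latent vector and zeroes the rest. The function name `topk_activation` and the plain-list representation are assumptions made for this sketch; the actual implementation in `sae/models.py` is not shown on this page and presumably operates on PyTorch tensors.

```python
def topk_activation(pre_acts, k):
    """Hypothetical helper: keep the k largest values, zero out the others.

    pre_acts: list of floats (one latent vector of SAE pre-activations)
    k:        number of latents allowed to stay active
    """
    if k <= 0:
        return [0.0] * len(pre_acts)
    # Rank indices by value, largest first; sorted() is stable, so ties
    # are broken in favour of the earlier index.
    ranked = sorted(range(len(pre_acts)), key=lambda i: -pre_acts[i])
    keep = set(ranked[:k])
    # Preserve kept values in place so the output has the same length.
    return [v if i in keep else 0.0 for i, v in enumerate(pre_acts)]


sparse = topk_activation([0.1, 2.0, -1.0, 0.7], k=2)
print(sparse)
```

With k=2, only the two largest entries (2.0 and 0.7) survive, so sparsity is fixed by construction rather than encouraged through a penalty term.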


76 Commits