nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-06-15 10:39:08 +00:00

ademeure 3d0dec5716 FA3/FlexAttention/SDPA attention + PyTorch 2.11/CUDA 13.0 Attention priority: FA3 (Hopper) → FlexAttention (Blackwell/Ada) → SDPA. FlexAttention uses block-sparse sliding window via torch.compile, ~3x faster than SDPA dense masks for sliding window layers. Full causal always uses SDPA is_causal=True. Override with ATTENTION=fa3|flex|sdpa. Also upgrades PyTorch 2.9.1 → 2.11.0 with CUDA 13.0, and auto-detects GPU for PyTorch/CUDA version selection in pyproject.toml. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>		2026-04-08 21:38:29 +00:00
__init__.py	initial commit	2025-10-13 06:49:24 -07:00
checkpoint_manager.py	tune the data mixture a bit, load optimizer by default when SFT. These were confirmed to be best settings from sweeps of sft	2026-02-18 15:49:18 +00:00
common.py	delete autocast, an unnecessary thorn in my side, manage dtypes directly	2026-03-04 23:55:30 +00:00
core_eval.py	initial commit	2025-10-13 06:49:24 -07:00
dataloader.py	big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise	2026-03-04 19:47:12 +00:00
dataset.py	big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise	2026-03-04 19:47:12 +00:00
engine.py	Autoresearch round 2: smear, backout, and hyperparameter tuning	2026-03-14 17:03:06 +00:00
execution.py	nit delete redundant catch/raise in execute	2025-10-29 08:10:03 -07:00
flash_attention.py	FA3/FlexAttention/SDPA attention + PyTorch 2.11/CUDA 13.0	2026-04-08 21:38:29 +00:00
fp8.py	delete autocast, an unnecessary thorn in my side, manage dtypes directly	2026-03-04 23:55:30 +00:00
gpt.py	Autoresearch round 2: smear, backout, and hyperparameter tuning	2026-03-14 17:03:06 +00:00
logo.svg	initial commit	2025-10-13 06:49:24 -07:00
loss_eval.py	fix typos	2025-11-14 11:20:25 +01:00
optim.py	use COMPUTE_DTYPE-aware cast in Muon polar express step	2026-03-25 20:19:14 +00:00
report.py	remove leftover mid references (#491)	2026-02-02 08:33:46 -08:00
tokenizer.py	adjust the comment on the regex pattern per recent experiment, see dev/LOG.md	2026-01-13 17:50:39 +00:00
ui.html	Fix conversation scroll to bottom on some browsers + remove duplicated padding (#348)	2025-12-31 13:03:22 -08:00
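
The latest commit describes a backend priority (FA3 on Hopper, then FlexAttention on Blackwell/Ada, then SDPA) with an ATTENTION override. The sketch below illustrates that selection order only; it is not the code in flash_attention.py. The function names, the use of a (major, minor) compute capability tuple, and the capability-to-architecture mapping (9.x for Hopper, 8.9 for Ada, 10.x and above for Blackwell) are assumptions made for illustration.

```python
import os

VALID_BACKENDS = ("fa3", "flex", "sdpa")


def pick_attention_backend(capability, override=None):
    """Pick "fa3", "flex", or "sdpa" for a CUDA compute capability.

    capability: (major, minor) tuple, or None when no GPU is present.
    override:   optional explicit choice, e.g. the value of ATTENTION.
    Hypothetical helper; the mapping below is an assumption.
    """
    if override:
        choice = override.strip().lower()
        if choice not in VALID_BACKENDS:
            raise ValueError(f"ATTENTION must be one of {VALID_BACKENDS}, got {override!r}")
        return choice
    if capability is None:
        return "sdpa"  # no GPU: fall back to the most portable path
    major, minor = capability
    if major == 9:
        return "fa3"  # assumed: Hopper reports compute capability 9.x
    if major >= 10 or (major, minor) == (8, 9):
        return "flex"  # assumed: Blackwell is 10.x and above, Ada is 8.9
    return "sdpa"  # everything else


def pick_from_environment(capability):
    """Same selection, reading the override from the ATTENTION variable."""
    return pick_attention_backend(capability, os.environ.get("ATTENTION"))
```

With a real GPU, the capability tuple could come from torch.cuda.get_device_capability(); the sketch keeps it as a plain argument so the priority logic can be checked without PyTorch installed.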