nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-06-15 18:49:10 +00:00

ademeure 3d0dec5716 FA3/FlexAttention/SDPA attention + PyTorch 2.11/CUDA 13.0	2026-04-08 21:38:29 +00:00

Attention priority: FA3 (Hopper) → FlexAttention (Blackwell/Ada) → SDPA. FlexAttention uses block-sparse sliding window via torch.compile, ~3x faster than SDPA dense masks for sliding window layers. Full causal always uses SDPA is_causal=True. Override with ATTENTION=fa3|flex|sdpa.

Also upgrades PyTorch 2.9.1 → 2.11.0 with CUDA 13.0, and auto-detects GPU for PyTorch/CUDA version selection in pyproject.toml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

base_eval.py	delete autocast, an unnecessary thorn in my side, manage dtypes directly	2026-03-04 23:55:30 +00:00
base_train.py	FA3/FlexAttention/SDPA attention + PyTorch 2.11/CUDA 13.0	2026-04-08 21:38:29 +00:00
chat_cli.py	delete autocast, an unnecessary thorn in my side, manage dtypes directly	2026-03-04 23:55:30 +00:00
chat_eval.py	delete autocast, an unnecessary thorn in my side, manage dtypes directly	2026-03-04 23:55:30 +00:00
chat_rl.py	delete autocast, an unnecessary thorn in my side, manage dtypes directly	2026-03-04 23:55:30 +00:00
chat_sft.py	FA3/FlexAttention/SDPA attention + PyTorch 2.11/CUDA 13.0	2026-04-08 21:38:29 +00:00
chat_web.py	delete autocast, an unnecessary thorn in my side, manage dtypes directly	2026-03-04 23:55:30 +00:00
tok_eval.py	initial commit	2025-10-13 06:49:24 -07:00
tok_train.py	fix: correct minor typos in help text, README, and comments	2026-03-12 17:03:26 +08:00
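
The attention priority described in the latest commit message (FA3 on Hopper, then FlexAttention on Blackwell/Ada, then SDPA, with an ATTENTION override) could be sketched roughly as below. This is not the repository's code: the helper name `pick_attention_backend` is hypothetical, reading ATTENTION as an environment variable is an assumption, and the mapping from CUDA compute capability to architecture (9.x for Hopper, 8.9 for Ada, 10.x and above for Blackwell) is my reading of the commit message rather than something the listing confirms. The commit's special case for full causal layers is left out.

```python
import os

VALID_BACKENDS = ("fa3", "flex", "sdpa")


def pick_attention_backend(capability, override=None):
    """Hypothetical helper: choose an attention backend name.

    capability: (major, minor) CUDA compute capability, e.g. the tuple
                returned by torch.cuda.get_device_capability().
    override:   optional explicit choice, e.g. the ATTENTION variable
                (assumed here to be an environment variable).
    """
    if override:
        choice = override.strip().lower()
        if choice not in VALID_BACKENDS:
            raise ValueError(f"ATTENTION must be one of {VALID_BACKENDS}, got {override!r}")
        return choice

    major, minor = capability
    if major == 9:
        # Assumption: compute capability 9.x is Hopper, where FA3 is preferred.
        return "fa3"
    if (major, minor) == (8, 9) or major >= 10:
        # Assumption: 8.9 is Ada and 10.x+ is Blackwell, which use FlexAttention.
        return "flex"
    # Everything else falls back to PyTorch SDPA.
    return "sdpa"


if __name__ == "__main__":
    # Example: an A100 (compute capability 8.0), honoring ATTENTION if it is set.
    print(pick_attention_backend((8, 0), os.environ.get("ATTENTION")))
```

Passing the capability in as a plain tuple keeps the sketch runnable without a GPU or a PyTorch install; in a real script the tuple would presumably come from the device query.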