nanochat/tests
Matt Langston 75bd386b8e
add Flash Attention 2 as a middle tier between FA3 and SDPA
On sm80+ non-Hopper GPUs (Blackwell, Ada, Ampere) with the flash-attn package installed, FA2 kernels now replace the SDPA fallback. The selection priority is FA3 > FA2 > SDPA. Measured 28% faster than SDPA on GB10, and it makes sliding-window attention fast on Blackwell (where FA3 is unavailable). No effect on H100: USE_FA3 wins whenever it is available, so runs/speedrun.sh on 8xH100 runs the same kernels as before. tests/test_attention_fallback.py::TestFA2VsSDPA compares FA2 and SDPA outputs on any sm80+ GPU with flash-attn installed; a sketch of such a comparison appears after the context link below.
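
A minimal sketch of the FA3 > FA2 > SDPA selection order described above. The names HAS_FA3, HAS_FA2, and attention() are illustrative rather than the exact nanochat code, and the flash_attn_interface import is just one way FA3's standalone package is commonly exposed:

    import torch
    import torch.nn.functional as F

    # Probe available kernels once at import time. FA3 ships as its own
    # package (flash_attn_interface, Hopper builds); FA2 comes from the
    # flash-attn package and needs an sm80+ GPU.
    try:
        from flash_attn_interface import flash_attn_func as fa3_func
        HAS_FA3 = True
    except ImportError:
        HAS_FA3 = False

    try:
        from flash_attn import flash_attn_func as fa2_func
        HAS_FA2 = torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8
    except ImportError:
        HAS_FA2 = False

    def attention(q, k, v, window_size=(-1, -1)):
        # q, k, v: (batch, seqlen, n_heads, head_dim), bf16/fp16 on CUDA
        if HAS_FA3:
            out = fa3_func(q, k, v, causal=True, window_size=window_size)
            # some FA3 builds return (out, lse); keep only the output tensor
            return out[0] if isinstance(out, tuple) else out
        if HAS_FA2:
            return fa2_func(q, k, v, causal=True, window_size=window_size)
        # SDPA fallback: expects (batch, n_heads, seqlen, head_dim) and has
        # no sliding-window support, so the window is ignored here.
        q, k, v = (x.transpose(1, 2) for x in (q, k, v))
        return F.scaled_dot_product_attention(q, k, v, is_causal=True).transpose(1, 2)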

Context: https://github.com/karpathy/nanochat/discussions/710 (the writeup was produced from my dgx-spark branch at https://github.com/matt-langston/nanochat/tree/dgx-spark, which carries these two PRs plus a DGX-Spark-Bundle-specific speedrun script that I kept separate)
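
For reference, a hedged sketch of a TestFA2VsSDPA-style comparison; the shapes, dtype, and tolerances below are illustrative, not the values used in tests/test_attention_fallback.py:

    import pytest
    import torch
    import torch.nn.functional as F

    flash_attn = pytest.importorskip("flash_attn")

    needs_sm80 = pytest.mark.skipif(
        not torch.cuda.is_available() or torch.cuda.get_device_capability()[0] < 8,
        reason="FA2 requires an sm80+ GPU",
    )

    @needs_sm80
    def test_fa2_matches_sdpa():
        torch.manual_seed(0)
        B, T, H, D = 2, 128, 4, 64  # batch, seqlen, heads, head_dim
        q, k, v = (torch.randn(B, T, H, D, device="cuda", dtype=torch.bfloat16) for _ in range(3))
        out_fa2 = flash_attn.flash_attn_func(q, k, v, causal=True)
        out_sdpa = F.scaled_dot_product_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=True
        ).transpose(1, 2)
        # loose tolerances: both paths run in bf16 but accumulate differently
        torch.testing.assert_close(out_fa2, out_sdpa, atol=2e-2, rtol=2e-2)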

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 19:51:22 -07:00
test_attention_fallback.py add Flash Attention 2 as a middle tier between FA3 and SDPA 2026-04-17 19:51:22 -07:00
test_engine.py Fix MockModel's device definition (#535) 2026-02-17 16:03:46 -08:00