Mirror of https://github.com/karpathy/nanochat.git, synced 2026-04-03 22:25:27 +00:00
Add FA4 as the top-priority attention backend, enabling Blackwell (sm100) and Hopper (sm90) GPUs to use flash-attn-4 CuTeDSL kernels instead of falling back to PyTorch SDPA. Detection priority: FA4 > FA3 > SDPA (backwards-compatible, no config needed). Tested on 2x NVIDIA B200 with depth 20 + FP8: val BPB 0.833, ~31% bf16 MFU.
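The commit message describes a priority cascade (FA4 > FA3 > SDPA) gated on kernel availability and GPU compute capability. A minimal sketch of that selection logic is below; the function name, signature, and the exact capability gates are assumptions for illustration, not nanochat's actual code.

```python
# Hypothetical sketch of the FA4 > FA3 > SDPA detection cascade.
# Capability gates are assumptions: FA4 CuTeDSL kernels target
# Hopper (sm90) and Blackwell (sm100) per the commit message, and
# FA3 is assumed Hopper-only here.

def pick_attention_backend(fa4_available: bool, fa3_available: bool,
                           sm: tuple[int, int]) -> str:
    """Return the highest-priority usable backend.

    sm is the CUDA compute capability as (major, minor),
    e.g. (9, 0) for Hopper, (10, 0) for Blackwell.
    """
    if fa4_available and sm[0] >= 9:    # FA4: sm90 / sm100 and up
        return "fa4"
    if fa3_available and sm == (9, 0):  # FA3: Hopper only (assumed)
        return "fa3"
    return "sdpa"                       # PyTorch SDPA fallback
```

With this ordering, older GPUs (or installs without flash-attn) silently fall through to SDPA, which is why no config change is needed for backwards compatibility.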
Repository scripts:

- base_eval.py
- base_train.py
- chat_cli.py
- chat_eval.py
- chat_rl.py
- chat_sft.py
- chat_web.py
- tok_eval.py
- tok_train.py