nanochat/scripts
8fc2829db5 Add Flash Attention 4 (FA4) support for Blackwell GPUs
Add FA4 as the top-priority attention backend, enabling Blackwell (sm100)
and Hopper (sm90) GPUs to use flash-attn-4 CuTeDSL kernels instead of
falling back to PyTorch SDPA.

Detection priority: FA4 > FA3 > SDPA (backwards-compatible, no config needed).

Tested on 2x NVIDIA B200 with depth 20 + FP8: val BPB 0.833, ~31% bf16 MFU.
2026-03-08 17:06:10 -04:00
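The FA4 > FA3 > SDPA detection order described in the commit could be sketched roughly as below. This is a hypothetical illustration, not the repo's actual code: the module names `flash_attn_4` and `flash_attn_3` and the function name `pick_attention_backend` are assumptions.

```python
import importlib.util

def pick_attention_backend() -> str:
    """Pick the best available attention backend, in priority order.

    Sketch of the commit's stated priority (FA4 > FA3 > SDPA):
    flash-attn-4 CuTeDSL kernels first, then FA3, then the
    PyTorch SDPA fallback. Module names are illustrative guesses.
    """
    if importlib.util.find_spec("flash_attn_4") is not None:
        return "fa4"  # Blackwell (sm100) / Hopper (sm90) CuTeDSL kernels
    if importlib.util.find_spec("flash_attn_3") is not None:
        return "fa3"
    return "sdpa"  # torch.nn.functional.scaled_dot_product_attention
```

Because detection is purely availability-based, no configuration is needed and machines without flash-attn fall back to SDPA unchanged, which matches the commit's "backwards-compatible, no config needed" note.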
base_eval.py delete autocast, an unnecessary thorn in my side, manage dtypes directly 2026-03-04 23:55:30 +00:00
base_train.py Add Flash Attention 4 (FA4) support for Blackwell GPUs 2026-03-08 17:06:10 -04:00
chat_cli.py delete autocast, an unnecessary thorn in my side, manage dtypes directly 2026-03-04 23:55:30 +00:00
chat_eval.py delete autocast, an unnecessary thorn in my side, manage dtypes directly 2026-03-04 23:55:30 +00:00
chat_rl.py delete autocast, an unnecessary thorn in my side, manage dtypes directly 2026-03-04 23:55:30 +00:00
chat_sft.py Add Flash Attention 4 (FA4) support for Blackwell GPUs 2026-03-08 17:06:10 -04:00
chat_web.py delete autocast, an unnecessary thorn in my side, manage dtypes directly 2026-03-04 23:55:30 +00:00
tok_eval.py initial commit 2025-10-13 06:49:24 -07:00
tok_train.py quick fix to not OOM main speedrun script 2026-01-26 22:31:42 +00:00