nanochat/scripts
8fc2829db5 Add Flash Attention 4 (FA4) support for Blackwell GPUs
Add FA4 as the top-priority attention backend, enabling Blackwell (sm100)
and Hopper (sm90) GPUs to use flash-attn-4 CuTeDSL kernels instead of
falling back to PyTorch SDPA.

Detection priority: FA4 > FA3 > SDPA (backwards-compatible, no config needed).

Tested on 2x NVIDIA B200 with depth 20 + FP8: val BPB 0.833, ~31% bf16 MFU.
2026-03-08 17:06:10 -04:00
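The FA4 > FA3 > SDPA detection order described in the commit could be sketched roughly as below. This is a hypothetical illustration, not the repo's actual code: the module names `flash_attn_4` and `flash_attn_3` and the function name `pick_attention_backend` are assumptions.

```python
import importlib.util

def pick_attention_backend() -> str:
    """Pick the best available attention backend, in priority order.

    Sketch of the commit's stated priority (FA4 > FA3 > SDPA):
    flash-attn-4 CuTeDSL kernels first, then FA3, then the
    PyTorch SDPA fallback. Module names are illustrative guesses.
    """
    if importlib.util.find_spec("flash_attn_4") is not None:
        return "fa4"  # Blackwell (sm100) / Hopper (sm90) CuTeDSL kernels
    if importlib.util.find_spec("flash_attn_3") is not None:
        return "fa3"
    return "sdpa"  # torch.nn.functional.scaled_dot_product_attention
```

Because detection is purely availability-based, no configuration is needed and machines without flash-attn fall back to SDPA unchanged, which matches the commit's "backwards-compatible, no config needed" note.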
base_eval.py delete autocast, an unnecessary thorn in my side, manage dtypes directly 2026-03-04 23:55:30 +00:00
base_train.py Add Flash Attention 4 (FA4) support for Blackwell GPUs 2026-03-08 17:06:10 -04:00
chat_cli.py delete autocast, an unnecessary thorn in my side, manage dtypes directly 2026-03-04 23:55:30 +00:00
chat_eval.py delete autocast, an unnecessary thorn in my side, manage dtypes directly 2026-03-04 23:55:30 +00:00
chat_rl.py delete autocast, an unnecessary thorn in my side, manage dtypes directly 2026-03-04 23:55:30 +00:00
chat_sft.py Add Flash Attention 4 (FA4) support for Blackwell GPUs 2026-03-08 17:06:10 -04:00
chat_web.py delete autocast, an unnecessary thorn in my side, manage dtypes directly 2026-03-04 23:55:30 +00:00
tok_eval.py initial commit 2025-10-13 06:49:24 -07:00
tok_train.py quick fix to not OOM main speedrun script 2026-01-26 22:31:42 +00:00