Commit Graph

9 Commits

Author SHA1 Message Date
Andrej Karpathy
2ff7d51252 integrate Flash Attention 3. +9% tok_per_sec for d12 with ctx even as low as 2048 out of the box nice. also, ready to tune windows huge 2026-01-11 20:33:19 +00:00
Andrej Karpathy
da8b7ea4cb also delete the rustbpe test code, this now lives in rustbpe repo that is separate 2026-01-04 01:23:34 +00:00
Andrej Karpathy
8f979a8bda fix: sample first token independently for each row in multi-sample generation
Previously, when generating multiple samples (num_samples > 1), the first
token after prefill was sampled once and broadcast to all rows, causing
all samples to start identically. Now the prefill logits are expanded to
num_samples and sampled independently for each row.

Also simplified the generation loop by moving the forward pass to the end
of the loop, eliminating the first_iteration flag and if/else branching.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 04:52:13 +00:00
Andrej Karpathy
91d76cc690 Replace speedup assertion with warning in batch_encode test
Performance varies by machine and load, making hard assertions flaky.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 04:10:49 +00:00
Barış Özmen
790f3be65c add rust batch encode as a faster option over encode 2025-12-18 19:17:59 +03:00
svlandeg
2ce62ec076 ensure consistency of quotes within each statement 2025-11-03 21:52:02 +01:00
svlandeg
c72b8b2309 add explicit UTF-8 encoding 2025-11-03 21:27:12 +01:00
Andrej Karpathy
baf0b3fdda also add a test that failed before the fix and passes now with the fix for kv cache resize 2025-10-28 16:54:17 +00:00
karpathy
3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00