Commit Graph

9 Commits

Author SHA1 Message Date
Barış Özmen
c7bc9000bf Merge 7f6219e092 into b33e394528 2026-01-12 10:48:35 +08:00
Andrej Karpathy
2ff7d51252 integrate Flash Attention 3. +9% tok_per_sec for d12 with ctx even as low as 2048 out of the box nice. also, ready to tune windows huge 2026-01-11 20:33:19 +00:00
Sofie Van Landeghem
7f6219e092 Shorten assert msg 2025-12-31 14:02:19 +01:00
Barış Özmen
07d4bf7161 Rename test for clarity 2025-12-31 15:49:53 +03:00
Barış Özmen
57ffd35e0a add test for seed variation in sampling
Add test for seed variation in sampling with temperature > 0.
2025-12-31 15:43:42 +03:00
Sofie Van Landeghem
31aeda19d1 Fix temperature test 2025-12-31 11:49:46 +01:00
Barış Özmen
bc81d6a460 test: add engine generation tests for expected invariants
- test_seed_reproducibility
- test_temperature_zero_determinism
- test_max_tokens_respected
- test_num_samples_count

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 13:41:04 +03:00
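
The four invariants named in bc81d6a460 can be sketched against a toy generator. This is only an illustration under stated assumptions: toy_generate, toy_logits, and VOCAB_SIZE are hypothetical stand-ins written for this example, not the repository's engine API, and the real tests may be structured differently.

```python
import math
import random

VOCAB_SIZE = 16  # hypothetical toy vocabulary size


def toy_logits(context):
    # Deterministic stand-in for a model forward pass (assumption, not the real model).
    offset = len(context) + sum(context)
    return [math.sin(1.0 + i * 0.7 + offset) for i in range(VOCAB_SIZE)]


def toy_generate(prompt, num_samples=1, max_tokens=8, temperature=1.0, seed=42):
    """Return num_samples lists of max_tokens generated token ids."""
    rng = random.Random(seed)  # all randomness flows from the seed
    rows = [list(prompt) for _ in range(num_samples)]
    outputs = [[] for _ in range(num_samples)]
    for _ in range(max_tokens):
        for row, out in zip(rows, outputs):
            logits = toy_logits(row)
            if temperature == 0.0:
                # Greedy decoding: no randomness involved.
                tok = max(range(VOCAB_SIZE), key=lambda i: logits[i])
            else:
                m = max(logits)
                weights = [math.exp((x - m) / temperature) for x in logits]
                tok = rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]
            row.append(tok)
            out.append(tok)
    return outputs


def test_seed_reproducibility():
    assert toy_generate([1, 2, 3], seed=7) == toy_generate([1, 2, 3], seed=7)


def test_temperature_zero_determinism():
    a = toy_generate([1, 2, 3], temperature=0.0, seed=1)
    b = toy_generate([1, 2, 3], temperature=0.0, seed=2)
    assert a == b  # the seed must not matter when decoding greedily


def test_max_tokens_respected():
    outs = toy_generate([1, 2, 3], num_samples=3, max_tokens=5)
    assert all(len(o) == 5 for o in outs)


def test_num_samples_count():
    assert len(toy_generate([1, 2, 3], num_samples=4)) == 4
```

The functions follow the pytest naming convention, so they could be collected by pytest or simply called directly.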
Andrej Karpathy
8f979a8bda fix: sample first token independently for each row in multi-sample generation
Previously, when generating multiple samples (num_samples > 1), the first
token after prefill was sampled once and broadcast to all rows, causing
all samples to start identically. Now the prefill logits are expanded to
num_samples and sampled independently for each row.

Also simplified the generation loop by moving the forward pass to the end
of the loop, eliminating the first_iteration flag and if/else branching.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 04:52:13 +00:00
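
The first-token behaviour described in 8f979a8bda can be illustrated with a minimal sketch. It uses plain Python lists in place of tensors, and sample_token, first_tokens_broadcast, and first_tokens_independent are hypothetical names introduced here, not functions from the repository.

```python
import math
import random


def sample_token(logits, rng, temperature=1.0):
    """Sample one token id from a list of logits (hypothetical helper)."""
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def first_tokens_broadcast(prefill_logits, num_samples, rng):
    # Behaviour before the fix, as described in the commit message:
    # sample once and copy the same token to every row.
    token = sample_token(prefill_logits, rng)
    return [token] * num_samples


def first_tokens_independent(prefill_logits, num_samples, rng):
    # Behaviour after the fix: expand the prefill logits to num_samples
    # rows and draw a separate sample for each row.
    expanded = [prefill_logits for _ in range(num_samples)]
    return [sample_token(row, rng) for row in expanded]


if __name__ == "__main__":
    logits = [0.0] * 50  # uniform distribution over a toy vocabulary
    print(first_tokens_broadcast(logits, 8, random.Random(0)))
    print(first_tokens_independent(logits, 8, random.Random(0)))
```

With the broadcast version every row starts with the same token by construction, while the independent version lets rows diverge from the first generated token onward.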
Andrej Karpathy
baf0b3fdda also add a test that failed before the fix and passes now with the fix for kv cache resize 2025-10-28 16:54:17 +00:00