nanochat/nanochat
Andrej Karpathy 8f979a8bda fix: sample first token independently for each row in multi-sample generation
Previously, when generating multiple samples (num_samples > 1), the first
token after prefill was sampled once and broadcast to all rows, causing
all samples to start identically. Now the prefill logits are expanded to
num_samples and sampled independently for each row.
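
The difference between the old and new behavior can be illustrated with a small standard-library sketch. It is not the code in engine.py: the function names `sample_first_tokens` and `sample_first_tokens_broadcast`, the list-based "logits", and the use of `random.Random` are all assumptions made for illustration.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a plain list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_first_tokens_broadcast(prefill_logits, num_samples, rng):
    # Old behavior (hypothetical reconstruction): sample once, copy to every row.
    probs = softmax(prefill_logits)
    token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return [token] * num_samples

def sample_first_tokens(prefill_logits, num_samples, rng):
    # New behavior (hypothetical reconstruction): expand the single prefill
    # row to num_samples rows, then draw one token per row independently.
    probs = softmax(prefill_logits)
    expanded = [probs for _ in range(num_samples)]
    return [rng.choices(range(len(row)), weights=row, k=1)[0] for row in expanded]

rng = random.Random(0)
uniform_logits = [0.0, 0.0, 0.0, 0.0]
old_tokens = sample_first_tokens_broadcast(uniform_logits, 64, rng)
new_tokens = sample_first_tokens(uniform_logits, 64, rng)
```

With a flat distribution over four tokens, `old_tokens` holds a single repeated value, while `new_tokens` will almost surely contain several distinct values.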

Also simplified the generation loop by moving the forward pass to the end
of the loop, eliminating the first_iteration flag and if/else branching.
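
The loop shape described above might look roughly like the following sketch. The model is replaced by a hypothetical `toy_forward` stand-in over a three-token vocabulary, and `generate` is an illustrative name, not the actual function in engine.py.

```python
import random

def toy_forward(tokens):
    # Hypothetical stand-in for a model forward pass: puts all weight on
    # (last token + 1) mod 3, so the output is deterministic.
    last = tokens[-1]
    return [1.0 if i == (last + 1) % 3 else 0.0 for i in range(3)]

def generate(prompt, max_tokens, rng):
    # Prefill runs once before the loop, so the loop body needs no
    # first_iteration flag or if/else branch.
    weights = toy_forward(prompt)
    out = []
    for _ in range(max_tokens):
        # Sample from the weights produced by the previous forward pass.
        token = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        out.append(token)
        # Forward pass at the end of the iteration prepares the next step.
        weights = toy_forward(prompt + out)
    return out

generated = generate([0], 4, random.Random(0))
```

One trade-off of this shape is that the final iteration computes a forward pass whose result is never sampled from; in exchange, every iteration runs the same code path.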

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 04:52:13 +00:00
__init__.py initial commit 2025-10-13 06:49:24 -07:00
adamw.py slightly nicer error message 2025-12-09 12:46:48 +01:00
checkpoint_manager.py rename checkpoint_dir to checkpoints_dir for consistency. 2025-12-08 18:32:12 -08:00
common.py fix: safe DDP cleanup (check initialized PG, not just env) (#256) 2025-12-27 20:27:40 -08:00
configurator.py initial commit 2025-10-13 06:49:24 -07:00
core_eval.py initial commit 2025-10-13 06:49:24 -07:00
dataloader.py feat: pad vocab size to 64 for DDP optimizers and efficiency 2025-12-09 12:38:18 +01:00
dataset.py initial commit 2025-10-13 06:49:24 -07:00
engine.py fix: sample first token independently for each row in multi-sample generation 2025-12-28 04:52:13 +00:00
execution.py nit delete redundant catch/raise in execute 2025-10-29 08:10:03 -07:00
gpt.py remove spurious cast, gets compiled away anyway but it's confusing people 2025-12-27 23:07:48 +00:00
logo.svg initial commit 2025-10-13 06:49:24 -07:00
loss_eval.py fix typos 2025-11-14 11:20:25 +01:00
muon.py initial commit 2025-10-13 06:49:24 -07:00
report.py ensure consistency of quotes within each statement 2025-11-03 21:52:02 +01:00
tokenizer.py allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough 2025-10-24 13:27:05 +00:00
ui.html fix(ui): prevent iOS Safari toolbar from covering input on initial load 2025-10-21 17:34:40 -07:00