Andrej Karpathy
da8b7ea4cb
also delete the rustbpe test code, this now lives in rustbpe repo that is separate
2026-01-04 01:23:34 +00:00
Andrej Karpathy
aa42f40e66
delete the inline rustbpe project. it was ugly to have a project within a project, and rustbpe is now nicely a separate repo on my github karpathy/rustbpe and it's on pypi etc., so we just add it as a dependency via uv. i think it is appropriate that this is a separate repo because 1) it doesn't have too many knobs, other than the ones that are exposed - the regex pattern and vocab size, and 2) all of its complexity is not algorithmic (it's equivalent to minbpe); instead it is efficiency-related, so it is ok to hide, relatively speaking
2026-01-03 23:55:28 +00:00
Andrej Karpathy
48abd7d85f
simplify, clarify and slightly tune model initialization. should be very slightly better possibly, but certainly a lot clearer
2026-01-01 21:15:09 +00:00
Paweł Krefta
10231dfb40
Fix conversation scroll to bottom on some browsers + remove duplicated padding (#348)
2025-12-31 13:03:22 -08:00
helloaidank
389d019a0b
small change to doc string at top of tok_train.py (#402)
2025-12-31 12:57:26 -08:00
Hossein-Lakzaei
8c89661465
Update README to match current d34 demo (#314) (#381)
...
* Update README: switch hosted model description from d32 to d34 per discussion #314
* link to discussion thread
* parameter in quotes
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2025-12-30 10:17:11 +01:00
Andrej Karpathy
8f979a8bda
fix: sample first token independently for each row in multi-sample generation
...
Previously, when generating multiple samples (num_samples > 1), the first
token after prefill was sampled once and broadcast to all rows, causing
all samples to start identically. Now the prefill logits are expanded to
num_samples and sampled independently for each row.
Also simplified the generation loop by moving the forward pass to the end
of the loop, eliminating the first_iteration flag and if/else branching.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 04:52:13 +00:00
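The fix above can be sketched in plain Python (nanochat itself samples from torch logits; the function names and the uniform toy logits here are illustrative):

```python
import math
import random

def sample_from_logits(logits, rng):
    # softmax over the logits, then draw one token index
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return rng.choices(range(len(logits)), weights=[e / total for e in exps], k=1)[0]

def first_tokens(prefill_logits, num_samples, rng):
    # buggy version: sample once and broadcast, so every row starts identically:
    #   tok = sample_from_logits(prefill_logits, rng)
    #   return [tok] * num_samples
    # fixed version: expand the prefill logits to num_samples rows and
    # sample each row independently
    return [sample_from_logits(prefill_logits, rng) for _ in range(num_samples)]

rng = random.Random(0)
toks = first_tokens([0.0, 0.0, 0.0, 0.0], 8, rng)
```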
Dipesh Babu
2f2d7ab80c
fix: safe DDP cleanup (check initialized PG, not just env) (#256)
2025-12-27 20:27:40 -08:00
Andrej Karpathy
91d76cc690
Replace speedup assertion with warning in batch_encode test
...
Performance varies by machine and load, making hard assertions flaky.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 04:10:49 +00:00
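Replacing the hard speedup assertion with a warning might look like this (the function name and the expected-speedup threshold are illustrative, not the repo's actual test):

```python
import warnings

def check_speedup(t_single, t_batch, expect=3.0):
    # performance varies by machine and load, so a hard assert is flaky;
    # emit a warning instead of failing the test when the speedup falls short
    speedup = t_single / t_batch
    if speedup < expect:
        warnings.warn(f"batch_encode speedup only {speedup:.1f}x (expected ~{expect}x)")
    return speedup

# e.g. sequential encode took 3.0s, batch_encode took 1.0s
check_speedup(3.0, 1.0)
```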
Andrej
7a8769a40c
Merge pull request #383 from barisozmen/master
...
3x faster rust encode (`batch_encode`) (12 LoC + 2 tests)
2025-12-27 20:06:57 -08:00
Andrej
088726aa7d
clean up model_tag handling across scripts a bit more.
2025-12-27 20:01:09 -08:00
Andrej Karpathy
2874eda59a
update to new os env var to get rid of deprecation warning
2025-12-28 03:32:46 +00:00
Andrej Karpathy
e1770a3061
remove spurious cast, gets compiled away anyway but it's confusing people
2025-12-27 23:07:48 +00:00
Andrej Karpathy
49389ecaa8
fix tf32 warning for deprecated api use
2025-12-27 22:03:06 +00:00
DU Wenjie
ea4229851b
bugfix
2025-12-26 19:02:12 +08:00
DU Wenjie
7840049189
bugfix: keep the same args style in scripts/base_eval.py
2025-12-26 17:29:08 +08:00
Andrej
bc51da8bac
pad vocab size to 64 for DDP optimizers and efficiency
2025-12-23 09:13:31 -08:00
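Padding the vocab size up to the next multiple of 64 is simple ceiling arithmetic; a minimal sketch (the function name is hypothetical):

```python
def pad_vocab_size(vocab_size, multiple=64):
    # round up to the next multiple so the embedding/unembedding shapes
    # divide evenly, which keeps DDP optimizer sharding and GPU kernels happy
    return ((vocab_size + multiple - 1) // multiple) * multiple
```

For example, a GPT-2-style vocab of 50257 pads up to 50304.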
duwenjie
92c6654b95
bugfix save and load ckpt from model_tag dir
2025-12-21 15:07:04 +08:00
Barış Özmen
790f3be65c
add rust batch encode as a faster option over encode
2025-12-18 19:17:59 +03:00
Matěj Kripner
d314e96aa2
formatting
2025-12-09 12:48:46 +01:00
Matěj Kripner
bbc57da7d5
slightly nicer error message
2025-12-09 12:46:48 +01:00
Matěj Kripner
f1bf69d562
feat: pad vocab size to 64 for DDP optimizers and efficiency
2025-12-09 12:38:18 +01:00
Andrej
d5759400f9
fixing two typos in comments
2025-12-08 20:03:08 -08:00
Andrej
e72c3299df
fix random.seed() footgun bug for SpellingBee data generation
2025-12-08 19:58:45 -08:00
Andrej
7931e0903a
rename checkpoint_dir to checkpoints_dir for consistency.
2025-12-08 18:32:12 -08:00
Andrej
849d95ae1f
remove unnecessary check to make the logic in CausalSelfAttention.forward() clearer
2025-12-08 18:30:37 -08:00
Andrej
39cccc527f
small bugfix: make mid_train script work even with a tiny number of iterations
2025-12-08 18:27:32 -08:00
Andrej
8b1cecaa95
Apply suggestion from @svlandeg for nicer looking comparison
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2025-12-08 18:27:06 -08:00
Andrej
58f3e84e01
clean up train/val loader in sft for consistency with mid/base
2025-12-08 18:23:57 -08:00
Andrej
1b2a675c88
Improve KV cache code readability
2025-12-08 18:19:05 -08:00
Andrej
d75e6ed711
Fix script comment to reference correct file
2025-12-08 18:16:42 -08:00
Andrej
72a7cf2bc4
Fix distributed Parquet dataloader resume for multi-epoch training
2025-12-08 18:15:02 -08:00
Andrej Karpathy
bffdb2ef91
group common code to make things neater in gpt logit computation
2025-12-09 02:01:05 +00:00
Andrej
cbf30c842c
apply float32 cast before logits softcapping so the tanh is in fp32. torch compile fuses this correctly with no extra memory costs.
2025-12-08 14:17:43 -08:00
Andrej Karpathy
90442de35f
fix bug where any rank has to be able to create checkpoint_dir if saving optim
2025-12-08 20:45:19 +00:00
Andrej
2fd0440355
fix: missing val_bpb on resume
2025-12-08 12:35:08 -08:00
sunyujun03
01ea71be39
Fix distributed Parquet dataloader resume for multi-epoch training
2025-12-08 00:10:19 -06:00
KimYeongHyeon
a8847a0f83
Fix script comment to reference correct file
2025-12-02 10:46:20 +09:00
deepbuilder
06677c30e0
Refactor dimension validation for KV cache
2025-11-28 15:22:18 -05:00
deepbuilder
a770dcef2e
Fix kv_cache indexing to explicitly include head dimension
2025-11-28 15:00:14 -05:00
spjosyula
16788eed3c
fix(model): apply float32 cast before logits softcapping
...
This change ensures that the logits softcapping operation (tanh) is performed in float32 precision rather than bfloat16. Previously, the code cast to float32 after the tanh operation, which meant the non-linearity was computed in bfloat16 precision.
2025-11-23 20:12:09 +05:30
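The precision issue can be demonstrated without torch by approximating bfloat16 as a float32 with its low 16 bits zeroed (the softcap value and logit below are made up, and truncating only the tanh input is a rough stand-in for full bfloat16 arithmetic):

```python
import math
import struct

def to_bf16(x):
    # simulate bfloat16: keep only the top 16 bits of the float32 representation
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))
    return y

cap = 15.0          # softcap value (illustrative)
logit = 7.3456789   # raw logit (illustrative)

# old order: tanh sees a bfloat16-precision input, cast to fp32 only afterwards
before_fix = cap * math.tanh(to_bf16(logit) / cap)
# new order: cast to float32 first, so the tanh non-linearity runs in full precision
after_fix = cap * math.tanh(logit / cap)
```

The two results differ, which is exactly the drift the reordering removes.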
Sanzo00
53b3a4fb81
fix: missing val_bpb on resume
2025-11-22 11:04:20 +08:00
svlandeg
4bcc3bb698
clarify comment
2025-11-21 13:19:45 +01:00
Eric Silberstein
f37d45c21f
remove unneeded iter()
2025-11-20 15:14:56 -05:00
Eric Silberstein
5c93a56be5
remove unnecessary check
2025-11-19 16:31:41 -05:00
Eric Silberstein
dddb95caac
make mid_train script work even with a tiny number of iterations
2025-11-19 15:52:20 -05:00
Eric Silberstein
a4a0959c73
renamed find_largest_model() argument checkpoint_dir to checkpoints_dir for clarity
2025-11-19 15:33:36 -05:00
Eric Silberstein
024781f9df
fixing two typos in comments
2025-11-19 15:12:53 -05:00
Eric Silberstein
97770700f2
change test/train split approach because random.seed(1) and random.seed(-1) do the same thing
2025-11-19 14:51:02 -05:00
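The footgun here: CPython's `random.seed()` takes the absolute value of an integer seed, so a split keyed on the sign of the seed silently collapses to one stream. A quick demonstration:

```python
import random

random.seed(1)
a = random.random()
random.seed(-1)
b = random.random()
# identical streams: integer seeds are reduced to their absolute value,
# so seed(1) and seed(-1) initialize the generator the same way
assert a == b
```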
Andrej
4a87a0d19f
Merge pull request #299 from samjabrahams/rotary_embedding_head_dim_comment_cleanup
...
Fix comment: rotary embeddings final dimension size
2025-11-17 13:29:21 -08:00