Andrej Karpathy
eec0c79563
also add matplotlib dep so that we can have jupyter notebooks
2026-01-05 18:41:09 +00:00
Andrej Karpathy
54e59c38ad
add notebook on deriving the CORE estimates for the GPT-3 miniseries.
2026-01-05 18:40:28 +00:00
Andrej Karpathy
9d4c9b786d
many small fixes to base_train: report ETA, allow some additional kwarg flexibility, and make sure we don't crash when e.g. depth = 11; we now calculate the closest num_heads that works
2026-01-05 00:38:09 +00:00
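The head-count fix above can be sketched with a hypothetical helper (the actual base_train sizing logic is not shown in this log); the idea is to pick, among head counts that divide the model dimension evenly, the one nearest to what was requested:

```python
def closest_num_heads(model_dim: int, requested: int) -> int:
    # Hypothetical helper: among head counts that divide model_dim evenly,
    # pick the one nearest to the requested value, so odd configs like
    # depth = 11 still yield a valid attention shape instead of crashing.
    divisors = [h for h in range(1, model_dim + 1) if model_dim % h == 0]
    return min(divisors, key=lambda h: abs(h - requested))

print(closest_num_heads(704, 7))  # 704 % 7 != 0; nearest valid divisor is 8
```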
Andrej Karpathy
962b6bfba3
alright, add transformers as a dep of the repo because it should be easy to evaluate the CORE score of HF models. Not super happy about it, but I tried it and the uv.lock doesn't get bloated as much as I expected
2026-01-04 20:37:28 +00:00
Andrej Karpathy
ed2082fbc4
sane secrets management
2026-01-04 19:29:22 +00:00
Andrej Karpathy
eb7bbc1b66
delete the configurator in favor of argparse and clean up a lot of kwarg details to make them more consistent across all scripts
2026-01-04 19:14:23 +00:00
Andrej Karpathy
507d54224a
fix small bug where this would break if the git stage has deleted files
2026-01-04 19:11:43 +00:00
Andrej Karpathy
9c60dfb64c
bump nanochat to use the latest stable pytorch, which is 2.9.1. Run e.g. to re-update your local environment if you git pull
2026-01-04 18:36:36 +00:00
Andrej Karpathy
be56d29b87
simplify redundant if/elif in bloat metrics
...
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 01:40:42 +00:00
Andrej Karpathy
ee79f29fbd
replace files-to-prompt with git ls-files for bloat metrics
...
files-to-prompt was including untracked files (knowledge/, dev scripts, etc.), which inflated the bloat metrics. Now we use git ls-files to count only tracked source files, which is more accurate and removes an external dependency.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 01:38:15 +00:00
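A minimal sketch of the tracked-files approach described above (hypothetical helper; the repo's actual bloat-metric script is not shown here):

```python
import subprocess

def tracked_line_count() -> int:
    # Count lines only across git-tracked files, so untracked directories
    # (knowledge/, dev scripts, etc.) no longer inflate the total.
    files = subprocess.run(
        ["git", "ls-files"], capture_output=True, text=True, check=True
    ).stdout.splitlines()
    total = 0
    for path in files:
        try:
            with open(path, "rb") as fh:
                total += sum(1 for _ in fh)
        except OSError:
            pass  # e.g. a tracked file deleted in the working tree
    return total
```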
Andrej Karpathy
da8b7ea4cb
also delete the rustbpe test code; it now lives in the separate rustbpe repo
2026-01-04 01:23:34 +00:00
Andrej Karpathy
aa42f40e66
delete the inline rustbpe project. It was ugly to have a project within a project, and rustbpe is now nicely a separate repo on my github karpathy/rustbpe and it's on pypi etc., so we just add it as a dependency to uv. I think it is appropriate that this is a separate repo because 1) it doesn't have too many knobs, other than the ones that are exposed (the regex pattern and vocab size) and 2) all of its complexity is not algorithmic (it's equivalent to minbpe); instead it is efficiency-related, so it is ok to hide, relatively speaking
2026-01-03 23:55:28 +00:00
Andrej Karpathy
48abd7d85f
simplify, clarify and slightly tune model initialization. Possibly very slightly better, but certainly a lot clearer
2026-01-01 21:15:09 +00:00
Paweł Krefta
10231dfb40
Fix conversation scroll to bottom on some browsers + remove duplicated padding (#348)
2025-12-31 13:03:22 -08:00
helloaidank
389d019a0b
small change to docstring at top of tok_train.py (#402)
2025-12-31 12:57:26 -08:00
Hossein-Lakzaei
8c89661465
Update README to match current d34 demo (#314) (#381)
...
* Update README: switch hosted model description from d32 to d34 per discussion #314
* link to discussion thread
* parameter in quotes
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2025-12-30 10:17:11 +01:00
Andrej Karpathy
8f979a8bda
fix: sample first token independently for each row in multi-sample generation
...
Previously, when generating multiple samples (num_samples > 1), the first
token after prefill was sampled once and broadcast to all rows, causing
all samples to start identically. Now the prefill logits are expanded to
num_samples and sampled independently for each row.
Also simplified the generation loop by moving the forward pass to the end
of the loop, eliminating the first_iteration flag and if/else branching.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 04:52:13 +00:00
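The fix above can be illustrated with a small pure-Python sketch (hypothetical function name; the real code expands batched torch prefill logits to num_samples rows and samples each row):

```python
import math
import random

def sample_first_tokens(prefill_logits, num_samples, rng):
    # Softmax over the single prefill row, then draw independently for
    # each sample row; the old bug drew once and broadcast that token
    # to every row, so all samples started identically.
    m = max(prefill_logits)
    exps = [math.exp(x - m) for x in prefill_logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return [rng.choices(range(len(probs)), weights=probs, k=1)[0]
            for _ in range(num_samples)]

rng = random.Random(0)
tokens = sample_first_tokens([0.1, 2.0, -1.0, 0.5], num_samples=8, rng=rng)
```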
Dipesh Babu
2f2d7ab80c
fix: safe DDP cleanup (check initialized PG, not just env) (#256)
2025-12-27 20:27:40 -08:00
Andrej Karpathy
91d76cc690
Replace speedup assertion with warning in batch_encode test
...
Performance varies by machine and load, making hard assertions flaky.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 04:10:49 +00:00
Andrej
7a8769a40c
Merge pull request #383 from barisozmen/master
...
3x faster rust encode (`batch_encode`) (12 LoC + 2 tests)
2025-12-27 20:06:57 -08:00
Andrej
088726aa7d
clean up model_tag handling across scripts a bit more.
2025-12-27 20:01:09 -08:00
Andrej Karpathy
2874eda59a
update to new os env var to get rid of deprecation warning
2025-12-28 03:32:46 +00:00
Andrej Karpathy
e1770a3061
remove spurious cast; it gets compiled away anyway but it's confusing people
2025-12-27 23:07:48 +00:00
Andrej Karpathy
49389ecaa8
fix tf32 warning for deprecated api use
2025-12-27 22:03:06 +00:00
DU Wenjie
ea4229851b
bugfix
2025-12-26 19:02:12 +08:00
DU Wenjie
7840049189
bugfix: keep same args style in scripts/base_eval.py
2025-12-26 17:29:08 +08:00
Andrej
bc51da8bac
pad vocab size to 64 for DDP optimizers and efficiency
2025-12-23 09:13:31 -08:00
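The padding rule above is simple round-up arithmetic; a minimal sketch (the multiple of 64 comes straight from the commit message, the function name is hypothetical):

```python
def pad_vocab(vocab_size: int, multiple: int = 64) -> int:
    # Round up to the next multiple of 64 so embedding/unembedding rows
    # shard evenly across DDP optimizers and hit efficient kernel sizes.
    return ((vocab_size + multiple - 1) // multiple) * multiple

print(pad_vocab(50257))  # GPT-2's 50257 rounds up to 50304
```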
duwenjie
92c6654b95
bugfix: save and load ckpt from model_tag dir
2025-12-21 15:07:04 +08:00
Barış Özmen
790f3be65c
add rust batch encode as a faster option over encode
2025-12-18 19:17:59 +03:00
Matěj Kripner
d314e96aa2
formatting
2025-12-09 12:48:46 +01:00
Matěj Kripner
bbc57da7d5
slightly nicer error message
2025-12-09 12:46:48 +01:00
Matěj Kripner
f1bf69d562
feat: pad vocab size to 64 for DDP optimizers and efficiency
2025-12-09 12:38:18 +01:00
Andrej
d5759400f9
fixing two typos in comments
2025-12-08 20:03:08 -08:00
Andrej
e72c3299df
fix random.seed() footgun bug for SpellingBee data generation
2025-12-08 19:58:45 -08:00
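The usual shape of this footgun, sketched with a hypothetical generator (the actual SpellingBee code is not shown here): seeding the global `random` module couples unrelated call sites, while a local `random.Random` instance keeps data generation reproducible and isolated.

```python
import random

def make_spelling_examples(n: int, seed: int = 42):
    # A local Random instance instead of random.seed(): no global state
    # is mutated, and repeated calls with the same seed match exactly.
    rng = random.Random(seed)
    words = ["apple", "banana", "cherry", "dates"]  # illustrative word list
    return [rng.choice(words) for _ in range(n)]
```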
Andrej
7931e0903a
rename checkpoint_dir to checkpoints_dir for consistency.
2025-12-08 18:32:12 -08:00
Andrej
849d95ae1f
remove unnecessary check to make the logic in CausalSelfAttention.forward() clearer
2025-12-08 18:30:37 -08:00
Andrej
39cccc527f
small bugfix: make mid_train script work even with a tiny number of iterations
2025-12-08 18:27:32 -08:00
Andrej
8b1cecaa95
Apply suggestion from @svlandeg for nicer looking comparison
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2025-12-08 18:27:06 -08:00
Andrej
58f3e84e01
clean up train/val loader in sft for consistency with mid/base
2025-12-08 18:23:57 -08:00
Andrej
1b2a675c88
Improve KV cache code readability
2025-12-08 18:19:05 -08:00
Andrej
d75e6ed711
Fix script comment to reference correct file
2025-12-08 18:16:42 -08:00
Andrej
72a7cf2bc4
Fix distributed Parquet dataloader resume for multi-epoch training
2025-12-08 18:15:02 -08:00
Andrej Karpathy
bffdb2ef91
group common code to make things neater in gpt logit computation
2025-12-09 02:01:05 +00:00
Andrej
cbf30c842c
apply float32 cast before logits softcapping so the tanh is in fp32. torch compile fuses this correctly with no extra memory costs.
2025-12-08 14:17:43 -08:00
Andrej Karpathy
90442de35f
fix bug where any rank must be able to create checkpoint_dir if saving optim
2025-12-08 20:45:19 +00:00
Andrej
2fd0440355
fix: missing val_bpb on resume
2025-12-08 12:35:08 -08:00
sunyujun03
01ea71be39
Fix distributed Parquet dataloader resume for multi-epoch training
2025-12-08 00:10:19 -06:00
KimYeongHyeon
a8847a0f83
Fix script comment to reference correct file
2025-12-02 10:46:20 +09:00
deepbuilder
06677c30e0
Refactor dimension validation for KV cache
2025-11-28 15:22:18 -05:00
deepbuilder
a770dcef2e
Fix kv_cache indexing to explicitly include head dimension
2025-11-28 15:00:14 -05:00