Commit Graph

16 Commits

Author SHA1 Message Date
Andrej Karpathy
2ff7d51252 integrate Flash Attention 3. +9% tok_per_sec for d12 out of the box, even with ctx as low as 2048, nice. also, ready to tune windows, huge 2026-01-11 20:33:19 +00:00
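"Tune windows" presumably refers to sliding-window (local) attention, which the flash-attn Python API exposes through a `window_size=(left, right)` argument. That reading is an assumption. The sketch below is a pure-Python illustration, not FA3's kernel. It builds the boolean mask that a causal sliding window corresponds to. Each query sees itself plus at most `left` earlier keys, so attention cost scales roughly with `seq_len * (left + 1)` instead of with `seq_len**2 / 2`.

```python
from typing import List, Optional


def sliding_window_causal_mask(seq_len: int, left: Optional[int] = None) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Causal: keys after the query are never visible. With `left` set, query i
    additionally sees only keys in [i - left, i]; left=None means plain
    full causal attention (no window).
    """
    def visible(i: int, j: int) -> bool:
        return j <= i and (left is None or i - j <= left)

    return [[visible(i, j) for j in range(seq_len)] for i in range(seq_len)]


# Example: 4 positions, each query sees itself and one previous key.
for row in sliding_window_causal_mask(4, left=1):
    print("".join("x" if v else "." for v in row))
```

Here, tuning the window just means choosing `left` (possibly per layer). Smaller windows save compute at long context, at the cost of each layer seeing less history directly.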
Andrej Karpathy
ccf4b7f9bf nudge hyperparameters of the base script with the results of the sweeps and miniseries. vocab size down to 32K. D:N ratio from 20 to 8. add miniseries script 2026-01-07 22:11:59 +00:00
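The sketch below reads "D:N ratio" as training tokens D per model parameter N, the Chinchilla-style convention. That reading is an assumption. It shows what moving the ratio from 20 to 8 does to the token budget. The 500M parameter count and the 2**19-token batch are hypothetical numbers chosen for illustration, not nanochat's settings.

```python
def token_budget(num_params: int, tokens_per_param: float) -> int:
    """Training tokens D implied by a D:N ratio for a model with N params."""
    return round(num_params * tokens_per_param)


def num_steps(tokens: int, batch_tokens: int) -> int:
    """Whole optimizer steps needed to consume `tokens` at `batch_tokens` per step."""
    return tokens // batch_tokens


N = 500_000_000  # hypothetical parameter count
BATCH = 2**19    # hypothetical tokens per optimizer step (524,288)

for ratio in (20, 8):
    D = token_budget(N, ratio)
    print(f"D:N={ratio:>2}  tokens={D:,}  steps={num_steps(D, BATCH):,}")
```

At a fixed model size, dropping the ratio from 20 to 8 cuts the token budget by a factor of 20 / 8 = 2.5. Training compute shrinks by the same factor.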
Andrej Karpathy
eec0c79563 also add matplotlib dep so that we can have jupyter notebooks 2026-01-05 18:41:09 +00:00
Andrej Karpathy
962b6bfba3 alright add transformers as a dep of the repo because it should be easy to evaluate the CORE score of HF models. Not super happy about it but i tried it and the uv.lock doesn't get bloated as much as i expected 2026-01-04 20:37:28 +00:00
Andrej Karpathy
ed2082fbc4 sane secrets management 2026-01-04 19:29:22 +00:00
Andrej Karpathy
9c60dfb64c bump nanochat to use the latest stable pytorch that is 2.9.1 . Run e.g. to re-update your local environment if you git pull 2026-01-04 18:36:36 +00:00
Andrej Karpathy
ee79f29fbd replace files-to-prompt with git ls-files for bloat metrics
files-to-prompt was including untracked files (knowledge/, dev scripts, etc.) which inflated the bloat metrics. now we use git ls-files to only count tracked source files, which is more accurate and removes an external dependency.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 01:38:15 +00:00
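Here is a minimal sketch of the approach in the commit above: take the file list from `git ls-files`, which lists only files in the index, so untracked directories like `knowledge/` never show up. Then count over that list. The function names and the exact metrics (files, lines, characters) are hypothetical. They are not the repo's actual bloat-metrics code.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def tracked_files(repo: str = ".") -> List[str]:
    """Paths git tracks in the index; untracked files never appear here."""
    out = subprocess.run(
        ["git", "ls-files", "-z"],  # -z: NUL-separated, safe for odd filenames
        cwd=repo, capture_output=True, check=True, text=True,
    ).stdout
    return [p for p in out.split("\0") if p]


def bloat_metrics(repo: str, paths: List[str]) -> Dict[str, int]:
    """Hypothetical metrics: number of text files, lines and characters."""
    files = lines = chars = 0
    for rel in paths:
        try:
            text = (Path(repo) / rel).read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue  # skip binaries, submodule dirs, files deleted from the worktree
        files += 1
        lines += len(text.splitlines())
        chars += len(text)
    return {"files": files, "lines": lines, "chars": chars}
```

Usage: `bloat_metrics(".", tracked_files("."))` run from the repo root. Besides being more accurate, this needs only `git` itself, which matches the commit's point about dropping an external dependency.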
Andrej Karpathy
aa42f40e66 delete the inline rustbpe project. it was ugly to have a project within a project and rustbpe is now nicely a separate repo on my github karpathy/rustbpe and it's on pypi etc., so we just add it as a dependency to uv. i think it is appropriate that this is a separate repo because 1) it doesn't have too many knobs, other than the ones that are exposed - the regex pattern and vocab size and 2) all of its complexity is not algorithmic (it's equivalent to minbpe), instead it is efficiency-related, so it is ok to hide relatively speaking 2026-01-03 23:55:28 +00:00
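To show why the algorithm itself is small, here is a naive byte-level BPE trainer in the spirit of minbpe. The two knobs the commit mentions, the regex split pattern and the vocab size, appear as `pattern` and `vocab_size`. This is an illustrative sketch, not rustbpe's API. The split pattern is a simplified stand-in, because GPT-style patterns use `\p{...}` classes that Python's built-in `re` does not support. A production implementation like rustbpe differs mainly in efficiency, not in this logic.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Simplified, hypothetical split pattern: words, punctuation runs, whitespace.
SPLIT_PATTERN = r"\s?\w+|\s?[^\w\s]+|\s+"


def merge(ids: List[int], pair: Tuple[int, int], new_id: int) -> List[int]:
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text: str, vocab_size: int, pattern: str = SPLIT_PATTERN) -> Dict[Tuple[int, int], int]:
    """Learn BPE merges over UTF-8 bytes; returns {(id_a, id_b): new_id}."""
    assert vocab_size >= 256, "the 256 raw byte values are always in the vocab"
    # Pre-split so merges never cross chunk (e.g. word) boundaries.
    chunks = [list(c.encode("utf-8")) for c in re.findall(pattern, text)]
    merges: Dict[Tuple[int, int], int] = {}
    for new_id in range(256, vocab_size):
        counts: Counter = Counter()
        for ids in chunks:
            counts.update(zip(ids, ids[1:]))  # count adjacent pairs
        if not counts:
            break  # every chunk is a single token; nothing left to merge
        pair = max(counts, key=counts.get)  # most frequent pair (first wins ties)
        merges[pair] = new_id
        chunks = [merge(ids, pair, new_id) for ids in chunks]
    return merges
```

Each step rescans every chunk. That is quadratic-ish and slow on real corpora, and it is exactly the kind of efficiency-only complexity the commit says is fine to hide in a separate package.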
Ajeesh Sunil
5e0987a431 numpy isn't acting as a dependency for nanochat, so isn't it better to remove numpy from the dependencies list? 2025-10-28 20:05:38 +00:00
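One way to check a claim like "numpy isn't acting as a dependency" is to scan the project's Python files for direct imports of the package. Below is a minimal sketch using the standard-library `ast` module. The function names are hypothetical. The check only finds static, absolute imports. It misses dynamic imports (`importlib`) and says nothing about whether another dependency pulls numpy in transitively, and `ast.parse` raises `SyntaxError` on files it cannot parse.

```python
import ast
from pathlib import Path
from typing import List, Set


def top_level_imports(source: str) -> Set[str]:
    """Top-level package names imported (absolutely) by a Python source string."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])  # skip relative imports
    return names


def files_importing(root: str, package: str) -> List[str]:
    """Relative paths of .py files under `root` that import `package` directly."""
    root_path = Path(root)
    return sorted(
        str(p.relative_to(root_path))
        for p in root_path.rglob("*.py")
        if package in top_level_imports(p.read_text(encoding="utf-8"))
    )
```

An empty result from `files_importing(".", "numpy")` supports removing it from the dependency list. Running the test suite in a fresh environment without it is the stronger check.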
Luke Stanley
901b075605 Fix GPU-less CPU use on Linux with specific Torch indexes 2025-10-21 23:14:16 +00:00
burtenshaw
0abb0fa2e3 add both sides of the source check 2025-10-20 10:44:07 +02:00
burtenshaw
c7ae920a77 add check for linux on cpu 2025-10-20 06:51:52 +02:00
karpathy
e4f9b9c64d revert to previous pyproject.toml 2025-10-17 08:08:16 -07:00
burtenshaw
23b6351c1c add groups and source selection 2025-10-17 12:20:18 +02:00
karpathy
306bc380ab add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered what I think is a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights 2025-10-16 10:04:43 -07:00
karpathy
3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00