svlandeg
12839c11e3
update uv lock
2026-04-13 11:20:38 +02:00
Andrej Karpathy
a445144d39
create a group for dev dependencies, there is no need to install all this other stuff just for speedrun and it's exposing people to dependency chain attacks. we need to delete more dependencies. dependencies bad bad bad
2026-03-26 03:41:28 +00:00
Andrej Karpathy
03be953668
delete non-essential deps from legacy use
2026-03-26 03:41:28 +00:00
Andrej Karpathy
e569b59f92
delete torchao dependency, create our own exact API-matched version of Float8Linear, document it very well. for some poorly understood reason, the performance is not only ~identical but actually runs 3% faster, despite it being significantly simpler and much less code. i don't fully understand why/how atm
2026-02-10 18:46:39 +00:00
Andrej Karpathy
6079f78fc3
add fp8 training with torchao
2026-02-03 21:03:42 +00:00
Andrej Karpathy
7d1700c521
add zstd lib
2026-01-16 00:44:01 +00:00
Andrej Karpathy
2ff7d51252
integrate Flash Attention 3. +9% tok_per_sec for d12 with ctx even as low as 2048 out of the box nice. also, ready to tune windows huge
2026-01-11 20:33:19 +00:00
Andrej Karpathy
ccf4b7f9bf
nudge hyperparameters of the base script with the results of the sweeps and miniseries. vocab size down to 32K. D:N ratio from 20 to 8. add miniseries script
2026-01-07 22:11:59 +00:00
Andrej Karpathy
eec0c79563
also add matplotlib dep so that we can have jupyter notebooks
2026-01-05 18:41:09 +00:00
Andrej Karpathy
962b6bfba3
alright add transformers as a dep of the repo because it should be easy to evaluate the CORE score of HF models. Not super happy about it but i tried it and the uv.lock doesn't get bloated as much as i expected
2026-01-04 20:37:28 +00:00
Andrej Karpathy
ed2082fbc4
sane secrets management
2026-01-04 19:29:22 +00:00
Andrej Karpathy
eb7bbc1b66
delete the configurator in favor of argparse and clean up a lot of kwarg details to make them more consistent across all scripts
2026-01-04 19:14:23 +00:00
Andrej Karpathy
ee79f29fbd
replace files-to-prompt with git ls-files for bloat metrics
files-to-prompt was including untracked files (knowledge/, dev scripts, etc.) which inflated the bloat metrics. now we use git ls-files to only count tracked source files, which is more accurate and removes an external dependency.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 01:38:15 +00:00
Andrej Karpathy
aa42f40e66
delete the inline rustbpe project. it was ugly to have a project within project and rustbpe is now nicely a separate repo on my github karpathy/rustbpe and it's on pypi etc., so we just add it as a dependency to uv. i think it is appropriate that this is a separate repo because 1) it doesn't have too many knobs, other than the ones that are exposed - the regex pattern and vocab size and 2) all of its complexity is not algorithmic (it's equivalent to minbpe), instead it is efficiency-related, so it is ok to hide relatively speaking
2026-01-03 23:55:28 +00:00
Andrej Karpathy
adb5d4a16c
uv lock has to change since we removed numpy in the other commit
2025-11-13 15:16:27 +00:00
Luke Stanley
7a52f9bfbb
Updates lockfile with CPU package support without overwriting other architectures
2025-10-21 23:14:34 +00:00
karpathy
bb786c5560
i shouldn't have committed the lock file, i missed that. revert to the flagship build which is linux. sorry to pollute the repo history...
2025-10-21 10:07:40 -07:00
karpathy
2e9669e03a
upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming
2025-10-20 10:15:17 -07:00
karpathy
306bc380ab
add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights
2025-10-16 10:04:43 -07:00
karpathy
3a5e0bc50b
initial commit
2025-10-13 06:49:24 -07:00