dhunganapramod9
dd10cf915d
fix(chat_web): use removeprefix for SSE chunk to avoid corrupting payload when token contains 'data: '
2026-03-01 08:40:30 -05:00
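
A minimal sketch of the failure mode this commit describes. It assumes the earlier code stripped the SSE marker with something like `str.replace`, which the log does not confirm. The function names and the sample line are hypothetical.

```python
def parse_sse_line(line: str) -> str:
    """Strip only the leading SSE field marker; the payload is left untouched."""
    return line.removeprefix("data: ")  # Python 3.9+


def parse_sse_line_replace(line: str) -> str:
    """Hypothetical earlier approach: removes EVERY occurrence of the marker."""
    return line.replace("data: ", "")


# A chunk whose token text happens to contain the marker itself.
line = 'data: {"token": "data: 42"}'

safe = parse_sse_line(line)               # payload preserved
corrupted = parse_sse_line_replace(line)  # inner "data: " is also removed
```

`removeprefix` touches at most one occurrence, and only at the start of the string, so token text can never be altered.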
dhunganapramod9
24a74c4b7f
fix(chat_web): add argparse choices for --source to avoid KeyError on invalid value
2026-03-01 08:40:21 -05:00
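
A minimal sketch of the idea, assuming `--source` is later used as a key into a dictionary. The `SOURCES` mapping, its keys and values, and the default are hypothetical placeholders, not the project's real ones.

```python
import argparse

# Hypothetical mapping; the real keys and values in chat_web may differ.
SOURCES = {"base": "base_checkpoints", "sft": "chatsft_checkpoints"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # choices= makes argparse reject unknown values with a usage error
    # before any dictionary lookup can raise KeyError.
    parser.add_argument("--source", choices=sorted(SOURCES), default="sft")
    return parser


args = build_parser().parse_args(["--source", "base"])
checkpoint_dir = SOURCES[args.source]  # safe: value is guaranteed to be a key
```

Deriving `choices` from the dictionary keeps the accepted values and the lookup table from drifting apart.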
Andrej Karpathy
542beb0c8c
bump speedrun to be the up to date leaderboard run
2026-02-04 02:12:04 +00:00
Andrej Karpathy
d510b1385b
quick experiments to log
2026-02-03 23:21:39 +00:00
Andrej Karpathy
16b8ac7da3
oops forgot to attach leaderboard file too
2026-02-03 21:06:12 +00:00
Andrej Karpathy
fe55b092b8
minor cosmetics for the table
2026-02-03 21:05:28 +00:00
Andrej Karpathy
a67eba35dc
add feb2 new leaderboard record from upgrading to fp8 training, +4.3% speedup to time to GPT-2
2026-02-03 21:03:42 +00:00
Andrej Karpathy
6079f78fc3
add fp8 training with torchao
2026-02-03 21:03:42 +00:00
Andrej Karpathy
8ebc14b348
small touchups to the eval script, re-order items etc, cosmetic
2026-02-03 21:03:42 +00:00
Sofie Van Landeghem
72b9064f9d
remove leftover mid references (#491)
2026-02-02 08:33:46 -08:00
Andrej Karpathy
b19b4f3e49
fix bug in speedrun script, batch size that doesn't OOM on 8XH100 for d24 is 16
2026-02-02 15:50:14 +00:00
Andrej Karpathy
230d6cf6c6
tune the synthetic data generation script. delete the king andrej stuff lol. also, upgrade to gemini 3
2026-02-02 01:45:59 +00:00
Andrej Karpathy
07c4dd4cd9
manually control the over-active garbage collector, save a small few minutes from a typical run
2026-02-02 01:44:30 +00:00
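
The commit message does not say how the collector is controlled. One common pattern, sketched here under that assumption, is to disable automatic collection during the hot loop and collect at a fixed interval. The function name, `step_fn`, and `collect_every` are hypothetical.

```python
import gc


def run_with_manual_gc(num_steps, step_fn, collect_every=1000):
    """Run step_fn for num_steps with automatic garbage collection turned off."""
    gc.collect()   # clean up startup garbage once
    gc.freeze()    # move surviving objects to a permanent generation (Python 3.7+)
    gc.disable()   # no automatic collections inside the loop
    try:
        for step in range(1, num_steps + 1):
            step_fn(step)
            if step % collect_every == 0:
                gc.collect()  # explicit, predictable collection point
    finally:
        gc.enable()
        gc.unfreeze()


calls = []
run_with_manual_gc(5, calls.append, collect_every=2)
```

Reference counting still frees most objects immediately; only reference cycles wait for the explicit `gc.collect()` calls.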
Andrej Karpathy
e8fec97d4c
slightly more efficient dataloader that reduces the number of python objects flying around and causing strain on runtime and garbage collector
2026-02-02 01:17:30 +00:00
Andrej Karpathy
8b4849d548
fix bug in chat_sft, the attention window must be preserved sigh
2026-02-01 20:58:44 +00:00
Andrej Karpathy
eaf49a33c8
fix path which i think was modified during the refactor and this is a bug introduced by claude i believe
2026-02-01 20:15:19 +00:00
Andrej Karpathy
31b61d2d17
fix broken import sigh
2026-02-01 05:03:44 +00:00
Sofie Van Landeghem
4d6415b8ef
use _PEAK_FLOPS_TABLE instead of if-else structure (#479)
2026-01-31 19:45:06 -08:00
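
A minimal sketch of a table-driven lookup replacing an if-else chain. Only the name `_PEAK_FLOPS_TABLE` comes from the commit; the table layout, the substring matching, the helper name, and the two entries (vendor dense BF16 figures for the SXM parts) are assumptions for illustration.

```python
# (substring of the device name, peak dense BF16 FLOP/s); illustrative entries only.
_PEAK_FLOPS_TABLE = (
    ("h100", 989e12),
    ("a100", 312e12),
)


def lookup_peak_flops(device_name, default=None):
    """Return the first table entry whose key appears in the device name."""
    name = device_name.lower()
    for key, flops in _PEAK_FLOPS_TABLE:
        if key in name:
            return flops
    return default  # unknown hardware: let the caller decide what to do


h100 = lookup_peak_flops("NVIDIA H100 80GB HBM3")
unknown = lookup_peak_flops("Some Future GPU")
```

Adding a new GPU then means adding one row rather than another branch.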
Sofie Van Landeghem
43078c347e
clean up original tokenizing_distributed_data_loader (#478)
2026-01-31 19:44:12 -08:00
Franci Penov
dc291c627f
Add Blackwell (SM100) GPU support via SDPA fallback (#475)
2026-01-31 19:42:58 -08:00
Andrej Karpathy
0307997f9b
merge two files base_loss and base_eval into a single file, it's nicer this way, and unify the huggingface code associated with both
2026-02-01 02:36:43 +00:00
Andrej Karpathy
1ddaad1c1c
nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtraining is not yet fully properly erased across the board, but good enough for step 1
2026-01-31 19:12:25 +00:00
Andrej Karpathy
348fbb301b
fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining
2026-01-31 18:21:36 +00:00
Andrej Karpathy
3c3a3d7042
warmdown of 0.5 is slightly better:
2026-01-31 01:08:44 +00:00
Andrei Panferov
4d8dbaf6e0
Fix escape character in README bibtex entry (#454)
2026-01-30 09:34:02 -08:00
Andrej Karpathy
3ba42e8135
Fix SDPA KV-cache decode to respect sliding window (#456)
SDPA fallback now respects sliding window during single-token KV-cache
decode by slicing K/V to the last (window + 1) tokens.
Also simplifies the mask building for chunk inference to properly apply
sliding window in that path as well.
Fixes #452
Co-Authored-By: Kartik Vashishta <kartikv776@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 17:32:12 +00:00
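
The slicing described in this commit can be sketched with plain Python lists standing in for the cached K/V along the sequence dimension. The function name and the convention that a missing or negative window means "no limit" are assumptions; real tensors would be sliced along their time axis instead.

```python
def window_kv_for_decode(keys, values, window):
    """Keep only the cache entries a single new query token may attend to."""
    if window is None or window < 0:
        return keys, values  # assumed convention: no sliding window
    keep = window + 1        # the current token plus `window` previous ones
    return keys[-keep:], values[-keep:]


cache = list(range(10))  # positions 0..9; position 9 is the newest token
k, v = window_kv_for_decode(cache, cache, window=3)
```

Because the slice is taken from the end of the cache, no explicit mask is needed for the single-token decode step.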
Aarushi Singh
ace6740bdd
feat: allow top_k=0 in web api to disable filtering (#458)
* allow top_k=0 in web api to disable filtering
* adding a comment for clear reasoning
* adding change to docstring
2026-01-30 09:21:41 -08:00
Harsh Gupta
2e17723817
Fix generate() crash when top_k=0 (#467)
Prevent a crash in generate() by skipping top-k filtering when top_k is set to 0
2026-01-30 09:21:02 -08:00
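
A minimal sketch of the guard, written over plain Python lists rather than the tensors the real `generate()` presumably uses. The helper name and its exact semantics (ties at the threshold are kept) are assumptions.

```python
import math


def top_k_filter(logits, top_k):
    """Mask everything below the k-th largest logit; top_k <= 0 disables filtering."""
    if top_k is None or top_k <= 0:
        return list(logits)  # skip filtering entirely instead of asking for the 0-th largest
    k = min(top_k, len(logits))
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else -math.inf for x in logits]


unfiltered = top_k_filter([1.0, 3.0, 2.0], top_k=0)
filtered = top_k_filter([1.0, 3.0, 2.0], top_k=2)
```

Clamping `k` to the vocabulary size also covers requests where `top_k` exceeds the number of logits.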
Andrej Karpathy
02baa15405
i am feeling in a delete mood today. i need to delete a lot of code. there is too much code and surface area and complexity. ew
2026-01-30 17:08:53 +00:00
Andrej Karpathy
d6c4f3b923
i think this is the new torch 2.9+ API for declaring tf32 preference
2026-01-30 17:03:15 +00:00
Andrej Karpathy
067daa7758
small fix cpu script ty PR #474
2026-01-30 02:11:25 +00:00
Andrej Karpathy
6a341f2ecf
contiguous views and single HtoD transfer for inputs/targets much cleaner
2026-01-30 00:23:01 +00:00
Andrej Karpathy
ebd4d9bbf5
tried muonh, appealing but didn't work out of the box
2026-01-29 19:01:36 +00:00
Andrej Karpathy
41bb2eac32
Combine AdamW and Muon into single MuonAdamW optimizer, cleaner, ty @chrisjmccormick for idea/help
2026-01-29 00:52:08 +00:00
Andrej Karpathy
64a651a63c
include .claude is ok
2026-01-29 00:35:02 +00:00
Andrej Karpathy
65df0de42b
add arxiv reading skill
2026-01-29 00:34:24 +00:00
Andrej Karpathy
74554be3b5
revert engram, not seeing an improvement at larger scale
2026-01-28 20:07:39 +00:00
Sofie Van Landeghem
d5418ea5a1
Fix link to DeepSeek Engram paper (#470)
* Fix link to DeepSeek Engram paper in LOG.md
Updated link to the DeepSeek Engram paper in the log.
* remove www
2026-01-28 08:31:44 -08:00
Andrej Karpathy
c88bbf8133
Merge branch 'engram'
2026-01-27 22:33:16 +00:00
Andrej Karpathy
c8d93beed2
add engram-lite, add log, tune scaling laws analysis scripts
2026-01-27 22:31:17 +00:00
Andrej Karpathy
8630d32be4
quick fix to not OOM main speedrun script
2026-01-26 22:31:42 +00:00
Andrej Karpathy
59e36cc727
first version of engram following modded nanogpt style
2026-01-25 18:59:51 +00:00
Andrej Karpathy
85b3e95e09
320 experiments just to tune the adam beta1 of x0 a little bit up from 0.8 to 0.96
2026-01-25 00:04:02 +00:00
xiayan0118
6a477eedbd
fix: pass device_type to compute_init in engine.__main__ (#451)
When running engine.py directly on non-GPU devices (CPU, MPS),
compute_init() needs the device_type parameter to initialize correctly.
This fixes failures on machines without CUDA support.
2026-01-19 17:19:51 -08:00
Andrej Karpathy
63bb5831e2
something i've wanted to do for a while - move all .sh runs to their own directory so they don't pollute root dir
2026-01-18 15:27:41 +00:00
Andrej Karpathy
a91743c168
Merge branch 've'
2026-01-18 15:14:39 +00:00
Andrej Karpathy
d58fcd9d73
log for jan 17
2026-01-18 03:01:17 +00:00
Andrej Karpathy
babde18ce1
small tweaks
2026-01-18 03:00:38 +00:00
Andrej Karpathy
cf5c9e5b8e
resolve a crash for odd depths because FA3 needs head_dim % 8 == 0
2026-01-18 00:07:08 +00:00
Andrej Karpathy
413e91aa0f
optimal ratio is now around 4
2026-01-17 23:51:09 +00:00