307 Commits

Author SHA1 Message Date
dhunganapramod9
dd10cf915d fix(chat_web): use removeprefix for SSE chunk to avoid corrupting payload when token contains 'data: ' 2026-03-01 08:40:30 -05:00
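A minimal sketch of the failure mode this fix describes, assuming the payload itself can contain the text `data: `. The helper names are hypothetical and this is not the actual chat_web code; it only contrasts stripping every occurrence with stripping the leading SSE field name.

```python
def strip_with_replace(line: str) -> str:
    # Removes EVERY occurrence of "data: ", including ones inside the payload.
    return line.replace("data: ", "")


def parse_sse_line(line: str) -> str:
    # Removes only a leading "data: " (str.removeprefix needs Python 3.9+).
    return line.removeprefix("data: ")


line = 'data: {"token": "data: hello"}'
print(strip_with_replace(line))  # payload is corrupted
print(parse_sse_line(line))      # payload is preserved
```

A line without the prefix passes through `parse_sse_line` unchanged, so the helper is safe to call on any chunk.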
dhunganapramod9
24a74c4b7f fix(chat_web): add argparse choices for --source to avoid KeyError on invalid value 2026-03-01 08:40:21 -05:00
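A sketch of the technique named in this commit, under stated assumptions: the `SOURCES` mapping and its keys are hypothetical, since the real values are not shown in this log. Passing `choices` makes argparse reject an unknown value at parse time, so the later dictionary lookup cannot raise `KeyError`.

```python
import argparse

# Hypothetical mapping; the real keys used by chat_web are not shown in this log.
SOURCES = {"sft": "chatsft_checkpoints", "rl": "chatrl_checkpoints"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # choices restricts --source to the known keys; anything else exits with a usage error.
    parser.add_argument("--source", choices=sorted(SOURCES), default="sft")
    return parser


args = build_parser().parse_args(["--source", "rl"])
print(SOURCES[args.source])
```

With an invalid value, argparse prints a usage message and raises `SystemExit` before any lookup happens.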
Andrej Karpathy
542beb0c8c bump speedrun to be the up to date leaderboard run 2026-02-04 02:12:04 +00:00
Andrej Karpathy
d510b1385b quick experiments to log 2026-02-03 23:21:39 +00:00
Andrej Karpathy
16b8ac7da3 oops forgot to attach leaderboard file too 2026-02-03 21:06:12 +00:00
Andrej Karpathy
fe55b092b8 minor cosmetics for the table 2026-02-03 21:05:28 +00:00
Andrej Karpathy
a67eba35dc add feb2 new leaderboard record from upgrading to fp8 training, +4.3% speedup to time to GPT-2 2026-02-03 21:03:42 +00:00
Andrej Karpathy
6079f78fc3 add fp8 training with torchao 2026-02-03 21:03:42 +00:00
Andrej Karpathy
8ebc14b348 small touchups to the eval script, re-order items etc, cosmetic 2026-02-03 21:03:42 +00:00
Sofie Van Landeghem
72b9064f9d remove leftover mid references (#491) 2026-02-02 08:33:46 -08:00
Andrej Karpathy
b19b4f3e49 fix bug in speedrun script, batch size that doesn't OOM on 8XH100 for d24 is 16 2026-02-02 15:50:14 +00:00
Andrej Karpathy
230d6cf6c6 tune the synthetic data generation script. delete the king andrej stuff lol. also, upgrade to gemini 3 2026-02-02 01:45:59 +00:00
Andrej Karpathy
07c4dd4cd9 manually control the over-active garbage collector, save a small few minutes from a typical run 2026-02-02 01:44:30 +00:00
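One common way to control Python's garbage collector manually is sketched below. This is an assumption about the general pattern, not the code from this commit: the function name, the dummy loop body, and the `collect_every` interval are all hypothetical.

```python
import gc


def run_steps(num_steps: int, collect_every: int = 100) -> int:
    """Run a dummy loop with automatic GC off; return the number of manual collections."""
    manual_collections = 0
    gc.collect()  # clear garbage left over from setup
    gc.freeze()   # move surviving objects out of the tracked generations
    gc.disable()  # stop automatic collections during the loop
    try:
        for step in range(1, num_steps + 1):
            _ = [step] * 8  # placeholder for a real training step
            if step % collect_every == 0:
                gc.collect()  # collect at a point we choose
                manual_collections += 1
    finally:
        gc.unfreeze()
        gc.enable()  # restore default behaviour
    return manual_collections


print(run_steps(250, collect_every=100))
```

The `try`/`finally` ensures automatic collection is re-enabled even if a step raises.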
Andrej Karpathy
e8fec97d4c slightly more efficient dataloader that reduces the number of python objects flying around and causing strain on runtime and garbage collector 2026-02-02 01:17:30 +00:00
Andrej Karpathy
8b4849d548 fix bug in chat_sft, the attention window must be preserved sigh 2026-02-01 20:58:44 +00:00
Andrej Karpathy
eaf49a33c8 fix path which i think was modified during the refactor and this is a bug introduced by claude i believe 2026-02-01 20:15:19 +00:00
Andrej Karpathy
31b61d2d17 fix broken import sigh 2026-02-01 05:03:44 +00:00
Sofie Van Landeghem
4d6415b8ef use _PEAK_FLOPS_TABLE instead of if-else structure (#479) 2026-01-31 19:45:06 -08:00
Sofie Van Landeghem
43078c347e clean up original tokenizing_distributed_data_loader (#478) 2026-01-31 19:44:12 -08:00
Franci Penov
dc291c627f Add Blackwell (SM100) GPU support via SDPA fallback (#475) 2026-01-31 19:42:58 -08:00
Andrej Karpathy
0307997f9b merge two files base_loss and base_eval into a single file, it's nicer this way, and unify the huggingface code associated with both 2026-02-01 02:36:43 +00:00
Andrej Karpathy
1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtraining is not yet fully properly erased across the board, but good enough for step 1 2026-01-31 19:12:25 +00:00
Andrej Karpathy
348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining 2026-01-31 18:21:36 +00:00
Andrej Karpathy
3c3a3d7042 warmdown of 0.5 is slightly better: 2026-01-31 01:08:44 +00:00
Andrei Panferov
4d8dbaf6e0 Fix escape character in README bibtex entry (#454) 2026-01-30 09:34:02 -08:00
Andrej Karpathy
3ba42e8135 Fix SDPA KV-cache decode to respect sliding window (#456)
SDPA fallback now respects sliding window during single-token KV-cache
decode by slicing K/V to the last (window + 1) tokens.

Also simplifies the mask building for chunk inference to properly apply
sliding window in that path as well.

Fixes #452

Co-Authored-By: Kartik Vashishta <kartikv776@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 17:32:12 +00:00
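The slicing described in this commit message can be illustrated with plain Python lists standing in for the cached K/V tensors. This is a sketch only: the function name is hypothetical, and treating `window=None` as "no sliding window" is an assumption rather than the repository's convention.

```python
from typing import List, Optional, Tuple


def slice_kv_for_decode(
    keys: List[int], values: List[int], window: Optional[int]
) -> Tuple[List[int], List[int]]:
    """Keep only the last (window + 1) cached positions for single-token decode."""
    if window is None:  # assumption: None means full attention
        return keys, values
    start = max(0, len(keys) - (window + 1))
    return keys[start:], values[start:]


cache = list(range(10))  # positions 0..9 already in the cache
k, v = slice_kv_for_decode(cache, cache, window=3)
print(k)  # the last 4 positions
```

When the cache holds fewer than `window + 1` positions, `start` clamps to 0 and the whole cache is kept.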
Aarushi Singh
ace6740bdd feat: allow top_k=0 in web api to disable filtering (#458)
* allow top_k=0 in web api to disable filtering

* adding a comment for clear reasoning

* adding change to docstring
2026-01-30 09:21:41 -08:00
Harsh Gupta
2e17723817 Fix generate() crash when top_k=0 (#467)
Prevent a crash in generate() by skipping top-k filtering when top_k is set to 0
2026-01-30 09:21:02 -08:00
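The behaviour described in the two top_k commits above can be sketched in plain Python. The function below is a hypothetical stand-in, not the repository's `generate()`; it treats `top_k` of `None` or `0` as "filtering disabled" and otherwise masks everything below the k-th largest logit.

```python
import math
from typing import List, Optional


def apply_top_k(logits: List[float], top_k: Optional[int]) -> List[float]:
    """Mask all but the top_k largest logits; None or 0 disables filtering."""
    if not top_k:  # covers both None and 0
        return list(logits)
    k = min(top_k, len(logits))  # never ask for more entries than exist
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else -math.inf for x in logits]


print(apply_top_k([1.0, 3.0, 2.0], top_k=2))
print(apply_top_k([1.0, 3.0, 2.0], top_k=0))
```

Without the early return, `top_k=0` would index position `-1` of the sorted list and silently use the smallest logit as the threshold; in a tensor library the equivalent top-k call can fail outright, which is the kind of crash the fix guards against.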
Andrej Karpathy
02baa15405 i am feeling in a delete mood today. i need to delete a lot of code. there is too much code and surface area and complexity. ew 2026-01-30 17:08:53 +00:00
Andrej Karpathy
d6c4f3b923 i think this is the new torch 2.9+ API for declaring tf32 preference 2026-01-30 17:03:15 +00:00
Andrej Karpathy
067daa7758 small fix cpu script ty PR #474 2026-01-30 02:11:25 +00:00
Andrej Karpathy
6a341f2ecf contiguous views and single HtoD transfer for inputs/targets much cleaner 2026-01-30 00:23:01 +00:00
Andrej Karpathy
ebd4d9bbf5 tried muonh, appealing but didn't work out of the box 2026-01-29 19:01:36 +00:00
Andrej Karpathy
41bb2eac32 Combine AdamW and Muon into single MuonAdamW optimizer, cleaner, ty @chrisjmccormick for idea/help 2026-01-29 00:52:08 +00:00
Andrej Karpathy
64a651a63c include .claude is ok 2026-01-29 00:35:02 +00:00
Andrej Karpathy
65df0de42b add arxiv reading skill 2026-01-29 00:34:24 +00:00
Andrej Karpathy
74554be3b5 revert engram, not seeing an improvement at larger scale 2026-01-28 20:07:39 +00:00
Sofie Van Landeghem
d5418ea5a1 Fix link to DeepSeek Engram paper (#470)
* Fix link to DeepSeek Engram paper in LOG.md

Updated link to the DeepSeek Engram paper in the log.

* remove www
2026-01-28 08:31:44 -08:00
Andrej Karpathy
c88bbf8133 Merge branch 'engram' 2026-01-27 22:33:16 +00:00
Andrej Karpathy
c8d93beed2 add engram-lite, add log, tune scaling laws analysis scripts 2026-01-27 22:31:17 +00:00
Andrej Karpathy
8630d32be4 quick fix to not OOM main speedrun script 2026-01-26 22:31:42 +00:00
Andrej Karpathy
59e36cc727 first version of engram following modded nanogpt style 2026-01-25 18:59:51 +00:00
Andrej Karpathy
85b3e95e09 320 experiments just to tune the adam beta1 of x0 a little bit up from 0.8 to 0.96 2026-01-25 00:04:02 +00:00
xiayan0118
6a477eedbd fix: pass device_type to compute_init in engine.__main__ (#451)
When running engine.py directly on non-GPU devices (CPU, MPS),
compute_init() needs the device_type parameter to initialize correctly.
This fixes failures on machines without CUDA support.
2026-01-19 17:19:51 -08:00
Andrej Karpathy
63bb5831e2 something i've wanted to do for a while - move all .sh runs to their own directory so they don't pollute root dir 2026-01-18 15:27:41 +00:00
Andrej Karpathy
a91743c168 Merge branch 've' 2026-01-18 15:14:39 +00:00
Andrej Karpathy
d58fcd9d73 log for jan 17 2026-01-18 03:01:17 +00:00
Andrej Karpathy
babde18ce1 small tweaks 2026-01-18 03:00:38 +00:00
Andrej Karpathy
cf5c9e5b8e resolve a crash for odd depths because FA3 needs head_dim % 8 == 0 2026-01-18 00:07:08 +00:00
Andrej Karpathy
413e91aa0f optimal ratio is now around 4 2026-01-17 23:51:09 +00:00