Commit Graph

186 Commits

Author SHA1 Message Date
Dipesh Babu
2f2d7ab80c fix: safe DDP cleanup (check initialized PG, not just env) (#256) 2025-12-27 20:27:40 -08:00
Andrej Karpathy
91d76cc690 Replace speedup assertion with warning in batch_encode test
Performance varies by machine and load, making hard assertions flaky.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 04:10:49 +00:00
Andrej
7a8769a40c Merge pull request #383 from barisozmen/master
3x faster rust encode (`batch_encode`) (12 LoC + 2 tests)
2025-12-27 20:06:57 -08:00
Andrej
088726aa7d clean up model_tag handling across scripts a bit more. 2025-12-27 20:01:09 -08:00
Andrej Karpathy
2874eda59a update to new os env var to get rid of deprecation warning 2025-12-28 03:32:46 +00:00
Andrej Karpathy
e1770a3061 remove spurious cast, gets compiled away anyway but it's confusing people 2025-12-27 23:07:48 +00:00
Andrej Karpathy
49389ecaa8 fix tf32 warning for deprecated api use 2025-12-27 22:03:06 +00:00
DU Wenjie
ea4229851b bugfix 2025-12-26 19:02:12 +08:00
DU Wenjie
7840049189 bugfix keep same args style in scripts/base_eval.py 2025-12-26 17:29:08 +08:00
Andrej
bc51da8bac pad vocab size to 64 for DDP optimizers and efficiency 2025-12-23 09:13:31 -08:00
duwenjie
92c6654b95 bugfix save and load ckpt from model_tag dir 2025-12-21 15:07:04 +08:00
Barış Özmen
790f3be65c add rust batch encode as a faster option over encode 2025-12-18 19:17:59 +03:00
Matěj Kripner
d314e96aa2 formatting 2025-12-09 12:48:46 +01:00
Matěj Kripner
bbc57da7d5 slightly nicer error message 2025-12-09 12:46:48 +01:00
Matěj Kripner
f1bf69d562 feat: pad vocab size to 64 for DDP optimizers and efficiency 2025-12-09 12:38:18 +01:00
Andrej
d5759400f9 fixing two typos in comments 2025-12-08 20:03:08 -08:00
Andrej
e72c3299df fix random.seed() footgun bug for SpellingBee data generation 2025-12-08 19:58:45 -08:00
Andrej
7931e0903a rename checkpoint_dir to checkpoints_dir for consistency. 2025-12-08 18:32:12 -08:00
Andrej
849d95ae1f remove unnecessary check to make the logic in CausalSelfAttention.forward() clearer 2025-12-08 18:30:37 -08:00
Andrej
39cccc527f small bugfix make mid_train script work even with a tiny number of iterations 2025-12-08 18:27:32 -08:00
Andrej
8b1cecaa95 Apply suggestion from @svlandeg for nicer looking comparison
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2025-12-08 18:27:06 -08:00
Andrej
58f3e84e01 clean up train/val loader in sft for consistency with mid/base 2025-12-08 18:23:57 -08:00
Andrej
1b2a675c88 Improve KV cache code readability 2025-12-08 18:19:05 -08:00
Andrej
d75e6ed711 Fix script comment to reference correct file 2025-12-08 18:16:42 -08:00
Andrej
72a7cf2bc4 Fix distributed Parquet dataloader resume for multi-epoch training 2025-12-08 18:15:02 -08:00
Andrej Karpathy
bffdb2ef91 group common code to make things neater in gpt logit computation 2025-12-09 02:01:05 +00:00
Andrej
cbf30c842c apply float32 cast before logits softcapping so the tanh is in fp32. torch compile fuses this correctly with no extra memory costs. 2025-12-08 14:17:43 -08:00
Andrej Karpathy
90442de35f fix bug where any rank has to be able to create checkpoint_dir if saving optim 2025-12-08 20:45:19 +00:00
Andrej
2fd0440355 fix: missing val_bpb on resume 2025-12-08 12:35:08 -08:00
sunyujun03
01ea71be39 Fix distributed Parquet dataloader resume for multi-epoch training 2025-12-08 00:10:19 -06:00
KimYeongHyeon
a8847a0f83 Fix script comment to reference correct file 2025-12-02 10:46:20 +09:00
deepbuilder
06677c30e0 Refactor dimension validation for KV cache 2025-11-28 15:22:18 -05:00
deepbuilder
a770dcef2e Fix kv_cache indexing to explicitly include head dimension 2025-11-28 15:00:14 -05:00
spjosyula
16788eed3c fix(model): apply float32 cast before logits softcapping
This change ensures that the logits softcapping operation (tanh) is performed in float32 precision rather than bfloat16. Previously, the code cast to float32 after the tanh operation, which meant the non-linearity was computed with bfloat16 precision
2025-11-23 20:12:09 +05:30
Sanzo00
53b3a4fb81 fix: missing val_bpb on resume 2025-11-22 11:04:20 +08:00
svlandeg
4bcc3bb698 clarify comment 2025-11-21 13:19:45 +01:00
Eric Silberstein
f37d45c21f remove unneeded iter() 2025-11-20 15:14:56 -05:00
Eric Silberstein
5c93a56be5 remove unnecessary check 2025-11-19 16:31:41 -05:00
Eric Silberstein
dddb95caac make mid_train script work even with a tiny number of iterations 2025-11-19 15:52:20 -05:00
Eric Silberstein
a4a0959c73 renamed find_largest_model() argument checkpoint_dir to checkpoints_dir for clarity 2025-11-19 15:33:36 -05:00
Eric Silberstein
024781f9df fixing two typos in comments 2025-11-19 15:12:53 -05:00
Eric Silberstein
97770700f2 change test/train split approach because random.seed(1) and random.seed(-1) do the same thing 2025-11-19 14:51:02 -05:00
Andrej
4a87a0d19f Merge pull request #299 from samjabrahams/rotary_embedding_head_dim_comment_cleanup
Fix comment: rotary embeddings final dimension size
2025-11-17 13:29:21 -08:00
Sam Abrahams
11e68bf442 Fix comment: rotary embeddings final dimension size 2025-11-17 11:32:56 -05:00
Andrej Karpathy
bc1fca39f3 mqa -> gqa to reduce confusion 2025-11-15 15:43:37 +00:00
Andrej
f66a780f68 Fix torch.dtype mismatching when running engine inline test. 2025-11-14 07:28:29 -08:00
Andrej
4763ce612a Small fixes to typos 2025-11-14 07:25:59 -08:00
Sofie Van Landeghem
c6f5bd67db revert change of base to sft for quick inline test 2025-11-14 12:20:03 +01:00
svlandeg
a2fb3c83a6 fix typos 2025-11-14 11:20:25 +01:00
svlandeg
e5efb4b471 add test_engine.py to file structure 2025-11-14 11:13:42 +01:00
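
Note on 97770700f2 / e72c3299df: the message says random.seed(1) and random.seed(-1) do the same thing. A minimal sketch of that behavior, assuming CPython's standard `random` module (which seeds from the absolute value of an int); the helper name `first_draws` is hypothetical and not taken from the repository:

```python
import random

def first_draws(seed, n=5):
    """Return the first n floats from a fresh generator seeded with `seed`.

    Hypothetical helper for illustration only; not part of the repository.
    """
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# In CPython an int seed is reduced to its absolute value, so 1 and -1
# produce the same stream, while 1 and 2 produce different streams.
same_stream = first_draws(1) == first_draws(-1)
distinct_stream = first_draws(1) != first_draws(2)
print(same_stream, distinct_stream)
```

Under that assumption, using a sign flip to separate a train seed from a test seed would yield identical data for both splits, which is presumably the footgun the commits address.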