Commit Graph

162 Commits

Author SHA1 Message Date
Muheng
9196ff6fc0 ready to run 2026-01-08 13:34:34 +00:00
Muheng
8f1378235e to_hf adjusted to current imple 2026-01-06 06:34:46 +00:00
Muheng
952ea5137a align the upstream design 2026-01-06 05:50:48 +00:00
Muheng
74d94c923f gpt2 backend 2026-01-06 05:05:51 +00:00
Muheng
0582d669b6 adjust to_hf for nanomoe 2026-01-01 14:01:01 +00:00
Peiqi Duan
da4d9b532a upd aligned version 2026-01-01 12:30:00 +00:00
Peiqi Duan
62e339f3a9 upd aligned version 2026-01-01 12:26:40 +00:00
Muheng
08f3255ff5 previous moe to_hf implementation 2026-01-01 12:03:33 +00:00
Peiqi Duan
aef3893f44 add scripts 2025-12-24 10:24:06 +00:00
Peiqi Duan
ef830fd7f9 Update some basic things. 2025-12-24 10:24:06 +00:00
Muheng
b12aa2699d humaneval done - delete logs 2025-12-23 16:22:20 +00:00
Muheng
a1f836bbeb humaneval done 2025-12-23 16:21:51 +00:00
Muheng
c026e6f63d able to run gsm8k 2025-12-23 14:55:27 +00:00
Muheng
9c8468df0a Point tools/lm-eval submodule to bill810975 fork 2025-12-23 14:47:06 +00:00
Muheng
6fb1d64864 debug hf inference 2025-12-17 11:07:37 +00:00
Muheng
6095f82fdd now ready for install 2025-12-17 10:52:39 +00:00
Muheng
bc11cd9e5b eval module needs to test 2025-12-15 15:17:12 +08:00
Test User
77da258ee1 Add lm-evaluation-harness as a submodule 2025-12-15 15:17:07 +08:00
Andrej
4a87a0d19f Merge pull request #299 from samjabrahams/rotary_embedding_head_dim_comment_cleanup (Fix comment: rotary embeddings final dimension size) 2025-11-17 13:29:21 -08:00
Sam Abrahams
11e68bf442 Fix comment: rotary embeddings final dimension size 2025-11-17 11:32:56 -05:00
Andrej Karpathy
bc1fca39f3 mqa -> gqa to reduce confusion 2025-11-15 15:43:37 +00:00
Andrej
f66a780f68 Fix torch.dtype mismatching when running engine inline test. 2025-11-14 07:28:29 -08:00
Andrej
4763ce612a Small fixes to typos 2025-11-14 07:25:59 -08:00
Sofie Van Landeghem
c6f5bd67db revert change of base to sft for quick inline test 2025-11-14 12:20:03 +01:00
svlandeg
a2fb3c83a6 fix typos 2025-11-14 11:20:25 +01:00
svlandeg
e5efb4b471 add test_engine.py to file structure 2025-11-14 11:13:42 +01:00
Andrej Karpathy
9a71d13688 typo oops 2025-11-13 16:08:30 +00:00
Andrej Karpathy
7b7fd0fe71 thank you Sophie for your help with nanochat 2025-11-13 16:07:54 +00:00
Andrej Karpathy
c6abcdfe3a big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster. 2025-11-13 15:34:40 +00:00
Andrej Karpathy
91f09ccd0d minor fix comment in engine 2025-11-13 15:28:18 +00:00
Andrej Karpathy
adb5d4a16c uv lock has to change when we removed numpy the other commit 2025-11-13 15:16:27 +00:00
howardgao@outlook.com
b399e43168 fix engine test bug 2025-11-06 08:56:45 +08:00
Andrej Karpathy
c6b7ab7440 grad clip logging and printing and cosmetics 2025-11-05 21:08:30 +00:00
Andrej
885a4f25e7 Replace fcntl with filelock for Windows compatibility 2025-11-04 16:35:39 -08:00
Andrej
3a2ae631c4 Merge branch 'master' into master 2025-11-04 16:35:02 -08:00
Andrej
12d995f58c Add NPROC_PER_NODE var to speedrun.sh and run1000.sh 2025-11-04 16:26:33 -08:00
svlandeg
f1683c5b16 set nproc_per_node as var in speedrun and run1000 scripts 2025-11-04 21:36:10 +01:00
Andrej
d1558c7873 handle bf16 on MPS by casting to fp32 during load checkpoint 2025-11-04 09:42:50 -08:00
Andrej
df25293087 Add explicit UTF-8 encoding on open 2025-11-04 09:38:18 -08:00
Yasser Makram
1e89af9862 Replace fcntl with filelock for Windows compatibility 2025-11-04 07:22:34 +00:00
Dipesh Babu
7a40ee77b4 fix: cast bf16 to fp32 on MPS (like CPU) to avoid dtype issues 2025-11-03 16:00:56 -05:00
svlandeg
2ce62ec076 ensure consistency of quotes within each statement 2025-11-03 21:52:02 +01:00
svlandeg
e22fc6f2fa few more explicit UTF-8 encodings 2025-11-03 21:46:39 +01:00
svlandeg
c72b8b2309 add explicit UTF-8 encoding 2025-11-03 21:27:12 +01:00
Andrej
a83646e098 fix(eval): use UTF-8 when reading CORE JSONL and writing CSV 2025-11-03 06:38:33 -08:00
Andrej
8681922328 fix lstrip bug, make it removeprefix, TIL. 2025-11-03 06:37:48 -08:00
Dipesh Babu
226953b841 fix: open JSONL and results CSV with UTF-8 encoding for portability 2025-11-03 01:20:56 -05:00
Josh Odom
f1e15f5f4d Fixing subtle bug: lstrip removes all matching characters, including potentially required ones. Use removeprefix instead. 2025-11-02 23:40:37 -06:00
Andrej
b6da6982f6 fix nanochat logo: the t was placed too far to the right 2025-11-02 08:17:00 -08:00
Andrej
c2c4f77e22 oops small bugfix to run1000.sh missing kwarg 2025-11-02 08:14:41 -08:00