| Author | Commit | Message | Date |
|---|---|---|---|
| kiankyars | 52c7d23a63 | Merge 2f4f20862d into 4a87a0d19f | 2025-11-23 08:11:54 -07:00 |
| Kian Kyars | 2f4f20862d | add back comment | 2025-11-23 08:09:28 -07:00 |
| Kian Kyars | 1d719a7c94 | add back hugging face tokenizer | 2025-11-23 08:07:52 -07:00 |
| Kian Kyars | d28d69f3ea | reduce list redundancy | 2025-11-23 08:07:11 -07:00 |
| Andrej | 4a87a0d19f | Merge pull request #299 from samjabrahams/rotary_embedding_head_dim_comment_cleanup; Fix comment: rotary embeddings final dimension size | 2025-11-17 13:29:21 -08:00 |
| Sam Abrahams | 11e68bf442 | Fix comment: rotary embeddings final dimension size | 2025-11-17 11:32:56 -05:00 |
| Andrej Karpathy | bc1fca39f3 | mqa -> gqa to reduce confusion | 2025-11-15 15:43:37 +00:00 |
| Andrej | f66a780f68 | Fix torch.dtype mismatching when running engine inline test. | 2025-11-14 07:28:29 -08:00 |
| Andrej | 4763ce612a | Small fixes to typos | 2025-11-14 07:25:59 -08:00 |
| Sofie Van Landeghem | c6f5bd67db | revert change of base to sft for quick inline test | 2025-11-14 12:20:03 +01:00 |
| svlandeg | a2fb3c83a6 | fix typos | 2025-11-14 11:20:25 +01:00 |
| svlandeg | e5efb4b471 | add test_engine.py to file structure | 2025-11-14 11:13:42 +01:00 |
| Andrej Karpathy | 9a71d13688 | typo oops | 2025-11-13 16:08:30 +00:00 |
| Andrej Karpathy | 7b7fd0fe71 | thank you Sophie for your help with nanochat | 2025-11-13 16:07:54 +00:00 |
| Andrej Karpathy | c6abcdfe3a | big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster. | 2025-11-13 15:34:40 +00:00 |
| Andrej Karpathy | 91f09ccd0d | minor fix comment in engine | 2025-11-13 15:28:18 +00:00 |
| Andrej Karpathy | adb5d4a16c | uv lock has to change when we removed numpy the other commit | 2025-11-13 15:16:27 +00:00 |
| howardgao@outlook.com | b399e43168 | fix engine test bug | 2025-11-06 08:56:45 +08:00 |
| Andrej Karpathy | c6b7ab7440 | grad clip logging and printing and cosmetics | 2025-11-05 21:08:30 +00:00 |
| Andrej | 885a4f25e7 | Replace fcntl with filelock for Windows compatibility | 2025-11-04 16:35:39 -08:00 |
| Andrej | 3a2ae631c4 | Merge branch 'master' into master | 2025-11-04 16:35:02 -08:00 |
| Andrej | 12d995f58c | Add NPROC_PER_NODE var to speedrun.sh and run1000.sh | 2025-11-04 16:26:33 -08:00 |
| svlandeg | f1683c5b16 | set nproc_per_node as var in speedrun and run1000 scripts | 2025-11-04 21:36:10 +01:00 |
| Andrej | d1558c7873 | handle bf16 on MPS by casting to fp32 during load checkpoint | 2025-11-04 09:42:50 -08:00 |
| Andrej | df25293087 | Add explicit UTF-8 encoding on open | 2025-11-04 09:38:18 -08:00 |
| Yasser Makram | 1e89af9862 | Replace fcntl with filelock for Windows compatibility | 2025-11-04 07:22:34 +00:00 |
| Dipesh Babu | 7a40ee77b4 | fix: cast bf16 to fp32 on MPS (like CPU) to avoid dtype issues | 2025-11-03 16:00:56 -05:00 |
| svlandeg | 2ce62ec076 | ensure consistency of quotes within each statement | 2025-11-03 21:52:02 +01:00 |
| svlandeg | e22fc6f2fa | few more explicit UTF-8 encodings | 2025-11-03 21:46:39 +01:00 |
| svlandeg | c72b8b2309 | add explicit UTF-8 encoding | 2025-11-03 21:27:12 +01:00 |
| Andrej | a83646e098 | fix(eval): use UTF-8 when reading CORE JSONL and writing CSV | 2025-11-03 06:38:33 -08:00 |
| Andrej | 8681922328 | fix lstrip bug, make it removeprefix, TIL. | 2025-11-03 06:37:48 -08:00 |
| Dipesh Babu | 226953b841 | fix: open JSONL and results CSV with UTF-8 encoding for portability | 2025-11-03 01:20:56 -05:00 |
| Josh Odom | f1e15f5f4d | Fixing subtle bug: lstrip removes all matching characters, including potentially required ones. Use removeprefix instead. | 2025-11-02 23:40:37 -06:00 |
| Andrej | b6da6982f6 | fix nanochat logo: the t was placed too far to the right | 2025-11-02 08:17:00 -08:00 |
| Andrej | c2c4f77e22 | oops small bugfix to run1000.sh missing kwarg | 2025-11-02 08:14:41 -08:00 |
| Andrej | d1ac0b2d07 | when loading models on CPU, convert tensors from bfloat16 to float | 2025-11-02 07:58:56 -08:00 |
| svlandeg | 5bfcd31b73 | revert more formatting changes | 2025-11-02 14:17:10 +01:00 |
| svlandeg | 036a3c5881 | revert formatting changes to facilitate review | 2025-11-02 14:16:43 +01:00 |
| svlandeg | 52e85aaf80 | Merge branch 'master' into fix/typo | 2025-11-02 13:41:13 +01:00 |
| Jing Zhang | ba4f40bf58 | Update run1000.sh to add missing --run=$WANDB_RUN | 2025-11-01 21:27:00 -07:00 |
| Manuel Saelices | d54c9cbf8c | CPU Support, as bfloat16 params breaks inference | 2025-11-01 23:38:50 +01:00 |
| Andrej Karpathy | cf587acb1a | move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts | 2025-11-01 16:04:38 +00:00 |
| Andrej Karpathy | 7d2c4a3d95 | delete pandas dep in base_eval use csv instead | 2025-11-01 15:28:30 +00:00 |
| Andrej | ad39db5a23 | tiny fix to comment; Update engine.py with correct error message on assert | 2025-11-01 07:43:57 -07:00 |
| Andrej | 630f54ae5a | use empty locals and globals in call to eval() in engine tool use; harden eval: prevent the calc tool from accessing globals and locals | 2025-11-01 07:22:59 -07:00 |
| Andrej Karpathy | f15732524a | make deepwiki link better | 2025-11-01 14:13:29 +00:00 |
| Andrej | dfc88334b6 | fix tok/sec calculation bug when grad accum steps > 1; Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1 | 2025-10-30 08:36:32 -07:00 |
| Andrej | eb11bb0e2e | remove numpy as dep; Remove explicit numpy dependency | 2025-10-30 08:28:14 -07:00 |
| svlandeg | 70319851fc | fix typo | 2025-10-29 19:48:34 +01:00 |
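
The `lstrip` vs `removeprefix` fix described in commits f1e15f5f4d and 8681922328 can be illustrated with a minimal sketch. The prefix and key below are hypothetical values chosen only to make the difference visible; they are not taken from the repository's code. `str.removeprefix` requires Python 3.9 or later.

```python
# Hypothetical prefix and key, used only for illustration.
PREFIX = "_orig_mod."
key = "_orig_mod.model.weight"

# str.lstrip treats its argument as a SET of characters and keeps stripping
# while the leading character is in that set, so it can eat into the part of
# the string that follows the prefix.
buggy = key.lstrip(PREFIX)

# str.removeprefix removes the exact leading substring, at most once.
fixed = key.removeprefix(PREFIX)

print(buggy)  # el.weight
print(fixed)  # model.weight
```

Here `lstrip` also strips the leading `m`, `o`, and `d` of `model` because those characters happen to appear in the prefix, which is the kind of over-stripping the commit message calls out.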