Yury Kirpichev
a86320d7cf
Merge 47885e743b into b33e394528
2026-01-12 10:48:35 +08:00
Andrej Karpathy
fbc1484e8c
add alternating window size patterns for the GPT layers, following GPT-3. Experimented a bit and found the pattern SSSL to work well - 3 short, 1 long alternating. This is now the new default and the plots look quite a bit better on flops vs. bpb
2026-01-11 21:49:54 +00:00
Andrej Karpathy
2ff7d51252
integrate Flash Attention 3. +9% tok_per_sec for d12 with ctx even as low as 2048 out of the box nice. also, ready to tune windows huge
2026-01-11 20:33:19 +00:00
Andrej Karpathy
aa530cdad5
Add learnable lambdas that gate the residual connection and a skip connection to the input embeddings, solid bump to val_bpb
2026-01-11 18:47:35 +00:00
Andrej Karpathy
2c4473dd1b
Big Muon optimizer changes inspired by latest of modded-nanogpt. Added Polar Express, Adafactor-style variance reduction, cautious weight decay, schedule weight decay linearly to ramp down to zero. Tuned optimum weight decay for multiple model sizes d8, d12, d16, d20 and found a scaling law with optimum wd \propto 1/channels^2, including it as default into code. --weight_decay of base_train is now default on and configured optimally according to all of these experiments. Solid bump to val_bpb observed as a result of these changes.
2026-01-11 16:56:59 +00:00
Sofie Van Landeghem
a1ccb3dc0b
remove rust compilation as rustbpe is now installed from separate package ( #416 )
2026-01-08 06:18:37 -08:00
Yury Kirpichev
47885e743b
Fix WANDB_RUN parameter passing in runcpu.sh
...
- Add --run=$WANDB_RUN to base_train, mid_train, and chat_sft calls
- Ensures wandb logging works when WANDB_RUN environment variable is set
- Matches the behavior in speedrun.sh
2026-01-07 23:25:43 -08:00
Andrej Karpathy
061f83c152
delete grad_clip. appears to not be necessary at all. not only was it buggy because the clipping happened per gpu before grad synchronization, but it costs ~2% MFU, and it also doesn't even help. I tried deleting it a while ago and back then it did help. So I'm guessing that some hyperparameter tuning obviated the reason for it since then
2026-01-08 02:16:50 +00:00
Andrej Karpathy
e8c30c3b19
add notebook used for scaling laws analysis
2026-01-07 22:28:53 +00:00
Andrej Karpathy
54e59c38ad
add notebook on deriving the CORE estimates for the GPT-3 miniseries.
2026-01-05 18:40:28 +00:00
Andrej Karpathy
ed2082fbc4
sane secrets management
2026-01-04 19:29:22 +00:00
svlandeg
2ce62ec076
ensure consistency of quotes within each statement
2025-11-03 21:52:02 +01:00
svlandeg
e22fc6f2fa
few more explicit UTF-8 encodings
2025-11-03 21:46:39 +01:00
Andrej
b6da6982f6
fix nanochat logo: the t was placed too far to the right
2025-11-02 08:17:00 -08:00
Andrej Karpathy
cf587acb1a
move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts
2025-11-01 16:04:38 +00:00
svlandeg
b996131570
Merge branch 'master' into logo/kerning-update
2025-10-29 11:45:40 +01:00
Andrej
a1de1f46ad
Merge pull request #156 from tlepoint/fix/export-base-dir
...
Export the base dir variable in runcpu.sh
2025-10-28 15:19:08 -07:00
svlandeg
8c9b004c99
typo fixes in scripts
2025-10-28 20:17:31 +01:00
Tancrède Lepoint
d5cda11ab8
Export the base dir variable
2025-10-22 18:15:02 -04:00
Luke Stanley
901b075605
Fix GPU-less CPU use on Linux with specific Torch indexes
2025-10-21 23:14:16 +00:00
Andrej Karpathy
94ee507054
quick fix base eval due to fewshot requirement
2025-10-21 17:56:08 +00:00
Andrej Karpathy
5bdc99abfb
merge and resolve conflict
2025-10-21 17:19:10 +00:00
Andrej Karpathy
fe5aed940b
add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully its ok
2025-10-21 15:04:58 +00:00
karpathy
2e9669e03a
upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes ,e.g. changing max_iterations to num_iterations in sft script for consistency in naming
2025-10-20 10:15:17 -07:00
obxium
938cb31f1a
Update logo
2025-10-14 14:19:44 -04:00
karpathy
a53833d04f
add nanochat logo png
2025-10-13 06:59:59 -07:00
karpathy
3a5e0bc50b
initial commit
2025-10-13 06:49:24 -07:00