Anish
83dccc20ae
Restore completion-only loss masking in SFT dataloader ( #582 )
...
* printing steps count
* adding reply only loss for chat
* using the mask by render_conversation function of tokeniser
* undoing some changes
* putting back the comment which got removed accidently, no functionality change
2026-03-02 16:37:47 -08:00
Andrej Karpathy
8180e1d8c1
tune the data mixture a bit, load optimizer by default when SFT. These were confirmed to be best settings from sweeps of sft
2026-02-16 20:23:04 +00:00
Andrej Karpathy
788dadeb88
a number of upgrades to SFT script to bring it up to date w.r.t. pretraining and tuning some of its kwargs based on sweeps
2026-02-16 14:41:53 +00:00
Andrej Karpathy
8b4849d548
fix bug in chat_sft, the attention window must be preserved sigh
2026-02-01 20:58:44 +00:00
Andrej Karpathy
eaf49a33c8
fix path which i think was modified during the refactor and this is a bug introduced by claude i believe
2026-02-01 20:15:19 +00:00
Andrej Karpathy
1ddaad1c1c
nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtrianing is not yet fully properly erased across the board, but good enough for step 1
2026-01-31 19:12:25 +00:00
Andrej Karpathy
41bb2eac32
Combine AdamW and Muon into single MuonAdamW optimizer, cleaner, ty @chrisjmccormick for idea/help
2026-01-29 00:52:08 +00:00
Andrej Karpathy
7312ec9898
fix buggy midtrain and update all kwargs to be idiomatic. that is, argparse uses dashes variables use underscores. the underscores are just a remnant of the previous Configurator object. This is the right way
2026-01-13 22:45:27 +00:00
Andrej Karpathy
eb7bbc1b66
delete the configurator in favor of argparse and clean up a lot of kwarg details to make them more consistent across all scripts
2026-01-04 19:14:23 +00:00
Andrej
088726aa7d
clean up model_tag handling across scripts a bit more.
2025-12-27 20:01:09 -08:00
Andrej Karpathy
2874eda59a
update to new os env var to get rid of deprecation warning
2025-12-28 03:32:46 +00:00
duwenjie
92c6654b95
bugfix save and load ckpt from model_tag dir
2025-12-21 15:07:04 +08:00
Eric Silberstein
f37d45c21f
remove unneeded iter()
2025-11-20 15:14:56 -05:00
svlandeg
70319851fc
fix typo
2025-10-29 19:48:34 +01:00
Andrej Karpathy
8892470f29
add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think
2025-10-24 14:02:48 +00:00
Andrej Karpathy
5bdc99abfb
merge and resolve conflict
2025-10-21 17:19:10 +00:00
Andrej Karpathy
fe5aed940b
add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully its ok
2025-10-21 15:04:58 +00:00
karpathy
2e9669e03a
upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes ,e.g. changing max_iterations to num_iterations in sft script for consistency in naming
2025-10-20 10:15:17 -07:00
Andrej Karpathy
190d9515d0
dont evaluate the sampling evals during SFT they are too slow. keep the multiple choice evals. delete unused imports
2025-10-15 16:42:23 +00:00
karpathy
3a5e0bc50b
initial commit
2025-10-13 06:49:24 -07:00