Sanzo00
|
53b3a4fb81
|
fix: missing val_bpb on resume
|
2025-11-22 11:04:20 +08:00 |
|
Andrej Karpathy
|
c6abcdfe3a
|
big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
|
2025-11-13 15:34:40 +00:00 |
|
Andrej Karpathy
|
c6b7ab7440
|
grad clip logging and printing and cosmetics
|
2025-11-05 21:08:30 +00:00 |
|
Andrej
|
dfc88334b6
|
fix tok/sec calculation bug when grad accum steps > 1
Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1
|
2025-10-30 08:36:32 -07:00 |
|
svlandeg
|
8c9b004c99
|
typo fixes in scripts
|
2025-10-28 20:17:31 +01:00 |
|
water-vapor
|
a9de4b1038
|
Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1
|
2025-10-26 01:43:49 -05:00 |
|
Andrej Karpathy
|
81597cd616
|
move the lr schedule args up in base_train so they are tunable in configurator
|
2025-10-24 13:27:31 +00:00 |
|
Andrej Karpathy
|
a088b7a6ec
|
use enable_gqa of pytorch sdpa, allows us to delete some code, didn't realize it's available
|
2025-10-21 18:07:33 +00:00 |
|
Andrej Karpathy
|
5bdc99abfb
|
merge and resolve conflict
|
2025-10-21 17:19:10 +00:00 |
|
Andrej Karpathy
|
dfcb1c16f1
|
Merge branch 'master' into cpu-mps-dev
|
2025-10-21 17:15:53 +00:00 |
|
Andrej Karpathy
|
c1d2ed1c13
|
use orig_model in sampling, silly of me to miss this
|
2025-10-20 00:05:09 +00:00 |
|
Andrej Karpathy
|
2bc521a6de
|
use orig_model in sampling, silly of me to miss this
|
2025-10-20 00:04:15 +00:00 |
|
karpathy
|
df600b6ed5
|
many small tweaks. base, eval, core work now i think
|
2025-10-16 15:46:18 -07:00 |
|
karpathy
|
786119d593
|
add autodetect of device and related stuff. getting weird warnings/errors still, so wip
|
2025-10-16 10:26:19 -07:00 |
|
karpathy
|
279b74312c
|
adjust comment/guidance on device type
|
2025-10-16 10:06:39 -07:00 |
|
karpathy
|
306bc380ab
|
add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights
|
2025-10-16 10:04:43 -07:00 |
|
Andrej Karpathy
|
722da4f543
|
trying to add basic cpu support, will try mps too
|
2025-10-16 16:14:38 +00:00 |
|
karpathy
|
3a5e0bc50b
|
initial commit
|
2025-10-13 06:49:24 -07:00 |
|