Shizhe Diao
|
21d8b9994f
|
multinode slurm submit
|
2025-10-22 21:20:23 -07:00 |
|
Andrej Karpathy
|
a088b7a6ec
|
use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available
|
2025-10-21 18:07:33 +00:00 |
|
Andrej Karpathy
|
5bdc99abfb
|
merge and resolve conflict
|
2025-10-21 17:19:10 +00:00 |
|
Andrej Karpathy
|
dfcb1c16f1
|
Merge branch 'master' into cpu-mps-dev
|
2025-10-21 17:15:53 +00:00 |
|
Andrej Karpathy
|
c1d2ed1c13
|
use orig_model in sampling, silly of me to miss this
|
2025-10-20 00:05:09 +00:00 |
|
Andrej Karpathy
|
2bc521a6de
|
use orig_model in sampling, silly of me to miss this
|
2025-10-20 00:04:15 +00:00 |
|
karpathy
|
df600b6ed5
|
many small tweaks. base, eval, core work now i think
|
2025-10-16 15:46:18 -07:00 |
|
karpathy
|
786119d593
|
add autodetect of device and related stuff. getting weird warnings/errors still, so wip
|
2025-10-16 10:26:19 -07:00 |
|
karpathy
|
279b74312c
|
adjust comment/guidance on device type
|
2025-10-16 10:06:39 -07:00 |
|
karpathy
|
306bc380ab
|
add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights
|
2025-10-16 10:04:43 -07:00 |
|
Andrej Karpathy
|
722da4f543
|
trying to add basic cpu support, will try mps too
|
2025-10-16 16:14:38 +00:00 |
|
karpathy
|
3a5e0bc50b
|
initial commit
|
2025-10-13 06:49:24 -07:00 |
|