karpathy | f9a7e0f111 | update the CPU/MPS script to give reasonable results. After about 40 minutes of training on my macbook, the model can at least answer that Paris is the capital of France and knows that the sky is blue. Also fixed a bug caused by the KVCache assuming a bfloat16 dtype | 2026-01-17 12:27:30 -08:00
Andrej Karpathy | 7312ec9898 | fix buggy midtrain and update all kwargs to be idiomatic: argparse flags use dashes, variables use underscores. The underscored flags were just a remnant of the previous Configurator object. This is the right way | 2026-01-13 22:45:27 +00:00
Andrej Karpathy | 3b50b77ed3 | fix base_loss to report the correct loss by switching the dataloader to the new default | 2026-01-13 22:09:36 +00:00
Andrej Karpathy | 21608ec51e | allow base_loss to report the loss of any arbitrary huggingface model, similar to base_eval. had to rework the dataloader so it just takes a tokenizer instead of loading the nanochat one. much better this way anyway | 2026-01-12 03:10:13 +00:00
Andrej Karpathy | eb7bbc1b66 | delete the configurator in favor of argparse and clean up a lot of kwarg details to make them more consistent across all scripts | 2026-01-04 19:14:23 +00:00
karpathy | df600b6ed5 | many small tweaks. base, eval, and core work now, I think | 2025-10-16 15:46:18 -07:00
karpathy | 786119d593 | add device autodetection and related changes. still getting weird warnings/errors, so wip | 2025-10-16 10:26:19 -07:00
karpathy | 3a5e0bc50b | initial commit | 2025-10-13 06:49:24 -07:00
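Commits eb7bbc1b66 and 7312ec9898 describe a convention: dashes in command-line flags and underscores in Python variables. argparse supports this out of the box. For a long option, it builds the attribute name by stripping the leading dashes and replacing the remaining dashes with underscores. The minimal sketch below illustrates that behavior. The flag names and the `build_parser` helper are hypothetical and are not taken from the repository's scripts.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags for illustration only; not the repository's actual options.
    parser = argparse.ArgumentParser(description="toy training script")
    parser.add_argument("--device-batch-size", type=int, default=32)
    parser.add_argument("--num-iterations", type=int, default=-1)
    parser.add_argument("--run", type=str, default="dummy")
    return parser


if __name__ == "__main__":
    # argparse maps the flag "--device-batch-size" to the attribute "device_batch_size"
    args = build_parser().parse_args(["--device-batch-size", "16", "--run", "test"])
    print(args.device_batch_size, args.num_iterations, args.run)  # 16 -1 test
```

With this convention, the command line reads naturally (`--device-batch-size 16`) and the code still uses ordinary Python identifiers. No hand-written renaming layer is needed.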
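Commit 21608ec51e reworks the dataloader so that it receives a tokenizer instead of loading the nanochat one itself. This is what lets base_loss score arbitrary models. The sketch below illustrates the design and does not reproduce the repository's loader. It assumes only that the tokenizer exposes an `encode(text) -> list[int]` method. `token_batches` and `CharTokenizer` are hypothetical names: the first packs a document stream into fixed-length (inputs, targets) windows, and the second is a stand-in toy tokenizer.

```python
from typing import Iterable, Iterator, List, Protocol, Tuple


class Tokenizer(Protocol):
    # Assumed interface: any object with an encode method qualifies.
    def encode(self, text: str) -> List[int]: ...


def token_batches(
    docs: Iterable[str], tokenizer: Tokenizer, seq_len: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (inputs, targets) windows of length seq_len (hypothetical helper).

    The tokenizer is passed in rather than loaded here, so the same loader
    can serve any model's tokenizer.
    """
    buffer: List[int] = []
    for doc in docs:
        buffer.extend(tokenizer.encode(doc))
        # Take seq_len + 1 tokens so the targets are the inputs shifted by one.
        while len(buffer) >= seq_len + 1:
            chunk = buffer[: seq_len + 1]
            yield chunk[:-1], chunk[1:]
            # Keep the last token: it becomes the first input of the next window.
            buffer = buffer[seq_len:]


class CharTokenizer:
    # Toy tokenizer for illustration: one token per character (its code point).
    def encode(self, text: str) -> List[int]:
        return [ord(c) for c in text]


if __name__ == "__main__":
    for inputs, targets in token_batches(["abcd", "efg"], CharTokenizer(), seq_len=3):
        print(inputs, targets)
```

Because the loader depends only on `encode`, any tokenizer object with a compatible `encode` method can be passed in without touching the loader. A Hugging Face tokenizer's `encode` may add special tokens by default, so it is not guaranteed to be a drop-in match.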
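Commit 786119d593 adds device autodetection, and f9a7e0f111 later refers to a CPU/MPS script. A common way to autodetect the device in PyTorch is to prefer CUDA, then Apple's MPS backend, then the CPU. The sketch below follows that order. The function name `autodetect_device` and the priority order are assumptions and do not describe the repository's code. The sketch also falls back to the CPU when PyTorch is not installed.

```python
def autodetect_device() -> str:
    """Return "cuda", "mps", or "cpu" (hypothetical helper; priority order assumed)."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no accelerator to use; fall back to the CPU.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps (Apple Metal) exists in recent PyTorch builds; guard for older ones.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(autodetect_device())
```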