Commit Graph

9 Commits

Author SHA1 Message Date
karpathy
f9a7e0f111 update the CPU/MPS script to give reasonable results. The model can at least answer that Paris is the capital of France and knows that the sky is blue, for about 40 minutes of training on my macbook. Also fixed a bug that existed due to KVCache bfloat16 dtype assumption 2026-01-17 12:27:30 -08:00
Yury Kirpichev
77a46902e4
Fix WANDB_RUN parameter passing in runcpu.sh (#407)
- Add --run=$WANDB_RUN to base_train, mid_train, and chat_sft calls
- Ensures wandb logging works when WANDB_RUN environment variable is set
- Matches the behavior in speedrun.sh

Co-authored-by: svlandeg <svlandeg@github.com>
2026-01-16 18:59:44 -08:00
Andrej Karpathy
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. that is, argparse uses dashes variables use underscores. the underscores are just a remnant of the previous Configurator object. This is the right way 2026-01-13 22:45:27 +00:00
Sofie Van Landeghem
a1ccb3dc0b
remove rust compilation as rustbpe is now installed from separate package (#416) 2026-01-08 06:18:37 -08:00
Andrej Karpathy
cf587acb1a move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts 2025-11-01 16:04:38 +00:00
Tancrède Lepoint
d5cda11ab8 Export the base dir variable 2025-10-22 18:15:02 -04:00
Luke Stanley
901b075605 Fix GPU-less CPU use on Linux with specific Torch indexes 2025-10-21 23:14:16 +00:00
Andrej Karpathy
94ee507054 quick fix base eval due to fewshot requirement 2025-10-21 17:56:08 +00:00
karpathy
2e9669e03a upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes ,e.g. changing max_iterations to num_iterations in sft script for consistency in naming 2025-10-20 10:15:17 -07:00