Commit Graph

  • b20a9e6823 Merge 57bcf6786e into 6a477eedbd Haoyu Wang 2026-01-20 16:54:25 +0000
  • e58a29eb14 Merge 5434733a31 into 6a477eedbd Andrei Panferov 2026-01-20 12:39:20 +0100
  • 5434733a31 Fix escape character in README bibtex entry Andrei Panferov 2026-01-20 12:39:01 +0100
  • 350f78116b Merge 65865df300 into 6a477eedbd Gaurav 2026-01-20 10:38:38 +0400
  • 9492a82464 Merge 8cfa0451f4 into 6a477eedbd askerlee 2026-01-19 18:34:07 -0800
  • 3ce88ed883 Merge ddf96c17c5 into 6a477eedbd jolonf 2026-01-19 18:31:35 -0800
  • 8896076c6e Merge 52f1a5ee5c into 6a477eedbd Yury Kirpichev 2026-01-19 18:29:12 -0800
  • 7ce460d390 Merge bfbe965790 into 6a477eedbd Yury Kirpichev 2026-01-19 18:28:56 -0800
  • 6a477eedbd fix: pass device_type to compute_init in engine.__main__ (#451) [master] xiayan0118 2026-01-19 17:19:51 -0800
  • 8d8a038a16 Merge eebab89a11 into 63bb5831e2 lenkog 2026-01-19 14:07:13 +0000
  • c90f77fc40 Merge ff126c085e into 63bb5831e2 Sofie Van Landeghem 2026-01-19 14:05:26 +0000
  • c92723453f Merge 7f3154f025 into 63bb5831e2 Sermet Pekin 2026-01-19 13:50:17 +0000
  • 24b6175af9 Merge 9a9b12b1be into 63bb5831e2 Jason Cox 2026-01-19 13:39:44 +0000
  • 096233f3a1 Merge 02b22a5a13 into 63bb5831e2 Anton Chechetka 2026-01-19 10:20:13 +0000
  • 176a0b01a8 Merge 89d2741cba into 63bb5831e2 VishalKrishnaKumar 2026-01-18 23:32:55 -0800
  • 7ded10ea30 fix: pass device_type to compute_init in engine.__main__ xiayan0118 2026-01-18 16:59:50 -0800
  • bfbe965790 auto detect torch flavour and num gpus Yury Kirpichev 2026-01-18 11:17:46 -0800
  • 52f1a5ee5c Add support for ROCm backend in speedrun script Yury Kirpichev 2025-12-07 18:43:40 -0800
  • 01252a430a Merge 673d75509d into 63bb5831e2 Sofie Van Landeghem 2026-01-19 00:49:36 +0530
  • d9a263bb5f Merge a58bbbaf59 into 63bb5831e2 Brian Edwards 2026-01-18 21:05:55 +0200
  • d70e15083c Merge a39c303912 into 63bb5831e2 Sofie Van Landeghem 2026-01-18 22:04:17 +0500
  • 708385a0d2 midtraining, sft, rl scripts and the final version of the nanochat-Mo 99ninew 2026-01-18 15:31:47 +0000
  • 63bb5831e2 something i've wanted to do for a while - move all .sh runs to their own directory so they don't pollute root dir Andrej Karpathy 2026-01-18 15:27:41 +0000
  • 141dcea0f4 Merge 23393eae83 into a91743c168 Jingu Kang 2026-01-19 00:17:00 +0900
  • a91743c168 Merge branch 've' Andrej Karpathy 2026-01-18 15:14:39 +0000
  • 28fb6247ae Merge 7950813a41 into d58fcd9d73 Pengyu Wang 2026-01-18 14:20:50 +0100
  • 8c0eeac2cc Merge 04862cbfea into d58fcd9d73 Evgeny 2026-01-18 14:20:44 +0100
  • a58bbbaf59 Merge branch 'master' into mps-support svlandeg 2026-01-18 14:16:54 +0100
  • ddf96c17c5 Fix for issue #446 - moved save_checkpoint() above evaluate_model() so that the checkpoint is saved before the evals are run. jolonf 2026-01-18 21:46:35 +1100
  • d58fcd9d73 log for jan 17 Andrej Karpathy 2026-01-18 03:01:13 +0000
  • babde18ce1 small tweaks Andrej Karpathy 2026-01-18 03:00:38 +0000
  • cf5c9e5b8e resolve a crash for odd depths because FA3 needs head_dim % 8 == 0 Andrej Karpathy 2026-01-18 00:07:08 +0000
  • 413e91aa0f optimal ratio is now around 4 Andrej Karpathy 2026-01-17 23:51:09 +0000
  • e7ed2082b8 update the default GPTConfig kwargs otherwise they are confusing Andrej Karpathy 2026-01-17 21:16:46 +0000
  • f9a7e0f111 update the CPU/MPS script to give reasonable results. The model can at least answer that Paris is the capital of France and knows that the sky is blue, for about 40 minutes of training on my macbook. Also fixed a bug that existed due to KVCache bfloat16 dtype assumption karpathy 2026-01-17 12:27:30 -0800
  • 1503dcf293 Merge 745b156a0b into f5425245f9 aioaneid 2026-01-17 14:30:29 +0100
  • 3b85ae98bc Merge 81a958350a into f5425245f9 Pyry Takala 2026-01-17 14:01:06 +0300
  • cc9bcbf6fd Merge 00f1a3219d into f5425245f9 kiankyars 2026-01-17 15:24:26 +0800
  • f5425245f9 more GPU types from PR 147 thanks @Qubitium Andrej Karpathy 2026-01-17 03:22:20 +0000
  • 2955650327 add detection of device to report more correct mfu for bf16 Andrej Karpathy 2026-01-17 03:16:12 +0000
  • a885b24fbc Merge 26b0941f75 into 77a46902e4 Pyry Takala 2026-01-16 19:00:02 -0800
  • 77a46902e4 Fix WANDB_RUN parameter passing in runcpu.sh (#407) Yury Kirpichev 2026-01-16 18:59:44 -0800
  • bbc4413c58 Add high value engine tests for core invariants (33 LoC) (#396) Barış Özmen 2026-01-17 05:59:12 +0300
  • f42ae9e901 fix condition to perform bpb evaluation (#324) Nitish Pandey 2026-01-17 08:26:43 +0530
  • e1dafc510f Reduce token waste in BOS bestfit by cropping shortest doc (#445) Yamahammer 2026-01-16 21:50:34 -0500
  • 6460dc6382 tweaks to readme a bit Andrej Karpathy 2026-01-17 02:28:31 +0000
  • 1933e85046 brief update to log Andrej Karpathy 2026-01-17 00:25:50 +0000
  • 3b95d4fd39 allow label for scaling laws script Andrej Karpathy 2026-01-17 00:23:30 +0000
  • 00f1a3219d speedrun Kian Kyars 2026-01-16 16:20:35 -0800
  • 2f7841cd50 remove all uv venv Kian Kyars 2026-01-16 16:22:39 -0800
  • e85db6b4a4 alternating design Andrej Karpathy 2026-01-16 23:52:12 +0000
  • 6ac7796120 Reduce token waste in BOS bestfit by cropping shortest doc Yamahammer 2026-01-16 17:13:04 -0500
  • 9a88194c3f simply one VE per layer, works best Andrej Karpathy 2026-01-16 22:08:52 +0000
  • 0b58d70e99 full ve version works very well Andrej Karpathy 2026-01-16 21:16:47 +0000
  • e3f58b838e ranked version Andrej Karpathy 2026-01-16 20:59:42 +0000
  • 184d4c12b1 also add to log about the FA3 changes Andrej Karpathy 2026-01-16 18:25:04 +0000
  • b62a5bc44a naturally i failed to include the actual code in the previous commit facepalm Andrej Karpathy 2026-01-16 17:39:41 +0000
  • 8203efa919 implement flash attention 3 fallback to pytorch sdpa by touching as few lines of code as possible in main files and keeping all implementation to a single file. add tests. add helpful warning messages for the user. Andrej Karpathy 2026-01-16 17:37:51 +0000
  • 673d75509d avoid keeping less data than the necessary examples for few-shot svlandeg 2026-01-16 12:27:50 +0100
  • 3e5fccdfa4 feat: attempt fa3 load on sm < 9.0 (ampere/ada) hasso 2026-01-16 11:18:12 +0100
  • dc7d6d142d Merge 5172ea11bb into 50413d2d67 Michael Williams 2026-01-16 01:15:25 -0800
  • 38e4e0dd7b Merge branch 'master' into fix/fa3-fallback-mps svlandeg 2026-01-16 09:59:03 +0100
  • 50413d2d67 typo in comments: change "GAPO" to "DAPO" Haoyu Wang 2026-01-16 01:03:42 -0500
  • fbf2bbea25 update log with a bunch of attempts Andrej Karpathy 2026-01-16 02:21:17 +0000
  • 747ed4491f add negative result on olmo3 pretraining mix Andrej Karpathy 2026-01-16 00:43:54 +0000
  • 7d1700c521 add zstd lib Andrej Karpathy 2026-01-16 00:40:59 +0000
  • d4ea28d4e2 Fix args in readme (#438) Sofie Van Landeghem 2026-01-16 01:26:38 +0100
  • bdcc030ffa oops legacy spurious line now Andrej Karpathy 2026-01-15 23:32:20 +0000
  • 22a71aa3d3 fuse adamw into a single torch compiled kernel similar to muon. it's about 1.7X faster, but overall it's so tiny that it's not making a major dent Andrej Karpathy 2026-01-15 23:30:44 +0000
  • 255f8b9af6 cleanly separate cpu and gpu sections Andrej Karpathy 2026-01-15 23:30:11 +0000
  • d13b28dd99 fix relative_diff calculation svlandeg 2026-01-15 22:14:21 +0100
  • 785b214b84 add required -i flag to chat_eval example runs svlandeg 2026-01-15 21:35:05 +0100
  • 950a70e6a6 Merge 28b7dae0c3 into 6bb92403d5 h3nock 2026-01-15 20:59:27 +0100
  • 28b7dae0c3 restore svlandeg 2026-01-15 20:58:53 +0100
  • 65865df300 Merge branch 'master' into master_goderr svlandeg 2026-01-15 20:47:26 +0100
  • 7cd3992f74 Merge branch 'master' into feat/add_dataset_progress_bar Sofie Van Landeghem 2026-01-15 20:29:53 +0100
  • d3679cd0a8 reverting change to .gitignore to prevent merge conflict Sofie Van Landeghem 2026-01-15 20:29:13 +0100
  • 89d2741cba Merge branch 'master' into issue-183-nvshmem-install-fix svlandeg 2026-01-15 20:21:15 +0100
  • ff126c085e Merge branch 'master' into fix/loop svlandeg 2026-01-15 20:03:55 +0100
  • a39c303912 Merge branch 'master' into fix/grad_acc_norm svlandeg 2026-01-15 20:03:10 +0100
  • 9de0a121c5 Merge branch 'master' into fix-wandb-for-local-run svlandeg 2026-01-15 20:01:23 +0100
  • 745b156a0b Merge branch 'master' into fix/shard_count svlandeg 2026-01-15 19:11:28 +0100
  • a91ad6b4b1 Merge branch 'master' into fix/args svlandeg 2026-01-15 19:06:14 +0100
  • 9b05e7c625 debug to_hf Muheng 2026-01-15 12:34:27 +0000
  • 6bb92403d5 changes and optimizations to muon, making it more efficient and simpler/cleaner a bit Andrej Karpathy 2026-01-15 03:20:48 +0000
  • 3142ca1a28 minor helpful message Andrej Karpathy 2026-01-15 03:20:21 +0000
  • 97364273e2 feat: restrict FA3 loading to Hopper+ GPUs (SM90+) to fix crashes on consumer hardware hasan 2026-01-14 22:14:42 +0100
  • d7fccbab82 fix: enforce (B, H, T, D) layout for SDPA fallback to support CPU strictness hasan 2026-01-14 21:42:20 +0100
  • 68e66be05c fix: wrap FA3 import in try-except block to support both CUDA and MPS hasan 2026-01-14 15:23:55 +0100
  • b88cef6053 fix typo svlandeg 2026-01-14 15:11:55 +0100
  • cc40ccc515 fix commands in readme, using new arg format svlandeg 2026-01-14 15:08:50 +0100
  • 8cfa0451f4 When eval language_modeling tasks, be case insensitive to answers askerlee 2026-01-14 15:47:36 +0800
  • e64aa82620 When evaluating language_modeling tasks, be case-insensitive when matching with the correct answer askerlee 2026-01-14 15:34:40 +0800
  • bf067e2a66 Add max_seq_len argument for gpt2 askerlee 2026-01-14 14:19:20 +0800
  • c9c01ffe04 fix: add Flash Attention 3 fallback for MPS/CPU inference hasan 2026-01-14 01:10:29 +0100
  • 7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. that is, argparse uses dashes variables use underscores. the underscores are just a remnant of the previous Configurator object. This is the right way Andrej Karpathy 2026-01-13 22:45:27 +0000
  • 3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default Andrej Karpathy 2026-01-13 22:09:36 +0000
  • 37de80c3b8 Merge 07e5509662 into f92efce169 suspicious-pineapple 2026-01-13 22:35:59 +0100
  • f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance Andrej Karpathy 2026-01-13 21:33:54 +0000
  • 43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training Andrej Karpathy 2026-01-13 20:05:47 +0000