Commit Graph

  • f9fc48ca88
    Merge 5b27c0c59e into 1076f97059 Dustin Loring 2026-03-06 11:22:44 -0500
  • 5b27c0c59e Create convert_to_sharegpt.py Dustin Loring 2026-03-06 11:20:10 -0500
  • 65071f688c
    Merge branch 'master' into a2p2 Yoyo Liu 2026-03-06 10:30:40 -0500
  • 26f9fe62b9 readme update Yoyo 2026-03-06 10:20:30 -0500
  • c7b60251d0 ablation update Yoyo 2026-03-06 10:19:42 -0500
  • 7ad90aa746
    Merge 9caf6690a1 into 1076f97059 Daniel Dudek 2026-03-06 10:44:06 -0400
  • 14d50e4ef7
    Merge f8ff0439b9 into 1076f97059 Sofie Van Landeghem 2026-03-06 11:04:02 +0100
  • f8ff0439b9 two more small typos svlandeg 2026-03-06 11:03:00 +0100
  • f661297931
    Merge 967c408d3a into 1076f97059 Ralph 2026-03-06 15:07:06 +0800
  • 0c4d91d685
    Merge branch 'master' into diff_attn Tianyu Luo 2026-03-05 23:10:09 -0500
  • c6d44cf463 implement differential attention layers Tianyu Luo 2026-03-05 21:56:47 -0500
  • af9f842b83 A3 MrPOS666 2026-03-05 15:25:57 -0500
  • 4575e9534f
    Merge 5ba77a31b9 into 1076f97059 Kunwar Vikrant 2026-03-05 14:56:11 -0500
  • a5f788cb18 Use /usr/bin/env bash Mwaura Collins 2026-03-05 22:29:00 +0300
  • 21fd476a89
    Merge 3735eb9723 into 1076f97059 Sermet Pekin 2026-03-05 18:52:27 +0000
  • 96a6d7f280
    Merge 79b7b04ca0 into 1076f97059 fpvsim 2026-03-05 17:40:47 +0100
  • 4ac7562d3d modal script update Yoyo 2026-03-05 11:35:02 -0500
  • affab1a868
    Merge 212bdae120 into 1076f97059 Sofie Van Landeghem 2026-03-05 19:40:30 +0800
  • 667e34cdd1
    Merge ec06564b46 into 1076f97059 Pramod Dhungana 2026-03-05 19:40:24 +0800
  • 178bd4d58e
    Merge e009166646 into 1076f97059 Pramod Dhungana 2026-03-05 19:40:22 +0800
  • d7a98a4d91
    Merge e19b8b8fe1 into 1076f97059 Pramod Dhungana 2026-03-05 19:40:11 +0800
  • d511850f5f
    Merge b0778933ee into 1076f97059 Evgenii Zheltonozhskii 2026-03-05 12:15:23 +0200
  • eecf352d95
    Merge 767df6ef61 into 1076f97059 Junyang Chen 2026-03-05 10:51:50 +0100
  • b0778933ee Add upload to hf hub Evgenii Zheltonozhskii 2026-03-05 11:50:17 +0200
  • e9fb8db8c4
    Merge 16755495bc into 1076f97059 geopti 2026-03-05 10:29:53 +0100
  • 326e28d073
    Merge 5de185f9a7 into 1076f97059 Suraj-Self 2026-03-05 16:36:55 +1000
  • 5c7f572d7c
    Merge 28894e1262 into 1076f97059 Suraj-Self 2026-03-05 16:36:45 +1000
  • 81d3fccf87
    Merge 1bce71e03d into 1076f97059 Suraj-Self 2026-03-05 16:35:57 +1000
  • e6674bdba4 chore: optimize training script Liu Jiang 2026-03-05 12:58:27 +0800
  • 223e9b170e fix: correct the section argument passed incorrectly to the log function in chat_sft.py Liu Jiang 2026-03-05 12:46:30 +0800
  • c4d3727ba8 update: speedrundiy.sh pipeline now runs end to end Liu Jiang 2026-03-05 12:36:07 +0800
  • 5de185f9a7 Merge branch 'master' into bugfix/eval-sampler-crash suraj-self 2026-03-05 08:43:47 +0530
  • 28894e1262 Merge branch 'master' into fix-batch-size-assertion suraj-self 2026-03-05 08:41:31 +0530
  • 841849cdb8 add ablations Yoyo 2026-03-04 22:07:55 -0500
  • 1bce71e03d Merge branch 'master' into fix-scaling-zero-division suraj-self 2026-03-05 08:37:48 +0530
  • 6d0afeacd3 ablation + anyscale Yoyo 2026-03-04 21:20:24 -0500
  • f7b71341fd readme update Yoyo 2026-03-04 20:51:41 -0500
  • 1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly master Andrej Karpathy 2026-03-04 23:55:24 +0000
  • 752abc836e
    Ensure that inputs and targets are contiguous (#569) Sofie Van Landeghem 2026-03-04 22:58:27 +0100
  • 898345a322
    Merge 2566b19e41 into 4b4077425b icenfly 2026-03-04 16:50:45 -0500
  • 676ddfdb46 revert miniseries.sh to original, this should run fine with G4 instance in colab Your Name 2026-03-04 13:26:21 -0800
  • c48aa05531 fallback to flex_attention when FA3 is not available Your Name 2026-03-04 12:21:13 -0800
  • 4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously Andrej Karpathy 2026-03-04 20:02:07 +0000
  • 324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise Andrej Karpathy 2026-03-04 19:47:12 +0000
  • 29b76c5695 get rid of version lock Your Name 2026-03-04 11:00:54 -0800
  • 0c5b9319aa adding unit tests, starting point of RTX 5090 optimization with flex_attention Your Name 2026-03-04 10:57:43 -0800
  • 91af7ac7e3
    Merge 330fa1188c into b07604ebaa Xingyu Dang 2026-03-03 18:33:27 -0300
  • ca5c5dd217
    Merge 005daea668 into b07604ebaa Emanuele 2026-03-03 18:33:24 -0300
  • a1e639b63e
    Merge 4f79e750e7 into b07604ebaa Unsal 2026-03-03 18:32:51 -0300
  • b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset Andrej Karpathy 2026-03-03 17:24:31 +0000
  • 719eaa7f37
    Merge 5422d3a132 into 83dccc20ae Andrej 2026-03-03 06:31:50 -0800
  • 4cdf5d35b5
    Merge 4f2f78a4a5 into 83dccc20ae Ralph 2026-03-03 10:44:18 +0200
  • fb464693ef
    Merge b6899d5230 into 83dccc20ae Haoyu Wang 2026-03-03 17:15:46 +0900
  • a5bbc67960 Merge branch 'master' into bugfix/eval-sampler-crash suraj-self 2026-03-03 11:27:29 +0530
  • 6e9ef8f565 Merge branch 'master' into fix-batch-size-assertion suraj-self 2026-03-03 11:25:58 +0530
  • be723b7afb Merge branch 'master' into fix-scaling-zero-division suraj-self 2026-03-03 11:22:31 +0530
  • 7839a19cec
    Merge 8cfa0451f4 into 83dccc20ae askerlee 2026-03-03 12:40:41 +0800
  • aba30cb037 tune logit softcap? Andrej Karpathy 2026-03-02 18:19:37 +0000
  • 83dccc20ae
    Restore completion-only loss masking in SFT dataloader (#582) Anish 2026-03-03 06:07:47 +0530
  • 96b6d64895 putting back the comment which got removed accidentally, no functionality change gpu-poor 2026-03-03 00:21:16 +0530
  • 1d7a134b5c modify gitignore Yoyo 2026-03-01 14:56:26 -0500
  • e19b8b8fe1
    Merge branch 'karpathy:master' into fix/source-argparse-choices Pramod Dhungana 2026-03-01 09:04:16 -0500
  • 37d23102b6 fix(chat_cli,chat_eval): add --source choices and validate --task-name to avoid KeyError dhunganapramod9 2026-03-01 09:02:57 -0500
  • 20c385e8f7 fix(chat_web): use removeprefix for SSE chunk to avoid corrupting payload when token contains 'data: ' dhunganapramod9 2026-02-28 01:07:55 -0500
  • dd10cf915d fix(chat_web): use removeprefix for SSE chunk to avoid corrupting payload when token contains 'data: ' dhunganapramod9 2026-02-28 01:07:55 -0500
  • 24a74c4b7f fix(chat_web): add argparse choices for --source to avoid KeyError on invalid value dhunganapramod9 2026-02-28 01:07:26 -0500
  • 014084a656 undoing some changes gpu-poor 2026-03-01 13:16:18 +0530
  • 16755495bc fix(miniseries): extract tokens_trained from log instead of hardcoding batch size geopti 2026-02-28 20:43:34 +0000
  • fb2be07e17 fix: correct CSV extraction in scaling_laws.sh geopti 2026-02-28 16:37:04 +0000
  • 5a06a7c597 using the mask by render_conversation function of tokeniser gpu-poor 2026-02-28 16:36:52 +0530
  • 58465e3bf5 fix: guard target-param-data-ratio against zero to avoid ZeroDivisionError suraj-self 2026-02-28 13:10:50 +0530
  • 16a679c911
    Disable growth and make the failure explicit Dipesh Babu 2026-02-28 01:51:31 -0500
  • 043bd7fe14
    Merge 6e571915d9 into c7ba252142 kiankyars 2026-02-28 15:41:39 +0900
  • ec06564b46 fix(chat_web): validate --num-gpus against available GPUs for clear startup error dhunganapramod9 2026-02-28 01:08:55 -0500
  • e009166646 fix(tokenizer): add bounds check for system-only conversation to avoid IndexError dhunganapramod9 2026-02-28 01:08:20 -0500
  • d520349dbd fix(chat_web): use removeprefix for SSE chunk to avoid corrupting payload when token contains 'data: ' dhunganapramod9 2026-02-28 01:07:55 -0500
  • 8f7a940023 fix(chat_web): add argparse choices for --source to avoid KeyError on invalid value dhunganapramod9 2026-02-28 01:07:26 -0500
  • b66bbbf3de fix(chat_web): correct role validation error message to match allowed roles dhunganapramod9 2026-02-28 01:06:42 -0500
  • da507c5835 fix directly in data loader instead svlandeg 2026-02-27 02:01:34 +0100
  • 83de1b18b1 call reshape instead of view in case the tensors are not contiguous svlandeg 2026-02-27 01:50:37 +0100
  • 212bdae120 remove model tag svlandeg 2026-02-27 00:20:19 +0100
  • 0fe9acb8c0 use default tag svlandeg 2026-02-27 00:17:09 +0100
  • d11706e310
    Merge eebab89a11 into c7ba252142 lenkog 2026-02-26 22:00:20 +0000
  • f5af3ae07b
    Merge 9a9b12b1be into c7ba252142 Jason Cox 2026-02-26 22:00:08 +0000
  • 00a562cfe9 remove inherited parameters from commandline and replace by model-tag svlandeg 2026-02-26 22:44:50 +0100
  • 6ddd0602ed adding reply only loss for chat gpu-poor 2026-02-27 01:41:14 +0530
  • daf7ec9156 printing steps count gpu-poor 2026-02-26 10:07:09 +0000
  • 64faec45b1
    Merge 02b22a5a13 into c7ba252142 Anton Chechetka 2026-02-26 02:11:52 -0500
  • cf1900619c Nanoknow benchmark Lingwei Gu 2026-02-26 03:40:24 +0000
  • 2277da9ff4 adjust logic for downloading validation shard based on number of files Kartik Vashishta 2026-02-26 04:46:42 +1100
  • 3a0550ab48
    remove potentially confusing comment Sofie Van Landeghem 2026-02-25 18:28:18 +0100
  • 8603b68f5d Speedrun submission with nvidia climbmix, add the new dataset as an option Daniel Dudek 2026-02-25 14:44:55 +0100
  • d5c307ae67 fix: replace bare except clauses with except Exception haosenwang1018 2026-02-25 09:19:52 +0000
  • b661d41ffd
    Enhance rotary embedding cache management Dipesh Babu 2026-02-24 17:38:02 -0500
  • 2b55abe918
    Refactor rotary embedding cache management Dipesh Babu 2026-02-24 16:48:37 -0500
  • c546a44001
    restore original assert messages Sofie Van Landeghem 2026-02-24 16:51:34 +0100
  • e25e23a970
    Merge 28d5052b0e into c7ba252142 Sofie Van Landeghem 2026-02-23 14:36:49 -0500
  • 8f60cbb42b
    Merge 181e7f1c15 into c7ba252142 Kartik Vashishta 2026-02-22 18:32:11 -0800
  • b7629eff5d Add L3 (Large Lookup Layers) following arXiv:2601.21461v2 William Thurston 2026-02-22 15:49:15 -0800
  • 194c98a5b3 Merge upstream/master (266 commits) into fork William Thurston 2026-02-22 14:50:28 -0800