• Joined on 2024-05-31
tacit synced commits to refs/pull/588/merge at tacit/nanochat from mirror 2026-03-10 03:05:31 +00:00
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/579/head at tacit/nanochat from mirror 2026-03-10 03:05:31 +00:00
e3bd5545b5 Merge branch 'master' into fix-scaling-zero-division
tacit synced commits to refs/pull/551/head at tacit/nanochat from mirror 2026-03-10 03:05:31 +00:00
e5ebfa83a3 Merge branch 'master' into bugfix/eval-sampler-crash
tacit synced commits to refs/pull/551/merge at tacit/nanochat from mirror 2026-03-10 03:05:31 +00:00
e5ebfa83a3 Merge branch 'master' into bugfix/eval-sampler-crash
tacit synced commits to refs/pull/533/merge at tacit/nanochat from mirror 2026-03-10 03:05:31 +00:00
0e5403e7f6 Merge branch 'master' into fix-batch-size-assertion
tacit synced commits to refs/pull/579/merge at tacit/nanochat from mirror 2026-03-10 03:05:31 +00:00
e3bd5545b5 Merge branch 'master' into fix-scaling-zero-division
tacit synced commits to refs/pull/580/merge at tacit/nanochat from mirror 2026-03-10 03:05:31 +00:00
tacit synced commits to master at tacit/nanochat from mirror 2026-03-10 03:05:30 +00:00
tacit synced commits to refs/pull/414/merge at tacit/nanochat from mirror 2026-03-10 03:05:30 +00:00
tacit synced commits to refs/pull/533/head at tacit/nanochat from mirror 2026-03-10 03:05:30 +00:00
0e5403e7f6 Merge branch 'master' into fix-batch-size-assertion
tacit synced commits to refs/pull/486/merge at tacit/nanochat from mirror 2026-03-10 03:05:30 +00:00
tacit synced commits to refs/pull/486/head at tacit/nanochat from mirror 2026-03-09 18:55:32 +00:00
67d63de2e6 resolve conflicts
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry; congrats @ddudek for pointing out ClimbMix. Time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap the previous FineWeb-EDU dataset for the NVIDIA ClimbMix dataset. Requires people to download the new data shards. The upside is that training a GPT-2-capability model now takes only ~2 hours, down from 2.76 hours, so this is a huge win data-wise
tacit synced commits to refs/pull/486/merge at tacit/nanochat from mirror 2026-03-09 18:55:32 +00:00
tacit synced commits to refs/pull/311/merge at tacit/nanochat from mirror 2026-03-09 18:55:31 +00:00
tacit synced commits to refs/pull/437/merge at tacit/nanochat from mirror 2026-03-09 10:45:30 +00:00
tacit synced commits to refs/pull/485/merge at tacit/nanochat from mirror 2026-03-09 10:45:30 +00:00
tacit synced commits to refs/pull/533/merge at tacit/nanochat from mirror 2026-03-08 18:25:29 +00:00
f2899a1b4a Extend informative assertion message to chat_sft.py for consistency
tacit synced commits to refs/pull/533/head at tacit/nanochat from mirror 2026-03-08 18:25:29 +00:00
tacit synced and deleted reference refs/tags/refs/pull/605/merge at tacit/nanochat from mirror 2026-03-08 18:25:29 +00:00
tacit synced and deleted reference refs/tags/refs/pull/603/merge at tacit/nanochat from mirror 2026-03-08 02:10:31 +00:00