• Joined on 2024-05-31
tacit synced commits to refs/pull/588/merge at tacit/nanochat from mirror 2026-03-10 03:05:31 +00:00
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/579/head at tacit/nanochat from mirror 2026-03-10 03:05:31 +00:00
e3bd5545b5 Merge branch 'master' into fix-scaling-zero-division
tacit synced commits to refs/pull/551/head at tacit/nanochat from mirror 2026-03-10 03:05:31 +00:00
e5ebfa83a3 Merge branch 'master' into bugfix/eval-sampler-crash
tacit synced commits to refs/pull/551/merge at tacit/nanochat from mirror 2026-03-10 03:05:31 +00:00
e5ebfa83a3 Merge branch 'master' into bugfix/eval-sampler-crash
tacit synced commits to refs/pull/533/merge at tacit/nanochat from mirror 2026-03-10 03:05:31 +00:00
0e5403e7f6 Merge branch 'master' into fix-batch-size-assertion
tacit synced commits to refs/pull/579/merge at tacit/nanochat from mirror 2026-03-10 03:05:31 +00:00
e3bd5545b5 Merge branch 'master' into fix-scaling-zero-division
tacit synced commits to refs/pull/580/merge at tacit/nanochat from mirror 2026-03-10 03:05:31 +00:00
tacit synced commits to master at tacit/nanochat from mirror 2026-03-10 03:05:30 +00:00
tacit synced commits to refs/pull/414/merge at tacit/nanochat from mirror 2026-03-10 03:05:30 +00:00
tacit synced commits to refs/pull/533/head at tacit/nanochat from mirror 2026-03-10 03:05:30 +00:00
0e5403e7f6 Merge branch 'master' into fix-batch-size-assertion
tacit synced commits to refs/pull/486/merge at tacit/nanochat from mirror 2026-03-10 03:05:30 +00:00
tacit synced commits to refs/pull/486/head at tacit/nanochat from mirror 2026-03-09 18:55:32 +00:00
67d63de2e6 resolve conflicts
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry; congrats @ddudek for pointing out ClimbMix. Time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap the previous FineWeb-EDU dataset for the NVIDIA ClimbMix dataset. Requires people to download the new data shards. The upside is that training a GPT-2-capability model now takes only ~2 hours, down from 2.76 hours, so this is a huge win data-wise
tacit synced commits to refs/pull/486/merge at tacit/nanochat from mirror 2026-03-09 18:55:32 +00:00
tacit synced commits to refs/pull/311/merge at tacit/nanochat from mirror 2026-03-09 18:55:31 +00:00
tacit synced commits to refs/pull/437/merge at tacit/nanochat from mirror 2026-03-09 10:45:30 +00:00
tacit synced commits to refs/pull/485/merge at tacit/nanochat from mirror 2026-03-09 10:45:30 +00:00
tacit synced commits to refs/pull/533/merge at tacit/nanochat from mirror 2026-03-08 18:25:29 +00:00
f2899a1b4a Extend informative assertion message to chat_sft.py for consistency
tacit synced commits to refs/pull/533/head at tacit/nanochat from mirror 2026-03-08 18:25:29 +00:00
tacit synced and deleted reference refs/tags/refs/pull/605/merge at tacit/nanochat from mirror 2026-03-08 18:25:29 +00:00
tacit synced and deleted reference refs/tags/refs/pull/603/merge at tacit/nanochat from mirror 2026-03-08 02:10:31 +00:00