• Joined on 2024-05-31
tacit synced commits to refs/pull/414/merge at tacit/nanochat from mirror 2026-03-07 09:50:28 +00:00
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
Compare 7 commits »
tacit synced commits to refs/pull/141/merge at tacit/nanochat from mirror 2026-03-07 09:50:28 +00:00
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
Compare 9 commits »
tacit synced commits to refs/pull/522/merge at tacit/nanochat from mirror 2026-03-07 01:40:31 +00:00
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
Compare 7 commits »
tacit synced commits to refs/pull/509/merge at tacit/nanochat from mirror 2026-03-06 17:30:45 +00:00
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
Compare 5 commits »
tacit synced and deleted reference refs/tags/refs/pull/319/merge at tacit/nanochat from mirror 2026-03-06 17:30:45 +00:00
tacit synced and deleted reference refs/tags/refs/pull/296/merge at tacit/nanochat from mirror 2026-03-06 17:30:44 +00:00
tacit synced and deleted reference refs/tags/refs/pull/591/merge at tacit/nanochat from mirror 2026-03-06 09:20:27 +00:00
tacit synced commits to refs/pull/521/merge at tacit/nanochat from mirror 2026-03-06 09:20:27 +00:00
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
Compare 7 commits »
tacit synced and deleted reference refs/tags/refs/pull/429/merge at tacit/nanochat from mirror 2026-03-06 09:20:27 +00:00
tacit synced commits to refs/pull/511/merge at tacit/nanochat from mirror 2026-03-06 01:10:27 +00:00
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
Compare 7 commits »
tacit synced commits to refs/pull/536/merge at tacit/nanochat from mirror 2026-03-06 01:10:27 +00:00
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
Compare 8 commits »
tacit synced commits to refs/pull/576/merge at tacit/nanochat from mirror 2026-03-05 17:00:32 +00:00
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
Compare 5 commits »
tacit synced commits to refs/pull/580/merge at tacit/nanochat from mirror 2026-03-05 17:00:32 +00:00
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
Compare 5 commits »
tacit synced commits to refs/pull/574/merge at tacit/nanochat from mirror 2026-03-05 17:00:32 +00:00
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
Compare 2 commits »
tacit synced commits to refs/pull/573/merge at tacit/nanochat from mirror 2026-03-05 17:00:32 +00:00
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
Compare 5 commits »
tacit synced commits to refs/pull/544/merge at tacit/nanochat from mirror 2026-03-05 17:00:31 +00:00
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
Compare 3 commits »
tacit synced commits to refs/pull/489/merge at tacit/nanochat from mirror 2026-03-05 17:00:31 +00:00
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
Compare 5 commits »
tacit synced commits to refs/pull/568/merge at tacit/nanochat from mirror 2026-03-05 17:00:31 +00:00
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
Compare 5 commits »
tacit synced commits to refs/pull/551/merge at tacit/nanochat from mirror 2026-03-05 08:50:35 +00:00
5de185f9a7 Merge branch 'master' into bugfix/eval-sampler-crash
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
Compare 6 commits »
tacit synced commits to refs/pull/579/merge at tacit/nanochat from mirror 2026-03-05 08:50:35 +00:00
1bce71e03d Merge branch 'master' into fix-scaling-zero-division
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
Compare 3 commits »