tacit

tacit synced and deleted reference refs/tags/refs/pull/296/merge at tacit/nanochat from mirror 2026-03-06 17:30:44 +00:00

tacit synced and deleted reference refs/tags/refs/pull/429/merge at tacit/nanochat from mirror 2026-03-06 09:20:27 +00:00

tacit synced and deleted reference refs/tags/refs/pull/591/merge at tacit/nanochat from mirror 2026-03-06 09:20:27 +00:00

tacit synced commits to refs/pull/521/merge at tacit/nanochat from mirror 2026-03-06 09:20:27 +00:00

f661297931 Merge 967c408d3a into 1076f97059

1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly

752abc836e Ensure that inputs and targets are contiguous (#569)

4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously

324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capablity model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise

Compare 7 commits »

tacit synced commits to refs/pull/536/merge at tacit/nanochat from mirror 2026-03-06 01:10:27 +00:00

21fd476a89 Merge 3735eb9723 into 1076f97059

1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly

752abc836e Ensure that inputs and targets are contiguous (#569)

4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously

324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capablity model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise

Compare 8 commits »

tacit synced commits to refs/pull/511/merge at tacit/nanochat from mirror 2026-03-06 01:10:27 +00:00

4575e9534f Merge 5ba77a31b9 into 1076f97059

1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly

752abc836e Ensure that inputs and targets are contiguous (#569)

4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously

324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capablity model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise

Compare 7 commits »

tacit synced commits to refs/pull/576/merge at tacit/nanochat from mirror 2026-03-05 17:00:32 +00:00

d7a98a4d91 Merge e19b8b8fe1 into 1076f97059

1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly

752abc836e Ensure that inputs and targets are contiguous (#569)

4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously

324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capablity model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise

Compare 5 commits »

tacit synced commits to refs/pull/574/merge at tacit/nanochat from mirror 2026-03-05 17:00:32 +00:00

178bd4d58e Merge e009166646 into 1076f97059

1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly

Compare 2 commits »

tacit synced commits to refs/pull/580/merge at tacit/nanochat from mirror 2026-03-05 17:00:32 +00:00

e9fb8db8c4 Merge 16755495bc into 1076f97059

1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly

752abc836e Ensure that inputs and targets are contiguous (#569)

4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously

324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capablity model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise

Compare 5 commits »

tacit synced commits to refs/pull/573/merge at tacit/nanochat from mirror 2026-03-05 17:00:32 +00:00

667e34cdd1 Merge ec06564b46 into 1076f97059

1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly

752abc836e Ensure that inputs and targets are contiguous (#569)

4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously

324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capablity model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise

Compare 5 commits »

tacit synced commits to refs/pull/568/merge at tacit/nanochat from mirror 2026-03-05 17:00:31 +00:00

affab1a868 Merge 212bdae120 into 1076f97059

1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly

752abc836e Ensure that inputs and targets are contiguous (#569)

4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously

324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capablity model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise

Compare 5 commits »

tacit synced commits to refs/pull/489/merge at tacit/nanochat from mirror 2026-03-05 17:00:31 +00:00

96a6d7f280 Merge 79b7b04ca0 into 1076f97059

1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly

752abc836e Ensure that inputs and targets are contiguous (#569)

4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously

324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capablity model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise

Compare 5 commits »

tacit synced commits to refs/pull/544/merge at tacit/nanochat from mirror 2026-03-05 17:00:31 +00:00

eecf352d95 Merge 767df6ef61 into 1076f97059

1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly

752abc836e Ensure that inputs and targets are contiguous (#569)

Compare 3 commits »

tacit synced commits to refs/pull/551/merge at tacit/nanochat from mirror 2026-03-05 08:50:35 +00:00

326e28d073 Merge 5de185f9a7 into 1076f97059

5de185f9a7 Merge branch 'master' into bugfix/eval-sampler-crash

1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly

752abc836e Ensure that inputs and targets are contiguous (#569)

4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously

Compare 6 commits »

tacit synced commits to refs/pull/579/merge at tacit/nanochat from mirror 2026-03-05 08:50:35 +00:00

81d3fccf87 Merge 1bce71e03d into 1076f97059

1bce71e03d Merge branch 'master' into fix-scaling-zero-division

1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly

Compare 3 commits »

tacit synced commits to refs/pull/579/head at tacit/nanochat from mirror 2026-03-05 08:50:35 +00:00

1bce71e03d Merge branch 'master' into fix-scaling-zero-division

1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly

752abc836e Ensure that inputs and targets are contiguous (#569)

4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously

324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capablity model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise

Compare 7 commits »

tacit synced commits to refs/pull/533/merge at tacit/nanochat from mirror 2026-03-05 08:50:34 +00:00

5c7f572d7c Merge 28894e1262 into 1076f97059

28894e1262 Merge branch 'master' into fix-batch-size-assertion

1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly

752abc836e Ensure that inputs and targets are contiguous (#569)

4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously

Compare 6 commits »

tacit synced commits to refs/pull/551/head at tacit/nanochat from mirror 2026-03-05 08:50:34 +00:00

5de185f9a7 Merge branch 'master' into bugfix/eval-sampler-crash

1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly

752abc836e Ensure that inputs and targets are contiguous (#569)

4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously

324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capablity model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise

Compare 7 commits »

tacit synced commits to refs/pull/533/head at tacit/nanochat from mirror 2026-03-05 08:50:31 +00:00

28894e1262 Merge branch 'master' into fix-batch-size-assertion

1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly

752abc836e Ensure that inputs and targets are contiguous (#569)

4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously

324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capablity model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise

Compare 7 commits »

tacit synced commits to refs/pull/579/merge at tacit/nanochat from mirror 2026-03-05 00:40:45 +00:00

37acfbe6af Merge be723b7afb into 752abc836e

752abc836e Ensure that inputs and targets are contiguous (#569)

4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously

324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capablity model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise

Compare 4 commits »