Joined on 2024-05-31
tacit synced commits to refs/pull/579/merge at tacit/nanochat from mirror 2026-03-05 08:50:35 +00:00
1bce71e03d Merge branch 'master' into fix-scaling-zero-division
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
tacit synced commits to refs/pull/533/merge at tacit/nanochat from mirror 2026-03-05 08:50:34 +00:00
28894e1262 Merge branch 'master' into fix-batch-size-assertion
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
tacit synced commits to refs/pull/551/head at tacit/nanochat from mirror 2026-03-05 08:50:34 +00:00
5de185f9a7 Merge branch 'master' into bugfix/eval-sampler-crash
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
tacit synced commits to refs/pull/533/head at tacit/nanochat from mirror 2026-03-05 08:50:31 +00:00
28894e1262 Merge branch 'master' into fix-batch-size-assertion
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
tacit synced commits to refs/pull/579/merge at tacit/nanochat from mirror 2026-03-05 00:40:45 +00:00
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
tacit synced commits to refs/pull/574/merge at tacit/nanochat from mirror 2026-03-05 00:40:43 +00:00
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
tacit synced commits to refs/pull/555/merge at tacit/nanochat from mirror 2026-03-05 00:40:38 +00:00
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
tacit synced commits to refs/pull/544/merge at tacit/nanochat from mirror 2026-03-05 00:40:37 +00:00
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?
tacit synced commits to master at tacit/nanochat from mirror 2026-03-05 00:40:35 +00:00
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
tacit synced and deleted reference refs/tags/refs/pull/569/merge at tacit/nanochat from mirror 2026-03-05 00:40:34 +00:00
tacit synced and deleted reference refs/tags/refs/pull/563/merge at tacit/nanochat from mirror 2026-03-05 00:40:34 +00:00
tacit synced and deleted reference refs/tags/refs/pull/545/merge at tacit/nanochat from mirror 2026-03-05 00:40:32 +00:00
tacit synced commits to refs/pull/580/merge at tacit/nanochat from mirror 2026-03-04 16:30:30 +00:00
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?
tacit synced commits to refs/pull/576/merge at tacit/nanochat from mirror 2026-03-04 16:30:30 +00:00
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?
83dccc20ae Restore completion-only loss masking in SFT dataloader (#582)
tacit synced commits to refs/pull/545/merge at tacit/nanochat from mirror 2026-03-04 16:30:29 +00:00
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?
83dccc20ae Restore completion-only loss masking in SFT dataloader (#582)
tacit synced commits to refs/pull/568/merge at tacit/nanochat from mirror 2026-03-04 16:30:29 +00:00
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?
83dccc20ae Restore completion-only loss masking in SFT dataloader (#582)
tacit synced commits to refs/pull/573/merge at tacit/nanochat from mirror 2026-03-04 16:30:29 +00:00
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?
83dccc20ae Restore completion-only loss masking in SFT dataloader (#582)
tacit synced commits to refs/pull/574/merge at tacit/nanochat from mirror 2026-03-04 16:30:29 +00:00
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?
83dccc20ae Restore completion-only loss masking in SFT dataloader (#582)
tacit synced commits to refs/pull/533/merge at tacit/nanochat from mirror 2026-03-04 16:30:28 +00:00
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?
tacit synced commits to refs/pull/579/merge at tacit/nanochat from mirror 2026-03-04 08:20:28 +00:00
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?