Joined on 2024-05-31
tacit synced commits to refs/pull/579/merge at tacit/nanochat from mirror 2026-03-05 08:50:35 +00:00
1bce71e03d Merge branch 'master' into fix-scaling-zero-division
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
tacit synced commits to refs/pull/533/merge at tacit/nanochat from mirror 2026-03-05 08:50:34 +00:00
28894e1262 Merge branch 'master' into fix-batch-size-assertion
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
tacit synced commits to refs/pull/551/head at tacit/nanochat from mirror 2026-03-05 08:50:34 +00:00
5de185f9a7 Merge branch 'master' into bugfix/eval-sampler-crash
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
tacit synced commits to refs/pull/533/head at tacit/nanochat from mirror 2026-03-05 08:50:31 +00:00
28894e1262 Merge branch 'master' into fix-batch-size-assertion
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
tacit synced commits to refs/pull/579/merge at tacit/nanochat from mirror 2026-03-05 00:40:45 +00:00
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
tacit synced commits to refs/pull/574/merge at tacit/nanochat from mirror 2026-03-05 00:40:43 +00:00
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
tacit synced commits to refs/pull/555/merge at tacit/nanochat from mirror 2026-03-05 00:40:38 +00:00
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
tacit synced commits to refs/pull/544/merge at tacit/nanochat from mirror 2026-03-05 00:40:37 +00:00
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?
tacit synced commits to master at tacit/nanochat from mirror 2026-03-05 00:40:35 +00:00
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise
tacit synced and deleted reference refs/tags/refs/pull/569/merge at tacit/nanochat from mirror 2026-03-05 00:40:34 +00:00
tacit synced and deleted reference refs/tags/refs/pull/563/merge at tacit/nanochat from mirror 2026-03-05 00:40:34 +00:00
tacit synced and deleted reference refs/tags/refs/pull/545/merge at tacit/nanochat from mirror 2026-03-05 00:40:32 +00:00
tacit synced commits to refs/pull/580/merge at tacit/nanochat from mirror 2026-03-04 16:30:30 +00:00
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?
tacit synced commits to refs/pull/576/merge at tacit/nanochat from mirror 2026-03-04 16:30:30 +00:00
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?
83dccc20ae Restore completion-only loss masking in SFT dataloader (#582)
tacit synced commits to refs/pull/545/merge at tacit/nanochat from mirror 2026-03-04 16:30:29 +00:00
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?
83dccc20ae Restore completion-only loss masking in SFT dataloader (#582)
tacit synced commits to refs/pull/568/merge at tacit/nanochat from mirror 2026-03-04 16:30:29 +00:00
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?
83dccc20ae Restore completion-only loss masking in SFT dataloader (#582)
tacit synced commits to refs/pull/573/merge at tacit/nanochat from mirror 2026-03-04 16:30:29 +00:00
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?
83dccc20ae Restore completion-only loss masking in SFT dataloader (#582)
tacit synced commits to refs/pull/574/merge at tacit/nanochat from mirror 2026-03-04 16:30:29 +00:00
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?
83dccc20ae Restore completion-only loss masking in SFT dataloader (#582)
tacit synced commits to refs/pull/533/merge at tacit/nanochat from mirror 2026-03-04 16:30:28 +00:00
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?
tacit synced commits to refs/pull/579/merge at tacit/nanochat from mirror 2026-03-04 08:20:28 +00:00
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset
aba30cb037 tune logit softcap?