• Joined on 2024-05-31
tacit synced commits to refs/pull/595/merge at tacit/nanochat from mirror 2026-03-13 16:33:42 +00:00
6405b26d24 Merge branch 'master' into fix/typo
1052d25d45 we only need to wait 2h now!
tacit synced commits to refs/pull/595/head at tacit/nanochat from mirror 2026-03-13 16:33:42 +00:00
6405b26d24 Merge branch 'master' into fix/typo
1052d25d45 we only need to wait 2h now!
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/635/merge at tacit/nanochat from mirror 2026-03-13 08:23:12 +00:00
e9bf1a5a67 chat sft optim fix
tacit synced commits to refs/pull/633/merge at tacit/nanochat from mirror 2026-03-12 14:50:45 +00:00
8cadf88faf Update BACKLITE.md
tacit synced commits to refs/pull/633/head at tacit/nanochat from mirror 2026-03-12 14:50:44 +00:00
8cadf88faf Update BACKLITE.md
tacit synced commits to refs/pull/579/head at tacit/nanochat from mirror 2026-03-11 19:55:33 +00:00
781b53078c Merge branch 'master' into fix-scaling-zero-division
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
tacit synced commits to refs/pull/551/merge at tacit/nanochat from mirror 2026-03-11 19:55:33 +00:00
9dae23f487 Merge branch 'master' into bugfix/eval-sampler-crash
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
tacit synced commits to refs/pull/579/merge at tacit/nanochat from mirror 2026-03-11 19:55:33 +00:00
781b53078c Merge branch 'master' into fix-scaling-zero-division
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
tacit synced commits to refs/pull/551/head at tacit/nanochat from mirror 2026-03-11 19:55:32 +00:00
9dae23f487 Merge branch 'master' into bugfix/eval-sampler-crash
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
tacit synced commits to refs/pull/540/head at tacit/nanochat from mirror 2026-03-11 19:55:32 +00:00
d343c939c0 fix for unsupported cuda
ff833f8137 small fix
5985cd867b use capability in the same way as on master
683b88c2b5 keep varunneal kernel on H100, use community kernel for other supported cuda architectures
a14c399576 Merge branch 'master' into fix/kernel
tacit synced commits to refs/pull/533/head at tacit/nanochat from mirror 2026-03-11 19:55:31 +00:00
fc5a32a70e Merge branch 'master' into fix-batch-size-assertion
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
tacit synced and deleted reference refs/tags/refs/pull/625/merge at tacit/nanochat from mirror 2026-03-11 19:55:31 +00:00
tacit synced commits to refs/pull/533/merge at tacit/nanochat from mirror 2026-03-11 19:55:31 +00:00
fc5a32a70e Merge branch 'master' into fix-batch-size-assertion
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
tacit synced and deleted reference refs/tags/refs/pull/630/merge at tacit/nanochat from mirror 2026-03-11 11:45:32 +00:00
tacit synced commits to refs/pull/573/merge at tacit/nanochat from mirror 2026-03-11 03:35:36 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/602/merge at tacit/nanochat from mirror 2026-03-11 03:35:36 +00:00
ed565be892 Add bounds checking to KVCache.advance() method
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
tacit synced commits to refs/pull/576/merge at tacit/nanochat from mirror 2026-03-11 03:35:36 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/574/merge at tacit/nanochat from mirror 2026-03-11 03:35:36 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/555/merge at tacit/nanochat from mirror 2026-03-11 03:35:36 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
tacit synced commits to refs/pull/602/head at tacit/nanochat from mirror 2026-03-11 03:35:36 +00:00
ed565be892 Add bounds checking to KVCache.advance() method