• Joined on 2024-05-31
tacit synced commits to refs/pull/533/merge at tacit/nanochat from mirror 2026-03-15 01:13:15 +00:00
1b1cc3c599 submit new time to GPT-2 leaderboard entry: 99 minutes
a825e63f81 Autoresearch round 2: smear, backout, and hyperparameter tuning
Compare 3 commits »
tacit synced commits to refs/pull/486/merge at tacit/nanochat from mirror 2026-03-15 01:13:15 +00:00
1b1cc3c599 submit new time to GPT-2 leaderboard entry: 99 minutes
a825e63f81 Autoresearch round 2: smear, backout, and hyperparameter tuning
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
Compare 4 commits »
tacit synced and deleted reference refs/tags/refs/pull/642/merge at tacit/nanochat from mirror 2026-03-14 17:03:13 +00:00
tacit synced commits to refs/pull/635/head at tacit/nanochat from mirror 2026-03-14 00:43:13 +00:00
4468824a6e reduce eval problems for mmlu, humaneval, gsm8k, spellingbee, super slow
ef9718d2c7 typo
Compare 2 commits »
tacit synced and deleted reference refs/tags/refs/pull/635/merge at tacit/nanochat from mirror 2026-03-14 00:43:12 +00:00
tacit synced commits to refs/pull/595/merge at tacit/nanochat from mirror 2026-03-13 16:33:42 +00:00
6405b26d24 Merge branch 'master' into fix/typo
1052d25d45 we only need to wait 2h now!
Compare 3 commits »
tacit synced commits to refs/pull/595/head at tacit/nanochat from mirror 2026-03-13 16:33:42 +00:00
6405b26d24 Merge branch 'master' into fix/typo
1052d25d45 we only need to wait 2h now!
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
Compare 4 commits »
tacit synced commits to refs/pull/635/merge at tacit/nanochat from mirror 2026-03-13 08:23:12 +00:00
e9bf1a5a67 chat sft optim fix
Compare 2 commits »
tacit synced commits to refs/pull/633/merge at tacit/nanochat from mirror 2026-03-12 14:50:45 +00:00
8cadf88faf Update BACKLITE.md
Compare 2 commits »
tacit synced commits to refs/pull/633/head at tacit/nanochat from mirror 2026-03-12 14:50:44 +00:00
8cadf88faf Update BACKLITE.md
tacit synced commits to refs/pull/551/merge at tacit/nanochat from mirror 2026-03-11 19:55:33 +00:00
9dae23f487 Merge branch 'master' into bugfix/eval-sampler-crash
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
Compare 3 commits »
tacit synced commits to refs/pull/579/head at tacit/nanochat from mirror 2026-03-11 19:55:33 +00:00
781b53078c Merge branch 'master' into fix-scaling-zero-division
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
Compare 2 commits »
tacit synced commits to refs/pull/579/merge at tacit/nanochat from mirror 2026-03-11 19:55:33 +00:00
781b53078c Merge branch 'master' into fix-scaling-zero-division
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
Compare 3 commits »
tacit synced commits to refs/pull/540/head at tacit/nanochat from mirror 2026-03-11 19:55:32 +00:00
d343c939c0 fix for unsupported cuda
ff833f8137 small fix
5985cd867b use capability in the same way as on master
683b88c2b5 keep varunneal kernel on H100, use community kernel for other supported cuda architectures
a14c399576 Merge branch 'master' into fix/kernel
Compare 15 commits »
tacit synced commits to refs/pull/551/head at tacit/nanochat from mirror 2026-03-11 19:55:32 +00:00
9dae23f487 Merge branch 'master' into bugfix/eval-sampler-crash
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
Compare 2 commits »
tacit synced and deleted reference refs/tags/refs/pull/625/merge at tacit/nanochat from mirror 2026-03-11 19:55:31 +00:00
tacit synced commits to refs/pull/533/merge at tacit/nanochat from mirror 2026-03-11 19:55:31 +00:00
fc5a32a70e Merge branch 'master' into fix-batch-size-assertion
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
Compare 3 commits »
tacit synced commits to refs/pull/533/head at tacit/nanochat from mirror 2026-03-11 19:55:31 +00:00
fc5a32a70e Merge branch 'master' into fix-batch-size-assertion
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
Compare 2 commits »
tacit synced and deleted reference refs/tags/refs/pull/630/merge at tacit/nanochat from mirror 2026-03-11 11:45:32 +00:00
tacit synced commits to refs/pull/573/merge at tacit/nanochat from mirror 2026-03-11 03:35:36 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
Compare 3 commits »