• Joined on 2024-05-31
tacit synced commits to refs/pull/576/merge at tacit/nanochat from mirror 2026-03-11 03:35:36 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/555/merge at tacit/nanochat from mirror 2026-03-11 03:35:36 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
tacit synced commits to refs/pull/602/head at tacit/nanochat from mirror 2026-03-11 03:35:36 +00:00
ed565be892 Add bounds checking to KVCache.advance() method
tacit synced commits to refs/pull/573/merge at tacit/nanochat from mirror 2026-03-11 03:35:36 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/574/merge at tacit/nanochat from mirror 2026-03-11 03:35:36 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/511/merge at tacit/nanochat from mirror 2026-03-11 03:35:35 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/536/merge at tacit/nanochat from mirror 2026-03-11 03:35:35 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/521/merge at tacit/nanochat from mirror 2026-03-11 03:35:35 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/509/merge at tacit/nanochat from mirror 2026-03-11 03:35:35 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/455/merge at tacit/nanochat from mirror 2026-03-11 03:35:34 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
tacit synced commits to refs/pull/485/merge at tacit/nanochat from mirror 2026-03-11 03:35:34 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/489/merge at tacit/nanochat from mirror 2026-03-11 03:35:34 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/442/merge at tacit/nanochat from mirror 2026-03-11 03:35:33 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
tacit synced commits to refs/pull/328/merge at tacit/nanochat from mirror 2026-03-11 03:35:33 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
tacit synced commits to refs/pull/437/merge at tacit/nanochat from mirror 2026-03-11 03:35:33 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/312/merge at tacit/nanochat from mirror 2026-03-11 03:35:32 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
tacit synced commits to refs/pull/141/merge at tacit/nanochat from mirror 2026-03-11 03:35:32 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/311/merge at tacit/nanochat from mirror 2026-03-11 03:35:32 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/85/merge at tacit/nanochat from mirror 2026-03-10 19:25:32 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
tacit synced commits to refs/pull/204/merge at tacit/nanochat from mirror 2026-03-10 19:25:31 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)