• Joined on 2024-05-31
tacit synced commits to refs/pull/511/merge at tacit/nanochat from mirror 2026-03-11 03:35:35 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/521/merge at tacit/nanochat from mirror 2026-03-11 03:35:35 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/509/merge at tacit/nanochat from mirror 2026-03-11 03:35:35 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/536/merge at tacit/nanochat from mirror 2026-03-11 03:35:35 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/455/merge at tacit/nanochat from mirror 2026-03-11 03:35:34 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
tacit synced commits to refs/pull/485/merge at tacit/nanochat from mirror 2026-03-11 03:35:34 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/489/merge at tacit/nanochat from mirror 2026-03-11 03:35:34 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/442/merge at tacit/nanochat from mirror 2026-03-11 03:35:33 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
tacit synced commits to refs/pull/437/merge at tacit/nanochat from mirror 2026-03-11 03:35:33 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/328/merge at tacit/nanochat from mirror 2026-03-11 03:35:33 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
tacit synced commits to refs/pull/141/merge at tacit/nanochat from mirror 2026-03-11 03:35:32 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/312/merge at tacit/nanochat from mirror 2026-03-11 03:35:32 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
tacit synced commits to refs/pull/311/merge at tacit/nanochat from mirror 2026-03-11 03:35:32 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/85/merge at tacit/nanochat from mirror 2026-03-10 19:25:32 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
tacit synced commits to refs/pull/568/merge at tacit/nanochat from mirror 2026-03-10 19:25:31 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/204/merge at tacit/nanochat from mirror 2026-03-10 19:25:31 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly
752abc836e Ensure that inputs and targets are contiguous (#569)
tacit synced commits to refs/pull/614/merge at tacit/nanochat from mirror 2026-03-10 11:15:32 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/544/merge at tacit/nanochat from mirror 2026-03-10 11:15:31 +00:00
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
tacit synced commits to refs/pull/595/head at tacit/nanochat from mirror 2026-03-10 11:15:31 +00:00
d96558bcb0 fix heading, cf #622
tacit synced commits to refs/pull/595/merge at tacit/nanochat from mirror 2026-03-10 11:15:31 +00:00
d96558bcb0 fix heading, cf #622
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours