tacit synced commits to refs/pull/510/merge at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00
26bc859fc7 Merge branch 'master' into fix/comment
Compare 2 commits »
tacit synced commits to refs/pull/85/merge at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, and document it very well. for some poorly understood reason, the performance is not only ~identical but actually 3% faster, despite it being significantly simpler and much less code; i don't fully understand why/how atm
Compare 3 commits »
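The e569b59f92 message above describes replacing the torchao dependency with an in-repo, API-matched Float8Linear. As a rough, hypothetical illustration of the fp8 scaling idea it refers to (not the actual nanochat code), the core trick is: scale each tensor so its max magnitude fits the e4m3 range, round to low precision, multiply, then undo both scales:

```python
# Toy sketch of per-tensor fp8-style scaling (pure Python, 1D "tensors").
# E4M3_MAX is the largest finite e4m3 value; everything else here is a
# simplified stand-in for what a real Float8Linear would do on GPU.

E4M3_MAX = 448.0

def quantize(vec, scale):
    # divide out the scale, then round to a coarse grid to mimic
    # fp8's limited mantissa precision
    return [round(x / scale * 16) / 16 for x in vec]

def fp8_dot(x, w):
    # per-tensor scales chosen so the max magnitude maps to E4M3_MAX
    sx = (max(abs(v) for v in x) / E4M3_MAX) or 1.0
    sw = (max(abs(v) for v in w) / E4M3_MAX) or 1.0
    xq, wq = quantize(x, sx), quantize(w, sw)
    # accumulate in full precision, then undo both scales
    return sum(a * b for a, b in zip(xq, wq)) * sx * sw
```

A real implementation would use hardware fp8 matmuls (e.g. via `torch._scaled_mm`) rather than simulated rounding, but the scale-quantize-matmul-rescale structure is the same.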
tacit synced commits to refs/pull/447/merge at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00
f8c3b8ea56 Merge branch 'master' into 446-checkpoint-before-eval
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, and document it very well. for some poorly understood reason, the performance is not only ~identical but actually 3% faster, despite it being significantly simpler and much less code; i don't fully understand why/how atm
1ec0a34779 at 28 and above we start to need batch size 8
Compare 53 commits »
tacit synced commits to refs/pull/510/head at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00
26bc859fc7 Merge branch 'master' into fix/comment
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, and document it very well. for some poorly understood reason, the performance is not only ~identical but actually 3% faster, despite it being significantly simpler and much less code; i don't fully understand why/how atm
Compare 3 commits »
tacit synced commits to refs/pull/429/head at tacit/nanochat from mirror 2026-02-13 14:09:53 +00:00
c655043092 Merge branch 'master' into fix/shard_count
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, and document it very well. for some poorly understood reason, the performance is not only ~identical but actually 3% faster, despite it being significantly simpler and much less code; i don't fully understand why/how atm
1ec0a34779 at 28 and above we start to need batch size 8
ff46300720 tune miniseries just a bit, fairly cosmetic, keep to even depths where the math works out nicely in model sizing
Compare 58 commits »
tacit synced commits to refs/pull/442/merge at tacit/nanochat from mirror 2026-02-13 14:09:53 +00:00
4bd03a77be Merge branch 'master' into fix/data_cutoff
Compare 2 commits »
tacit synced commits to refs/pull/442/head at tacit/nanochat from mirror 2026-02-13 14:09:53 +00:00
4bd03a77be Merge branch 'master' into fix/data_cutoff
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, and document it very well. for some poorly understood reason, the performance is not only ~identical but actually 3% faster, despite it being significantly simpler and much less code; i don't fully understand why/how atm
1ec0a34779 at 28 and above we start to need batch size 8
ff46300720 tune miniseries just a bit, fairly cosmetic, keep to even depths where the math works out nicely in model sizing
Compare 53 commits »
tacit synced commits to refs/pull/414/merge at tacit/nanochat from mirror 2026-02-13 14:09:53 +00:00
b6899d5230 Merge branch 'master' into why2011btv-patch-2
Compare 2 commits »
tacit synced commits to refs/pull/312/merge at tacit/nanochat from mirror 2026-02-13 14:09:53 +00:00
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, and document it very well. for some poorly understood reason, the performance is not only ~identical but actually 3% faster, despite it being significantly simpler and much less code; i don't fully understand why/how atm
1ec0a34779 at 28 and above we start to need batch size 8
ff46300720 tune miniseries just a bit, fairly cosmetic, keep to even depths where the math works out nicely in model sizing
Compare 15 commits »
tacit synced commits to refs/pull/429/merge at tacit/nanochat from mirror 2026-02-13 14:09:53 +00:00
c655043092 Merge branch 'master' into fix/shard_count
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, and document it very well. for some poorly understood reason, the performance is not only ~identical but actually 3% faster, despite it being significantly simpler and much less code; i don't fully understand why/how atm
1ec0a34779 at 28 and above we start to need batch size 8
Compare 58 commits »
tacit synced commits to refs/pull/414/head at tacit/nanochat from mirror 2026-02-13 14:09:53 +00:00
b6899d5230 Merge branch 'master' into why2011btv-patch-2
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, and document it very well. for some poorly understood reason, the performance is not only ~identical but actually 3% faster, despite it being significantly simpler and much less code; i don't fully understand why/how atm
1ec0a34779 at 28 and above we start to need batch size 8
ff46300720 tune miniseries just a bit, fairly cosmetic, keep to even depths where the math works out nicely in model sizing
Compare 122 commits »
tacit synced commits to refs/pull/447/head at tacit/nanochat from mirror 2026-02-13 14:09:53 +00:00
f8c3b8ea56 Merge branch 'master' into 446-checkpoint-before-eval
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, and document it very well. for some poorly understood reason, the performance is not only ~identical but actually 3% faster, despite it being significantly simpler and much less code; i don't fully understand why/how atm
1ec0a34779 at 28 and above we start to need batch size 8
ff46300720 tune miniseries just a bit, fairly cosmetic, keep to even depths where the math works out nicely in model sizing
Compare 71 commits »
tacit synced commits to refs/pull/311/merge at tacit/nanochat from mirror 2026-02-13 14:09:52 +00:00
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, and document it very well. for some poorly understood reason, the performance is not only ~identical but actually 3% faster, despite it being significantly simpler and much less code; i don't fully understand why/how atm
1ec0a34779 at 28 and above we start to need batch size 8
ff46300720 tune miniseries just a bit, fairly cosmetic, keep to even depths where the math works out nicely in model sizing
Compare 25 commits »
tacit synced and deleted reference refs/tags/refs/pull/399/merge at tacit/nanochat from mirror 2026-02-13 14:09:52 +00:00
tacit synced and deleted reference refs/tags/refs/pull/400/merge at tacit/nanochat from mirror 2026-02-13 14:09:52 +00:00
tacit synced and deleted reference refs/tags/refs/pull/370/merge at tacit/nanochat from mirror 2026-02-13 14:09:52 +00:00
tacit synced commits to refs/pull/204/merge at tacit/nanochat from mirror 2026-02-13 14:09:52 +00:00
6e571915d9 clarify docstring
c5ddd01b80 fix cpu script as well (after being moved on master)
ff599ce056 Merge branch 'master' into uv-venv
2f09686724 clarify that this is bf16 mfu we're talking about
Compare 76 commits »
tacit synced and deleted reference refs/tags/refs/pull/432/merge at tacit/nanochat from mirror 2026-02-13 14:09:52 +00:00
tacit synced commits to refs/pull/204/head at tacit/nanochat from mirror 2026-02-13 14:09:52 +00:00
6e571915d9 clarify docstring
c5ddd01b80 fix cpu script as well (after being moved on master)
ff599ce056 Merge branch 'master' into uv-venv
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, and document it very well. for some poorly understood reason, the performance is not only ~identical but actually 3% faster, despite it being significantly simpler and much less code; i don't fully understand why/how atm
Compare 83 commits »
tacit synced and deleted reference refs/tags/refs/pull/59/merge at tacit/nanochat from mirror 2026-02-13 14:09:52 +00:00