Joined on 2024-05-31
tacit synced commits to refs/pull/536/head at tacit/nanochat from mirror 2026-02-16 16:40:26 +00:00
3735eb9723 simplify test.yml
7686d3c7e2 Update test.yml
3185f928d7 Update .github/workflows/test.yml
Compare 3 commits »
tacit synced commits to refs/pull/533/head at tacit/nanochat from mirror 2026-02-16 16:40:26 +00:00
240a60fec2 Add informative error message to batch size assertion
0f3b6a4654 Replace cryptic assertion with descriptive ValueError for batch size alignment
788dadeb88 a number of upgrades to SFT script to bring it up to date w.r.t. pretraining and tuning some of its kwargs based on sweeps
Compare 3 commits »
tacit synced commits to refs/pull/498/merge at tacit/nanochat from mirror 2026-02-16 16:40:25 +00:00
788dadeb88 a number of upgrades to SFT script to bring it up to date w.r.t. pretraining and tuning some of its kwargs based on sweeps
Compare 2 commits »
tacit synced commits to refs/pull/520/merge at tacit/nanochat from mirror 2026-02-16 16:40:25 +00:00
788dadeb88 a number of upgrades to SFT script to bring it up to date w.r.t. pretraining and tuning some of its kwargs based on sweeps
Compare 2 commits »
tacit synced commits to refs/pull/516/merge at tacit/nanochat from mirror 2026-02-16 16:40:25 +00:00
788dadeb88 a number of upgrades to SFT script to bring it up to date w.r.t. pretraining and tuning some of its kwargs based on sweeps
Compare 2 commits »
tacit synced commits to refs/pull/526/merge at tacit/nanochat from mirror 2026-02-16 16:40:25 +00:00
788dadeb88 a number of upgrades to SFT script to bring it up to date w.r.t. pretraining and tuning some of its kwargs based on sweeps
Compare 2 commits »
tacit synced commits to master at tacit/nanochat from mirror 2026-02-16 16:40:25 +00:00
788dadeb88 a number of upgrades to SFT script to bring it up to date w.r.t. pretraining and tuning some of its kwargs based on sweeps
tacit synced and deleted reference refs/tags/refs/pull/151/merge at tacit/nanochat from mirror 2026-02-16 16:40:24 +00:00
tacit synced and deleted reference refs/tags/refs/pull/512/merge at tacit/nanochat from mirror 2026-02-16 00:20:29 +00:00
tacit synced and deleted reference refs/tags/refs/pull/477/merge at tacit/nanochat from mirror 2026-02-16 00:20:29 +00:00
tacit synced and deleted reference refs/tags/refs/pull/492/merge at tacit/nanochat from mirror 2026-02-16 00:20:29 +00:00
tacit synced and deleted reference refs/tags/refs/pull/515/merge at tacit/nanochat from mirror 2026-02-16 00:20:29 +00:00
tacit synced commits to refs/pull/455/head at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00
28d5052b0e Merge branch 'master' into fix/cpu_report
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, document it very well. for some poorly understood reason, the performance is not only ~identical but actually runs 3% faster. despite of it being significantly simpler and much less code. i don't fully understand why/how atm
1ec0a34779 at 28 and above we start to need batch size 8
ff46300720 tune miniseries just a bit, fairly cosmetic, keep to even depths where the math works out nicely in model sizing
Compare 53 commits »
tacit synced commits to refs/pull/447/merge at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00
f8c3b8ea56 Merge branch 'master' into 446-checkpoint-before-eval
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, document it very well. for some poorly understood reason, the performance is not only ~identical but actually runs 3% faster. despite of it being significantly simpler and much less code. i don't fully understand why/how atm
1ec0a34779 at 28 and above we start to need batch size 8
Compare 53 commits »
tacit synced commits to refs/pull/455/merge at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00
28d5052b0e Merge branch 'master' into fix/cpu_report
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, document it very well. for some poorly understood reason, the performance is not only ~identical but actually runs 3% faster. despite of it being significantly simpler and much less code. i don't fully understand why/how atm
1ec0a34779 at 28 and above we start to need batch size 8
Compare 40 commits »
tacit synced commits to refs/pull/485/merge at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00
2f09686724 clarify that this is bf16 mfu we're talking about
Compare 2 commits »
tacit synced commits to refs/pull/85/merge at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, document it very well. for some poorly understood reason, the performance is not only ~identical but actually runs 3% faster. despite of it being significantly simpler and much less code. i don't fully understand why/how atm
Compare 3 commits »
tacit synced commits to refs/pull/510/head at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00
26bc859fc7 Merge branch 'master' into fix/comment
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, document it very well. for some poorly understood reason, the performance is not only ~identical but actually runs 3% faster. despite of it being significantly simpler and much less code. i don't fully understand why/how atm
Compare 3 commits »
tacit synced commits to refs/pull/510/merge at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00
26bc859fc7 Merge branch 'master' into fix/comment
Compare 2 commits »
tacit synced commits to refs/pull/429/head at tacit/nanochat from mirror 2026-02-13 14:09:53 +00:00
c655043092 Merge branch 'master' into fix/shard_count
2f09686724 clarify that this is bf16 mfu we're talking about
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, document it very well. for some poorly understood reason, the performance is not only ~identical but actually runs 3% faster. despite of it being significantly simpler and much less code. i don't fully understand why/how atm
1ec0a34779 at 28 and above we start to need batch size 8
ff46300720 tune miniseries just a bit, fairly cosmetic, keep to even depths where the math works out nicely in model sizing
Compare 58 commits »