tacit

tacit synced commits to refs/pull/536/head at tacit/nanochat from mirror 2026-02-16 16:40:26 +00:00

3735eb9723 simplify test.yml

7686d3c7e2 Update test.yml

3185f928d7 Update .github/workflows/test.yml

tacit synced commits to refs/pull/533/head at tacit/nanochat from mirror 2026-02-16 16:40:26 +00:00

240a60fec2 Add informative error message to batch size assertion

0f3b6a4654 Replace cryptic assertion with descriptive ValueError for batch size alignment

788dadeb88 a number of upgrades to SFT script to bring it up to date w.r.t. pretraining and tuning some of its kwargs based on sweeps

tacit synced commits to refs/pull/498/merge at tacit/nanochat from mirror 2026-02-16 16:40:25 +00:00

c441615cdf Merge 330fa1188c into 788dadeb88

788dadeb88 a number of upgrades to SFT script to bring it up to date w.r.t. pretraining and tuning some of its kwargs based on sweeps

tacit synced commits to refs/pull/520/merge at tacit/nanochat from mirror 2026-02-16 16:40:25 +00:00

ee89623f4b Merge 1bf1fdaa0d into 788dadeb88

788dadeb88 a number of upgrades to SFT script to bring it up to date w.r.t. pretraining and tuning some of its kwargs based on sweeps

tacit synced commits to refs/pull/516/merge at tacit/nanochat from mirror 2026-02-16 16:40:25 +00:00

deedc0679b Merge 00932d1955 into 788dadeb88

788dadeb88 a number of upgrades to SFT script to bring it up to date w.r.t. pretraining and tuning some of its kwargs based on sweeps

tacit synced commits to refs/pull/526/merge at tacit/nanochat from mirror 2026-02-16 16:40:25 +00:00

ecf2423ccf Merge 4f79e750e7 into 788dadeb88

788dadeb88 a number of upgrades to SFT script to bring it up to date w.r.t. pretraining and tuning some of its kwargs based on sweeps

tacit synced commits to master at tacit/nanochat from mirror 2026-02-16 16:40:25 +00:00

788dadeb88 a number of upgrades to SFT script to bring it up to date w.r.t. pretraining and tuning some of its kwargs based on sweeps

tacit synced and deleted reference refs/tags/refs/pull/151/merge at tacit/nanochat from mirror 2026-02-16 16:40:24 +00:00

tacit synced and deleted reference refs/tags/refs/pull/512/merge at tacit/nanochat from mirror 2026-02-16 00:20:29 +00:00

tacit synced and deleted reference refs/tags/refs/pull/477/merge at tacit/nanochat from mirror 2026-02-16 00:20:29 +00:00

tacit synced and deleted reference refs/tags/refs/pull/492/merge at tacit/nanochat from mirror 2026-02-16 00:20:29 +00:00

tacit synced and deleted reference refs/tags/refs/pull/515/merge at tacit/nanochat from mirror 2026-02-16 00:20:29 +00:00

tacit synced commits to refs/pull/455/head at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00

28d5052b0e Merge branch 'master' into fix/cpu_report

2f09686724 clarify that this is bf16 mfu we're talking about

e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, document it very well. for some poorly understood reason, the performance is not only ~identical but actually runs 3% faster. despite of it being significantly simpler and much less code. i don't fully understand why/how atm

1ec0a34779 at 28 and above we start to need batch size 8

ff46300720 tune miniseries just a bit, fairly cosmetic, keep to even depths where the math works out nicely in model sizing

53 commits in this sync; 5 listed above.

tacit synced commits to refs/pull/447/merge at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00

8810c3c68b Merge f8c3b8ea56 into 2f09686724

f8c3b8ea56 Merge branch 'master' into 446-checkpoint-before-eval

2f09686724 clarify that this is bf16 mfu we're talking about

e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, document it very well. for some poorly understood reason, the performance is not only ~identical but actually runs 3% faster. despite of it being significantly simpler and much less code. i don't fully understand why/how atm

1ec0a34779 at 28 and above we start to need batch size 8

53 commits in this sync; 5 listed above.

tacit synced commits to refs/pull/455/merge at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00

915bb2cea5 Merge 28d5052b0e into 2f09686724

28d5052b0e Merge branch 'master' into fix/cpu_report

2f09686724 clarify that this is bf16 mfu we're talking about

e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, document it very well. for some poorly understood reason, the performance is not only ~identical but actually runs 3% faster. despite of it being significantly simpler and much less code. i don't fully understand why/how atm

1ec0a34779 at 28 and above we start to need batch size 8

40 commits in this sync; 5 listed above.

tacit synced commits to refs/pull/485/merge at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00

21a9511f42 Merge 181e7f1c15 into 2f09686724

2f09686724 clarify that this is bf16 mfu we're talking about

tacit synced commits to refs/pull/85/merge at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00

c4e0fafa45 Merge 04862cbfea into 2f09686724

2f09686724 clarify that this is bf16 mfu we're talking about

e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, document it very well. for some poorly understood reason, the performance is not only ~identical but actually runs 3% faster. despite of it being significantly simpler and much less code. i don't fully understand why/how atm

tacit synced commits to refs/pull/510/head at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00

26bc859fc7 Merge branch 'master' into fix/comment

2f09686724 clarify that this is bf16 mfu we're talking about

e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, document it very well. for some poorly understood reason, the performance is not only ~identical but actually runs 3% faster. despite of it being significantly simpler and much less code. i don't fully understand why/how atm

tacit synced commits to refs/pull/510/merge at tacit/nanochat from mirror 2026-02-13 14:09:54 +00:00

3db2124275 Merge 26bc859fc7 into 2f09686724

26bc859fc7 Merge branch 'master' into fix/comment

tacit synced commits to refs/pull/429/head at tacit/nanochat from mirror 2026-02-13 14:09:53 +00:00

c655043092 Merge branch 'master' into fix/shard_count

2f09686724 clarify that this is bf16 mfu we're talking about

e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, document it very well. for some poorly understood reason, the performance is not only ~identical but actually runs 3% faster. despite of it being significantly simpler and much less code. i don't fully understand why/how atm

1ec0a34779 at 28 and above we start to need batch size 8

ff46300720 tune miniseries just a bit, fairly cosmetic, keep to even depths where the math works out nicely in model sizing

58 commits in this sync; 5 listed above.