tacit synced commits to refs/pull/516/merge, refs/pull/513/merge, refs/pull/515/merge, refs/pull/510/merge, refs/pull/509/merge, and refs/pull/501/merge at tacit/nanochat from mirror 2026-02-10 20:49:58–20:49:59 +00:00
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, and document it very well. For some poorly understood reason, the performance is not only ~identical but actually runs 3% faster, despite it being significantly simpler and much less code. I don't fully understand why/how atm.
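The e569b59f92 commit replaces torchao's Float8Linear with an in-repo, API-matched equivalent. Below is a minimal sketch of that drop-in pattern only, not the nanochat or torchao implementation: an nn.Linear subclass that keeps the exact nn.Linear constructor and forward signature while emulating fp8 numerics by round-tripping weights and activations through torch.float8_e4m3fn (real fp8 training uses scaled fp8 matmul kernels instead). The class name and scaling constants are illustrative assumptions.

```python
# Hypothetical sketch (not the nanochat implementation): a drop-in nn.Linear
# replacement that keeps the nn.Linear API while emulating fp8 numerics by
# round-tripping tensors through torch.float8_e4m3fn. Real fp8 training code
# (e.g. torchao's Float8Linear) performs scaled fp8 matmuls in fused kernels.
import torch
import torch.nn as nn
import torch.nn.functional as F


def fp8_roundtrip(t: torch.Tensor) -> torch.Tensor:
    """Quantize to float8_e4m3fn and back, with a per-tensor scale to use the fp8 range."""
    amax = t.abs().amax().clamp(min=1e-12)
    scale = 448.0 / amax          # 448 is the largest normal value representable in e4m3fn
    q = (t * scale).to(torch.float8_e4m3fn)
    return q.to(t.dtype) / scale


class Float8LinearSketch(nn.Linear):
    """API-matched stand-in for nn.Linear: identical __init__, parameters, and forward signature."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = fp8_roundtrip(self.weight)
        x = fp8_roundtrip(x)
        return F.linear(x, w, self.bias)


if __name__ == "__main__":
    torch.manual_seed(0)
    ref = nn.Linear(64, 32)
    fp8 = Float8LinearSketch(64, 32)
    fp8.load_state_dict(ref.state_dict())   # drop-in: same parameter names and shapes
    x = torch.randn(4, 64)
    print((ref(x) - fp8(x)).abs().max())    # small fp8 quantization error
```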
tacit synced commits to refs/pull/489/merge at tacit/nanochat from mirror 2026-02-10 20:49:58 +00:00
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear (as above)
1ec0a34779 at 28 and above we start to need batch size 8
ff46300720 tune miniseries just a bit, fairly cosmetic, keep to even depths where the math works out nicely in model sizing
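The two tuning commits above (1ec0a34779, ff46300720) concern how the miniseries derives model shape and per-device batch size from depth. The sketch below is a hypothetical illustration of why even depths keep the derived sizes integral and why deeper models force a smaller micro-batch; the aspect ratio, head dimension, and batch sizes are assumptions, not nanochat's actual constants.

```python
# Illustrative only: hypothetical helpers showing why a depth-indexed model series
# might (a) stick to even depths so derived sizes stay integral, and (b) drop the
# per-device batch size once depth reaches 28. The constants (aspect ratio 64,
# head_dim 128, the fallback batch size 16) are assumptions, not nanochat's values.
def model_dims(depth: int, aspect: int = 64, head_dim: int = 128):
    if depth % 2 != 0:
        raise ValueError("keep to even depths so the sizing math works out")
    d_model = aspect * depth            # e.g. depth 26 -> d_model 1664
    n_heads = d_model // head_dim       # = depth / 2, an integer only for even depths
    return d_model, n_heads


def device_batch_size(depth: int) -> int:
    # deeper models need more activation memory per sample, so shrink the micro-batch
    return 8 if depth >= 28 else 16


for depth in (20, 26, 28, 32):
    print(depth, model_dims(depth), device_batch_size(depth))
```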
tacit synced commits to refs/pull/485/merge, refs/pull/492/merge, and refs/pull/486/merge at tacit/nanochat from mirror 2026-02-10 20:49:58 +00:00
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear (as above)
tacit synced commits to refs/pull/442/merge at tacit/nanochat from mirror 2026-02-10 20:49:57 +00:00
e569b59f92, 1ec0a34779, ff46300720 (as above)
aeff095e97 better comments/flow on all the hyperparameter transfer stuff, and change the WD scaling from my empirical 1/d^2 to a bit more principled version based on Tepoch. All of that theory is based on AdamW and could be suboptimal for Muon
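The aeff095e97 commit moves weight-decay scaling from an empirical 1/d^2 rule to one tied to Tepoch. One common reading of such a rule, sketched below as an assumption rather than the commit's actual formula: decoupled AdamW decay multiplies the weights by (1 - lr*wd) each step, an EMA with timescale of roughly 1/(lr*wd) steps, so choosing wd = 1/(lr * steps_per_epoch) matches the decay timescale to one pass over the data. As the commit notes, this reasoning assumes AdamW and may not carry over to Muon.

```python
# Hedged sketch of the "weight decay as timescale" argument for AdamW; this is a
# generic illustration, not necessarily the rule aeff095e97 actually adopts.
# Decoupled decay multiplies weights by (1 - lr * wd) per step, i.e. an EMA with
# timescale tau ~= 1 / (lr * wd) steps. Setting wd so that tau equals the number
# of steps in one epoch ties the decay to T_epoch rather than to model size.

def steps_per_epoch(dataset_tokens: int, tokens_per_step: int) -> int:
    return max(1, dataset_tokens // tokens_per_step)


def wd_from_epoch(lr: float, dataset_tokens: int, tokens_per_step: int) -> float:
    t_epoch = steps_per_epoch(dataset_tokens, tokens_per_step)
    return 1.0 / (lr * t_epoch)        # so that tau = 1 / (lr * wd) == t_epoch


def wd_empirical(d: int, base_wd: float = 0.1, base_d: int = 20) -> float:
    # the older empirical rule mentioned in the commit: wd scaled like 1/d^2
    # (whether d means depth or width is not stated; constants are assumptions)
    return base_wd * (base_d / d) ** 2


lr, dataset_tokens, tokens_per_step = 0.02, 10_000_000_000, 524_288
print(wd_from_epoch(lr, dataset_tokens, tokens_per_step))   # epoch-based decay
print(wd_empirical(26))                                      # old 1/d^2-style decay
```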
tacit synced commits to master at tacit/nanochat from mirror 2026-02-10 20:49:57 +00:00
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear (as above)
tacit synced commits to refs/pull/437/merge at tacit/nanochat from mirror 2026-02-10 20:49:57 +00:00
e569b59f92, 1ec0a34779, ff46300720, aeff095e97 (the same four commits as the refs/pull/442/merge sync above)
tacit synced commits to refs/pull/483/merge at tacit/nanochat from mirror 2026-02-10 20:49:57 +00:00
e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear (as above)
tacit synced commits to refs/pull/425/merge at tacit/nanochat from mirror 2026-02-10 04:29:51 +00:00
1ec0a34779, ff46300720, aeff095e97 (the miniseries and hyperparameter-transfer commits listed above)
685271dc8d new optimal ratio for d26 training
tacit synced commits to refs/pull/483/head at tacit/nanochat from mirror 2026-02-10 04:29:51 +00:00
56660c690b Merge branch 'karpathy:master' into master
1ec0a34779, ff46300720, aeff095e97, 685271dc8d (as above)
tacit synced commits to refs/pull/85/merge at tacit/nanochat from mirror 2026-02-10 04:29:51 +00:00
1ec0a34779, ff46300720, aeff095e97, 685271dc8d (as above)
tacit synced commits to refs/pull/483/merge at tacit/nanochat from mirror 2026-02-10 04:29:51 +00:00
56660c690b Merge branch 'karpathy:master' into master
tacit synced commits to refs/pull/141/merge at tacit/nanochat from mirror 2026-02-10 04:29:50 +00:00
1ec0a34779, ff46300720 (as above)
tacit synced commits to refs/pull/510/head at tacit/nanochat from mirror 2026-02-09 20:19:56 +00:00
f999d06d58 Merge branch 'master' into fix/comment
1ec0a34779, ff46300720, aeff095e97, 685271dc8d (as above)