• Joined on 2024-05-31
tacit synced commits to master at tacit/nanochat from mirror 2026-02-07 02:59:54 +00:00
aeff095e97 better comments/flow on all the hyperparameter transfer stuff, and change the WD scaling from my empirical 1/d^2 to a bit more principled version based on Tepoch. All of that theory is based on AdamW and could be suboptimal for Muon
685271dc8d new optimal ratio for d26 training
Compare 2 commits »
tacit synced commits to refs/pull/483/merge at tacit/nanochat from mirror 2026-02-07 02:59:54 +00:00
Compare 3 commits »
tacit synced and deleted reference refs/tags/refs/pull/508/merge at tacit/nanochat from mirror 2026-02-07 02:59:53 +00:00
tacit synced commits to refs/pull/489/merge at tacit/nanochat from mirror 2026-02-06 18:49:50 +00:00
e527521a3f briefly mention batch ramp experimentation too, too weak to merge in my few attempts
96522798f1 docs docs docs
5fdd5cdb24 new leaderboard record via new auto-calculated optimal batch size. for d26 it is 1M, up from 0.5M that was default earlier
2c062aaa94 nit: don't mutate args, create new var for total_batch_size
Compare 9 commits »
tacit synced commits to refs/pull/151/merge at tacit/nanochat from mirror 2026-02-06 18:49:49 +00:00
Compare 9 commits »
tacit synced commits to refs/pull/485/merge at tacit/nanochat from mirror 2026-02-06 18:49:49 +00:00
Compare 9 commits »
tacit synced commits to refs/pull/328/merge at tacit/nanochat from mirror 2026-02-06 18:49:49 +00:00
Compare 9 commits »
tacit synced commits to refs/pull/414/merge at tacit/nanochat from mirror 2026-02-06 18:49:49 +00:00
Compare 9 commits »
tacit synced commits to refs/pull/370/merge at tacit/nanochat from mirror 2026-02-06 18:49:49 +00:00
Compare 12 commits »
tacit synced commits to refs/pull/425/merge at tacit/nanochat from mirror 2026-02-06 18:49:49 +00:00
Compare 9 commits »
tacit synced commits to refs/pull/486/merge at tacit/nanochat from mirror 2026-02-06 10:39:49 +00:00
Compare 8 commits »
tacit synced commits to refs/pull/442/merge at tacit/nanochat from mirror 2026-02-06 10:39:48 +00:00
Compare 9 commits »
tacit synced commits to refs/pull/437/merge at tacit/nanochat from mirror 2026-02-06 10:39:48 +00:00
Compare 9 commits »
tacit synced commits to refs/pull/501/merge at tacit/nanochat from mirror 2026-02-06 02:29:54 +00:00
Compare 6 commits »
tacit synced commits to refs/pull/492/merge at tacit/nanochat from mirror 2026-02-06 02:29:53 +00:00
Compare 9 commits »
tacit synced commits to refs/pull/483/merge at tacit/nanochat from mirror 2026-02-06 02:29:53 +00:00
Compare 9 commits »
tacit synced commits to master at tacit/nanochat from mirror 2026-02-06 02:29:53 +00:00
e527521a3f briefly mention batch ramp experimentation too, too weak to merge in my few attempts
96522798f1 docs docs docs
5fdd5cdb24 new leaderboard record via new auto-calculated optimal batch size. for d26 it is 1M, up from 0.5M that was default earlier
2c062aaa94 nit: don't mutate args, create new var for total_batch_size
f41dd3cbd7 auto-calculate optimal batch size. the original setting of 0.5M was only optimal for d12, but d26 prefers 1M and so on
Compare 5 commits »
tacit synced and deleted reference refs/tags/refs/pull/93/merge at tacit/nanochat from mirror 2026-02-05 18:19:50 +00:00
tacit synced commits to master at tacit/nanochat from mirror 2026-02-05 18:19:50 +00:00
98eed6df18 bring back an assert guarding against bad param sizing
012da1a78b Typo fixes (#480)
75b302f331 fix hash commit on leaderboard and a paragraph clarification
Compare 3 commits »
tacit synced and deleted reference refs/tags/refs/pull/496/merge at tacit/nanochat from mirror 2026-02-05 18:19:50 +00:00