Joined on 2024-05-31
tacit synced and deleted reference refs/tags/refs/pull/90/merge at tacit/nanochat from mirror 2025-12-09 13:52:17 +00:00
tacit synced commits to refs/pull/151/merge at tacit/nanochat from mirror 2025-12-09 13:52:17 +00:00
d5759400f9 fixing two typos in comments
e72c3299df fix random.seed() footgun bug for SpellingBee data generation
7931e0903a rename checkpoint_dir to checkpoints_dir for consistency.
849d95ae1f remove unnecessary check to make the logic in CausalSelfAttention.forward() clearer
tacit synced commits to refs/pull/312/merge at tacit/nanochat from mirror 2025-12-09 05:42:24 +00:00
39cccc527f small bugfix make mid_train script work even with a tiny number of iterations
8b1cecaa95 Apply suggestion from @svlandeg for nicer looking comparison
58f3e84e01 clean up train/val loader in sft for consistency with mid/base
1b2a675c88 Improve KV cache code readability
tacit synced commits to refs/pull/348/merge at tacit/nanochat from mirror 2025-12-09 05:42:24 +00:00
7931e0903a rename checkpoint_dir to checkpoints_dir for consistency.
849d95ae1f remove unnecessary check to make the logic in CausalSelfAttention.forward() clearer
39cccc527f small bugfix make mid_train script work even with a tiny number of iterations
8b1cecaa95 Apply suggestion from @svlandeg for nicer looking comparison
tacit synced commits to refs/pull/324/merge at tacit/nanochat from mirror 2025-12-09 05:42:24 +00:00
d5759400f9 fixing two typos in comments
e72c3299df fix random.seed() footgun bug for SpellingBee data generation
7931e0903a rename checkpoint_dir to checkpoints_dir for consistency.
849d95ae1f remove unnecessary check to make the logic in CausalSelfAttention.forward() clearer
tacit synced and deleted reference refs/tags/refs/pull/351/merge at tacit/nanochat from mirror 2025-12-09 05:42:23 +00:00
tacit synced and deleted reference refs/tags/refs/pull/358/merge at tacit/nanochat from mirror 2025-12-09 05:42:23 +00:00
tacit synced commits to refs/pull/309/head at tacit/nanochat from mirror 2025-12-09 05:42:23 +00:00
8b1cecaa95 Apply suggestion from @svlandeg for nicer looking comparison
tacit synced commits to refs/pull/256/merge at tacit/nanochat from mirror 2025-12-09 05:42:23 +00:00
cbf30c842c apply float32 cast before logits softcapping so the tanh is in fp32. torch compile fuses this correctly with no extra memory costs.
90442de35f fix bug where any rank has to be able to create checkpoint_dir if saving optim
2fd0440355 fix: missing val_bpb on resume
16788eed3c fix(model): apply float32 cast before logits softcapping
tacit synced commits to master at tacit/nanochat from mirror 2025-12-09 05:42:23 +00:00
d5759400f9 fixing two typos in comments
e72c3299df fix random.seed() footgun bug for SpellingBee data generation
7931e0903a rename checkpoint_dir to checkpoints_dir for consistency.
849d95ae1f remove unnecessary check to make the logic in CausalSelfAttention.forward() clearer
39cccc527f small bugfix make mid_train script work even with a tiny number of iterations
tacit synced commits to refs/pull/161/merge at tacit/nanochat from mirror 2025-12-09 05:42:23 +00:00
d5759400f9 fixing two typos in comments
e72c3299df fix random.seed() footgun bug for SpellingBee data generation
7931e0903a rename checkpoint_dir to checkpoints_dir for consistency.
849d95ae1f remove unnecessary check to make the logic in CausalSelfAttention.forward() clearer
tacit synced and deleted reference refs/tags/refs/pull/361/merge at tacit/nanochat from mirror 2025-12-09 05:42:23 +00:00
tacit synced commits to refs/pull/311/merge at tacit/nanochat from mirror 2025-12-09 05:42:23 +00:00
cbf30c842c apply float32 cast before logits softcapping so the tanh is in fp32. torch compile fuses this correctly with no extra memory costs.
90442de35f fix bug where any rank has to be able to create checkpoint_dir if saving optim
2fd0440355 fix: missing val_bpb on resume
16788eed3c fix(model): apply float32 cast before logits softcapping
tacit synced and deleted reference refs/tags/refs/pull/326/merge at tacit/nanochat from mirror 2025-12-09 05:42:22 +00:00
tacit synced and deleted reference refs/tags/refs/pull/327/merge at tacit/nanochat from mirror 2025-12-09 05:42:22 +00:00
tacit synced and deleted reference refs/tags/refs/pull/342/merge at tacit/nanochat from mirror 2025-12-09 05:42:22 +00:00
tacit synced and deleted reference refs/tags/refs/pull/325/merge at tacit/nanochat from mirror 2025-12-09 05:42:22 +00:00
tacit synced and deleted reference refs/tags/refs/pull/317/merge at tacit/nanochat from mirror 2025-12-09 05:42:22 +00:00
tacit synced and deleted reference refs/tags/refs/pull/345/merge at tacit/nanochat from mirror 2025-12-09 05:42:22 +00:00
tacit synced and deleted reference refs/tags/refs/pull/306/merge at tacit/nanochat from mirror 2025-12-09 05:42:22 +00:00