• Joined on 2024-05-31
tacit synced and deleted reference refs/tags/refs/pull/17/merge at tacit/nanochat from mirror 2025-11-15 01:52:12 +00:00
tacit synced commits to refs/pull/93/merge at tacit/nanochat from mirror 2025-11-14 17:42:15 +00:00
f66a780f68 Fix torch.dtype mismatching when running engine inline test.
4763ce612a Small fixes to typos
c6f5bd67db revert change of base to sft for quick inline test
a2fb3c83a6 fix typos
9 commits
tacit synced commits to refs/pull/32/merge at tacit/nanochat from mirror 2025-11-14 17:42:14 +00:00
f66a780f68 Fix torch.dtype mismatching when running engine inline test.
4763ce612a Small fixes to typos
c6f5bd67db revert change of base to sft for quick inline test
a2fb3c83a6 fix typos
9 commits
tacit synced commits to refs/pull/256/merge at tacit/nanochat from mirror 2025-11-14 17:42:14 +00:00
f66a780f68 Fix torch.dtype mismatching when running engine inline test.
4763ce612a Small fixes to typos
c6f5bd67db revert change of base to sft for quick inline test
a2fb3c83a6 fix typos
9 commits
tacit synced commits to refs/pull/255/head at tacit/nanochat from mirror 2025-11-14 17:42:14 +00:00
c6f5bd67db revert change of base to sft for quick inline test
tacit synced commits to master at tacit/nanochat from mirror 2025-11-14 17:42:14 +00:00
f66a780f68 Fix torch.dtype mismatching when running engine inline test.
4763ce612a Small fixes to typos
c6f5bd67db revert change of base to sft for quick inline test
a2fb3c83a6 fix typos
e5efb4b471 add test_engine.py to file structure
8 commits
tacit synced and deleted reference refs/tags/refs/pull/255/merge at tacit/nanochat from mirror 2025-11-14 17:42:14 +00:00
tacit synced commits to refs/pull/93/merge at tacit/nanochat from mirror 2025-11-14 09:32:14 +00:00
7950813a41 make changes more minimal
2 commits
tacit synced commits to refs/pull/93/head at tacit/nanochat from mirror 2025-11-14 09:32:14 +00:00
7950813a41 make changes more minimal
tacit synced commits to refs/pull/258/merge at tacit/nanochat from mirror 2025-11-14 09:32:14 +00:00
eecfdbf9f9 feat(dataset): make work share factor configurable with -f flag
ed07192724 feat(dataset): make work share factor configurable with -f flag
3 commits
tacit synced commits to refs/pull/258/head at tacit/nanochat from mirror 2025-11-14 09:32:14 +00:00
eecfdbf9f9 feat(dataset): make work share factor configurable with -f flag
ed07192724 feat(dataset): make work share factor configurable with -f flag
2 commits
tacit synced commits to refs/pull/93/merge at tacit/nanochat from mirror 2025-11-14 01:22:14 +00:00
9a71d13688 typo oops
7b7fd0fe71 thank you Sophie for your help with nanochat
c6abcdfe3a big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
91f09ccd0d minor fix comment in engine
6 commits
tacit synced commits to refs/pull/59/merge at tacit/nanochat from mirror 2025-11-14 01:22:14 +00:00
9a71d13688 typo oops
7b7fd0fe71 thank you Sophie for your help with nanochat
c6abcdfe3a big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
91f09ccd0d minor fix comment in engine
6 commits
tacit synced commits to refs/pull/53/merge at tacit/nanochat from mirror 2025-11-14 01:22:14 +00:00
9a71d13688 typo oops
7b7fd0fe71 thank you Sophie for your help with nanochat
c6abcdfe3a big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
91f09ccd0d minor fix comment in engine
6 commits
tacit synced commits to refs/pull/40/merge at tacit/nanochat from mirror 2025-11-14 01:22:14 +00:00
9a71d13688 typo oops
7b7fd0fe71 thank you Sophie for your help with nanochat
c6abcdfe3a big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
91f09ccd0d minor fix comment in engine
6 commits
tacit synced commits to refs/pull/3/merge at tacit/nanochat from mirror 2025-11-14 01:22:14 +00:00
9a71d13688 typo oops
7b7fd0fe71 thank you Sophie for your help with nanochat
c6abcdfe3a big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
91f09ccd0d minor fix comment in engine
6 commits
tacit synced commits to refs/pull/275/merge at tacit/nanochat from mirror 2025-11-14 01:22:14 +00:00
9a71d13688 typo oops
7b7fd0fe71 thank you Sophie for your help with nanochat
c6abcdfe3a big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
91f09ccd0d minor fix comment in engine
6 commits
tacit synced commits to refs/pull/258/merge at tacit/nanochat from mirror 2025-11-14 01:22:14 +00:00
9a71d13688 typo oops
7b7fd0fe71 thank you Sophie for your help with nanochat
c6abcdfe3a big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
91f09ccd0d minor fix comment in engine
6 commits
tacit synced commits to refs/pull/204/merge at tacit/nanochat from mirror 2025-11-14 01:22:13 +00:00
9a71d13688 typo oops
7b7fd0fe71 thank you Sophie for your help with nanochat
c6abcdfe3a big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
91f09ccd0d minor fix comment in engine
6 commits
tacit synced commits to refs/pull/161/merge at tacit/nanochat from mirror 2025-11-14 01:22:13 +00:00
9a71d13688 typo oops
7b7fd0fe71 thank you Sophie for your help with nanochat
c6abcdfe3a big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
91f09ccd0d minor fix comment in engine
6 commits