Joined on 2024-05-31
tacit synced commits to refs/pull/480/merge at tacit/nanochat from mirror 2026-02-01 08:09:51 +00:00
31b61d2d17 fix broken import sigh
4d6415b8ef use _PEAK_FLOPS_TABLE instead of if-else structure (#479)
43078c347e clean up original tokenizing_distributed_data_loader (#478)
dc291c627f Add Blackwell (SM100) GPU support via SDPA fallback (#475)
6 commits in total
tacit synced commits to master at tacit/nanochat from mirror 2026-02-01 08:09:50 +00:00
31b61d2d17 fix broken import sigh
4d6415b8ef use _PEAK_FLOPS_TABLE instead of if-else structure (#479)
43078c347e clean up original tokenizing_distributed_data_loader (#478)
dc291c627f Add Blackwell (SM100) GPU support via SDPA fallback (#475)
0307997f9b merge two files base_loss and base_eval into a single file, it's nicer this way, and unify the huggingface code associated with both
5 commits in total
tacit synced commits to refs/pull/477/head at tacit/nanochat from mirror 2026-02-01 08:09:50 +00:00
c9f9cc928e Reverting config to existing
232d1341be Garbage collect after step 1 and freeze
2 commits in total
tacit synced and deleted reference refs/tags/refs/pull/478/merge at tacit/nanochat from mirror 2026-02-01 08:09:50 +00:00
tacit synced and deleted reference refs/tags/refs/pull/479/merge at tacit/nanochat from mirror 2026-02-01 08:09:50 +00:00
tacit synced and deleted reference refs/tags/refs/pull/475/merge at tacit/nanochat from mirror 2026-02-01 08:09:50 +00:00
tacit synced commits to refs/pull/478/merge at tacit/nanochat from mirror 2026-01-31 23:59:51 +00:00
1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtrianing is not yet fully properly erased across the board, but good enough for step 1
348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining
3 commits in total
tacit synced commits to refs/pull/479/merge at tacit/nanochat from mirror 2026-01-31 23:59:51 +00:00
1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtrianing is not yet fully properly erased across the board, but good enough for step 1
348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining
3 commits in total
tacit synced commits to refs/pull/477/merge at tacit/nanochat from mirror 2026-01-31 23:59:50 +00:00
348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining
2 commits in total
tacit synced commits to refs/pull/409/merge at tacit/nanochat from mirror 2026-01-31 23:59:50 +00:00
348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining
3c3a3d7042 warmdown of 0.5 is slightly better:
4d8dbaf6e0 Fix escape character in README bibtex entry (#454)
3ba42e8135 Fix SDPA KV-cache decode to respect sliding window (#456)
9 commits in total
tacit synced commits to refs/pull/475/head at tacit/nanochat from mirror 2026-01-31 23:59:50 +00:00
4e70a2b678 Update nanochat/flash_attention.py
tacit synced commits to master at tacit/nanochat from mirror 2026-01-31 23:59:50 +00:00
1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtrianing is not yet fully properly erased across the board, but good enough for step 1
348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining
2 commits in total
tacit synced commits to refs/pull/475/merge at tacit/nanochat from mirror 2026-01-31 23:59:50 +00:00
4e70a2b678 Update nanochat/flash_attention.py
1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtrianing is not yet fully properly erased across the board, but good enough for step 1
348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining
4 commits in total
tacit synced commits to refs/pull/449/merge at tacit/nanochat from mirror 2026-01-31 23:59:50 +00:00
1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtrianing is not yet fully properly erased across the board, but good enough for step 1
348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining
3c3a3d7042 warmdown of 0.5 is slightly better:
4 commits in total
tacit synced commits to refs/pull/393/merge at tacit/nanochat from mirror 2026-01-31 23:59:50 +00:00
1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtrianing is not yet fully properly erased across the board, but good enough for step 1
348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining
3c3a3d7042 warmdown of 0.5 is slightly better:
4d8dbaf6e0 Fix escape character in README bibtex entry (#454)
10 commits in total
tacit synced commits to refs/pull/59/merge at tacit/nanochat from mirror 2026-01-31 15:49:50 +00:00
3c3a3d7042 warmdown of 0.5 is slightly better:
4d8dbaf6e0 Fix escape character in README bibtex entry (#454)
3ba42e8135 Fix SDPA KV-cache decode to respect sliding window (#456)
ace6740bdd feat: allow top_k=0 in web api to disable filtering (#458)
11 commits in total
tacit synced commits to refs/pull/455/merge at tacit/nanochat from mirror 2026-01-31 15:49:50 +00:00
3c3a3d7042 warmdown of 0.5 is slightly better:
2 commits in total
tacit synced commits to refs/pull/400/merge at tacit/nanochat from mirror 2026-01-31 15:49:49 +00:00
3c3a3d7042 warmdown of 0.5 is slightly better:
4d8dbaf6e0 Fix escape character in README bibtex entry (#454)
3ba42e8135 Fix SDPA KV-cache decode to respect sliding window (#456)
ace6740bdd feat: allow top_k=0 in web api to disable filtering (#458)
8 commits in total
tacit synced commits to refs/pull/312/merge at tacit/nanochat from mirror 2026-01-31 07:39:50 +00:00
3c3a3d7042 warmdown of 0.5 is slightly better:
4d8dbaf6e0 Fix escape character in README bibtex entry (#454)
3ba42e8135 Fix SDPA KV-cache decode to respect sliding window (#456)
ace6740bdd feat: allow top_k=0 in web api to disable filtering (#458)
8 commits in total
tacit synced commits to refs/pull/475/merge at tacit/nanochat from mirror 2026-01-31 07:39:50 +00:00
3c3a3d7042 warmdown of 0.5 is slightly better:
2 commits in total