tacit

tacit synced commits to refs/pull/296/merge at tacit/nanochat from mirror 2026-02-01 16:19:51 +00:00

62fbea2977 Merge 5172ea11bb into 31b61d2d17

31b61d2d17 fix broken import sigh

4d6415b8ef use _PEAK_FLOPS_TABLE instead of if-else structure (#479)

43078c347e clean up original tokenizing_distributed_data_loader (#478)

dc291c627f Add Blackwell (SM100) GPU support via SDPA fallback (#475)

Compare 26 commits »

tacit synced commits to refs/pull/480/merge at tacit/nanochat from mirror 2026-02-01 08:09:51 +00:00

007f3220a4 Merge b0ab30563e into 31b61d2d17

31b61d2d17 fix broken import sigh

4d6415b8ef use _PEAK_FLOPS_TABLE instead of if-else structure (#479)

43078c347e clean up original tokenizing_distributed_data_loader (#478)

dc291c627f Add Blackwell (SM100) GPU support via SDPA fallback (#475)

Compare 6 commits »

tacit synced and deleted reference refs/tags/refs/pull/479/merge at tacit/nanochat from mirror 2026-02-01 08:09:50 +00:00

tacit synced commits to master at tacit/nanochat from mirror 2026-02-01 08:09:50 +00:00

31b61d2d17 fix broken import sigh

4d6415b8ef use _PEAK_FLOPS_TABLE instead of if-else structure (#479)

43078c347e clean up original tokenizing_distributed_data_loader (#478)

dc291c627f Add Blackwell (SM100) GPU support via SDPA fallback (#475)

0307997f9b merge two files base_loss and base_eval into a single file, it's nicer this way, and unify the huggingface code associated with both

Compare 5 commits »

tacit synced commits to refs/pull/477/head at tacit/nanochat from mirror 2026-02-01 08:09:50 +00:00

c9f9cc928e Reverting config to existing

232d1341be Garbage collect after step 1 and freeze

Compare 2 commits »

tacit synced and deleted reference refs/tags/refs/pull/475/merge at tacit/nanochat from mirror 2026-02-01 08:09:50 +00:00

tacit synced and deleted reference refs/tags/refs/pull/478/merge at tacit/nanochat from mirror 2026-02-01 08:09:50 +00:00

tacit synced commits to refs/pull/478/merge at tacit/nanochat from mirror 2026-01-31 23:59:51 +00:00

445b431d7c Merge 941ce3cce8 into 1ddaad1c1c

1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtrianing is not yet fully properly erased across the board, but good enough for step 1

348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining

Compare 3 commits »

tacit synced commits to refs/pull/479/merge at tacit/nanochat from mirror 2026-01-31 23:59:51 +00:00

cb8200a16b Merge 61b7eae7e0 into 1ddaad1c1c

1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtrianing is not yet fully properly erased across the board, but good enough for step 1

348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining

Compare 3 commits »

tacit synced commits to refs/pull/477/merge at tacit/nanochat from mirror 2026-01-31 23:59:50 +00:00

b273607399 Merge 814475af42 into 348fbb301b

348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining

Compare 2 commits »

tacit synced commits to refs/pull/449/merge at tacit/nanochat from mirror 2026-01-31 23:59:50 +00:00

6346ed10ce Merge bfbe965790 into 1ddaad1c1c

1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtrianing is not yet fully properly erased across the board, but good enough for step 1

348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining

3c3a3d7042 warmdown of 0.5 is slightly better:

Compare 4 commits »

tacit synced commits to refs/pull/475/head at tacit/nanochat from mirror 2026-01-31 23:59:50 +00:00

4e70a2b678 Update nanochat/flash_attention.py

tacit synced commits to refs/pull/475/merge at tacit/nanochat from mirror 2026-01-31 23:59:50 +00:00

8202283fea Merge 4e70a2b678 into 1ddaad1c1c

4e70a2b678 Update nanochat/flash_attention.py

1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtrianing is not yet fully properly erased across the board, but good enough for step 1

348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining

Compare 4 commits »

tacit synced commits to master at tacit/nanochat from mirror 2026-01-31 23:59:50 +00:00

1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtrianing is not yet fully properly erased across the board, but good enough for step 1

348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining

Compare 2 commits »

tacit synced commits to refs/pull/393/merge at tacit/nanochat from mirror 2026-01-31 23:59:50 +00:00

8622a3eb89 Merge 89d2741cba into 1ddaad1c1c

1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtrianing is not yet fully properly erased across the board, but good enough for step 1

348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining

3c3a3d7042 warmdown of 0.5 is slightly better:

4d8dbaf6e0 Fix escape character in README bibtex entry (#454)

Compare 10 commits »

tacit synced commits to refs/pull/409/merge at tacit/nanochat from mirror 2026-01-31 23:59:50 +00:00

e4d7efe5ff Merge 52f1a5ee5c into 348fbb301b

348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining

3c3a3d7042 warmdown of 0.5 is slightly better:

4d8dbaf6e0 Fix escape character in README bibtex entry (#454)

3ba42e8135 Fix SDPA KV-cache decode to respect sliding window (#456)

Compare 9 commits »

tacit synced commits to refs/pull/59/merge at tacit/nanochat from mirror 2026-01-31 15:49:50 +00:00

0954a569b1 Merge 23393eae83 into 3c3a3d7042

3c3a3d7042 warmdown of 0.5 is slightly better:

4d8dbaf6e0 Fix escape character in README bibtex entry (#454)

3ba42e8135 Fix SDPA KV-cache decode to respect sliding window (#456)

ace6740bdd feat: allow top_k=0 in web api to disable filtering (#458)

Compare 11 commits »

tacit synced commits to refs/pull/455/merge at tacit/nanochat from mirror 2026-01-31 15:49:50 +00:00

542cd614e3 Merge ea86cef249 into 3c3a3d7042

3c3a3d7042 warmdown of 0.5 is slightly better:

Compare 2 commits »

tacit synced commits to refs/pull/400/merge at tacit/nanochat from mirror 2026-01-31 15:49:49 +00:00

7530c4a22b Merge dd9d4f051b into 3c3a3d7042

3c3a3d7042 warmdown of 0.5 is slightly better:

4d8dbaf6e0 Fix escape character in README bibtex entry (#454)

3ba42e8135 Fix SDPA KV-cache decode to respect sliding window (#456)

ace6740bdd feat: allow top_k=0 in web api to disable filtering (#458)

Compare 8 commits »

tacit synced commits to refs/pull/475/merge at tacit/nanochat from mirror 2026-01-31 07:39:50 +00:00

4a1c058039 Merge 2e45b7800a into 3c3a3d7042

3c3a3d7042 warmdown of 0.5 is slightly better:

Compare 2 commits »