tacit synced commits to refs/pull/431/merge at tacit/nanochat from mirror 2026-01-13 23:13:47 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic; that is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way
3b50b77ed3 fix base_loss to report the correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens: a lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
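A note on the kwargs convention in 7312ec9898: argparse flags written with dashes are exposed as underscore attributes on the parsed namespace, so the command line stays conventional while the Python names stay idiomatic. A minimal sketch (the flag name here is illustrative, not taken from nanochat):

```python
import argparse

parser = argparse.ArgumentParser()
# Dashes in the flag, as is conventional on the command line.
parser.add_argument("--device-batch-size", type=int, default=32)

args = parser.parse_args(["--device-batch-size", "16"])

# argparse converts the dashes to underscores on the namespace,
# so Python code reads the idiomatic underscore name.
print(args.device_batch_size)  # -> 16
```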
tacit synced commits to refs/pull/412/merge at tacit/nanochat from mirror 2026-01-13 23:13:47 +00:00
23985413aa adjust the comment on the regex pattern per a recent experiment; see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2, down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
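To make the digit-grouping result in 64b48d0e5c concrete, here is a hedged illustration using the third-party `regex` module (the stdlib `re` does not support `\p{N}`); the patterns are simplified fragments, not nanochat's full tokenizer regex:

```python
import regex  # pip install regex; stdlib `re` lacks \p{...} classes

pat3 = regex.compile(r"\p{N}{1,3}")  # GPT-4 style: digits grouped up to 3
pat2 = regex.compile(r"\p{N}{1,2}")  # the variant found best for 32K vocabs

text = "year 2026, pi=3.14159"
print(pat3.findall(text))  # ['202', '6', '3', '141', '59']
print(pat2.findall(text))  # ['20', '26', '3', '14', '15', '9']
```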
tacit synced commits to refs/pull/370/merge at tacit/nanochat from mirror 2026-01-13 23:13:46 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic; that is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way
3b50b77ed3 fix base_loss to report the correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens: a lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/400/merge at tacit/nanochat from mirror 2026-01-13 23:13:46 +00:00
23985413aa adjust the comment on the regex pattern per a recent experiment; see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2, down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
tacit synced commits to refs/pull/147/merge at tacit/nanochat from mirror 2026-01-13 23:13:46 +00:00
23985413aa adjust the comment on the regex pattern per a recent experiment; see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2, down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
tacit synced commits to refs/pull/311/merge at tacit/nanochat from mirror 2026-01-13 23:13:46 +00:00
23985413aa adjust the comment on the regex pattern per a recent experiment; see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2, down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
tacit synced commits to refs/pull/399/merge at tacit/nanochat from mirror 2026-01-13 23:13:46 +00:00
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 2 commits »
tacit synced commits to fp8_attempt_fail at tacit/nanochat from mirror 2026-01-13 23:13:45 +00:00
tacit synced new reference fp8_attempt_fail to tacit/nanochat from mirror 2026-01-13 23:13:45 +00:00
tacit synced commits to master at tacit/nanochat from mirror 2026-01-13 23:13:45 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic; that is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way
3b50b77ed3 fix base_loss to report the correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens: a lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
23985413aa adjust the comment on the regex pattern per a recent experiment; see dev/LOG.md
Compare 7 commits »
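For context on the negative result in f92efce169, a minimal sketch of what "not allowing attention across BOS tokens" means: derive a document id from BOS positions and AND it with the causal mask, so a token never attends to an earlier document in the packed sequence. The token ids and BOS id below are illustrative:

```python
import torch

BOS = 0
tokens = torch.tensor([BOS, 5, 7, BOS, 9, BOS, 4, 2])

# Each token belongs to the document opened by the most recent BOS.
doc_id = torch.cumsum((tokens == BOS).long(), dim=0)  # tensor([1,1,1,2,2,3,3,3])

T = len(tokens)
causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
same_doc = doc_id[:, None] == doc_id[None, :]
mask = causal & same_doc  # True where a query position may attend to a key

# Per the commit, threading a mask like this through the attention kernels
# added code complexity for basically no gain in performance.
```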
tacit synced commits to refs/pull/147/merge at tacit/nanochat from mirror 2026-01-13 15:03:43 +00:00
bc3cf8b28e Merge branch 'master' into fix-mfu-a100
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
Compare 97 commits »
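On 21608ec51e: evaluating the loss of an arbitrary Hugging Face model is straightforward once the dataloader just takes a tokenizer. A rough sketch under that assumption (model name and text are placeholders; this is not nanochat's base_loss):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # any causal LM checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

ids = tok("The quick brown fox jumps over the lazy dog.",
          return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, labels=ids)  # HF shifts labels internally
print(float(out.loss))  # mean cross-entropy, nats per token
```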
tacit synced commits to refs/pull/147/head at tacit/nanochat from mirror 2026-01-13 15:03:42 +00:00
bc3cf8b28e Merge branch 'master' into fix-mfu-a100
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
b33e394528 oops, actually make SSSL the default window pattern
Compare 168 commits »
tacit synced commits to refs/pull/93/merge at tacit/nanochat from mirror 2026-01-12 22:43:45 +00:00
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
Compare 4 commits »
tacit synced commits to refs/pull/429/head at tacit/nanochat from mirror 2026-01-12 22:43:45 +00:00
48aaa4b3df Download the minimum number of parquet shards to train the tokenizer reproducibly
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
b33e394528 oops, actually make SSSL the default window pattern
Compare 10 commits »
tacit synced commits to refs/pull/429/merge at tacit/nanochat from mirror 2026-01-12 22:43:45 +00:00
48aaa4b3df Download the minimum number of parquet shards to train the tokenizer reproducibly
Compare 2 commits »
tacit synced commits to refs/pull/85/merge at tacit/nanochat from mirror 2026-01-12 22:43:45 +00:00
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
Compare 4 commits »
tacit synced commits to refs/pull/409/merge at tacit/nanochat from mirror 2026-01-12 22:43:44 +00:00
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
Compare 4 commits »
tacit synced commits to refs/pull/425/merge at tacit/nanochat from mirror 2026-01-12 22:43:44 +00:00
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
Compare 4 commits »
tacit synced commits to refs/pull/407/merge at tacit/nanochat from mirror 2026-01-12 22:43:44 +00:00
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
Compare 4 commits »
tacit synced commits to refs/pull/412/merge at tacit/nanochat from mirror 2026-01-12 22:43:44 +00:00
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
Compare 4 commits »