tacit synced commits to refs/pull/431/merge at tacit/nanochat from mirror 2026-01-13 23:13:47 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic; that is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way
3b50b77ed3 fix base_loss to report the correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens: a lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
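A note on the kwargs convention in 7312ec9898: argparse flags written with dashes are exposed as underscore attributes on the parsed namespace, so the command line stays conventional while the Python names stay idiomatic. A minimal sketch (the flag name here is illustrative, not taken from nanochat):

```python
import argparse

parser = argparse.ArgumentParser()
# Dashes in the flag, as is conventional on the command line.
parser.add_argument("--device-batch-size", type=int, default=32)

args = parser.parse_args(["--device-batch-size", "16"])

# argparse converts the dashes to underscores on the namespace,
# so Python code reads the idiomatic underscore name.
print(args.device_batch_size)  # -> 16
```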
tacit synced commits to refs/pull/412/merge at tacit/nanochat from mirror 2026-01-13 23:13:47 +00:00
23985413aa adjust the comment on the regex pattern per a recent experiment; see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2, down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
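To make the digit-grouping result in 64b48d0e5c concrete, here is a hedged illustration using the third-party `regex` module (the stdlib `re` does not support `\p{N}`); the patterns are simplified fragments, not nanochat's full tokenizer regex:

```python
import regex  # pip install regex; stdlib `re` lacks \p{...} classes

pat3 = regex.compile(r"\p{N}{1,3}")  # GPT-4 style: digits grouped up to 3
pat2 = regex.compile(r"\p{N}{1,2}")  # the variant found best for 32K vocabs

text = "year 2026, pi=3.14159"
print(pat3.findall(text))  # ['202', '6', '3', '141', '59']
print(pat2.findall(text))  # ['20', '26', '3', '14', '15', '9']
```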
tacit synced commits to refs/pull/370/merge at tacit/nanochat from mirror 2026-01-13 23:13:46 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic; that is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way
3b50b77ed3 fix base_loss to report the correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens: a lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/400/merge at tacit/nanochat from mirror 2026-01-13 23:13:46 +00:00
23985413aa adjust the comment on the regex pattern per a recent experiment; see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2, down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
tacit synced commits to refs/pull/147/merge at tacit/nanochat from mirror 2026-01-13 23:13:46 +00:00
23985413aa adjust the comment on the regex pattern per a recent experiment; see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2, down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
tacit synced commits to refs/pull/311/merge at tacit/nanochat from mirror 2026-01-13 23:13:46 +00:00
23985413aa adjust the comment on the regex pattern per a recent experiment; see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2, down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
tacit synced commits to refs/pull/399/merge at tacit/nanochat from mirror 2026-01-13 23:13:46 +00:00
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 2 commits »
tacit synced commits to fp8_attempt_fail at tacit/nanochat from mirror 2026-01-13 23:13:45 +00:00
tacit synced new reference fp8_attempt_fail to tacit/nanochat from mirror 2026-01-13 23:13:45 +00:00
tacit synced commits to master at tacit/nanochat from mirror 2026-01-13 23:13:45 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic; that is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way
3b50b77ed3 fix base_loss to report the correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens: a lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
23985413aa adjust the comment on the regex pattern per a recent experiment; see dev/LOG.md
Compare 7 commits »
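For context on the negative result in f92efce169, a minimal sketch of what "not allowing attention across BOS tokens" means: derive a document id from BOS positions and AND it with the causal mask, so a token never attends to an earlier document in the packed sequence. The token ids and BOS id below are illustrative:

```python
import torch

BOS = 0
tokens = torch.tensor([BOS, 5, 7, BOS, 9, BOS, 4, 2])

# Each token belongs to the document opened by the most recent BOS.
doc_id = torch.cumsum((tokens == BOS).long(), dim=0)  # tensor([1,1,1,2,2,3,3,3])

T = len(tokens)
causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
same_doc = doc_id[:, None] == doc_id[None, :]
mask = causal & same_doc  # True where a query position may attend to a key

# Per the commit, threading a mask like this through the attention kernels
# added code complexity for basically no gain in performance.
```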
tacit synced commits to refs/pull/147/merge at tacit/nanochat from mirror 2026-01-13 15:03:43 +00:00
bc3cf8b28e Merge branch 'master' into fix-mfu-a100
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
Compare 97 commits »
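On 21608ec51e: evaluating the loss of an arbitrary Hugging Face model is straightforward once the dataloader just takes a tokenizer. A rough sketch under that assumption (model name and text are placeholders; this is not nanochat's base_loss):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # any causal LM checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

ids = tok("The quick brown fox jumps over the lazy dog.",
          return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, labels=ids)  # HF shifts labels internally
print(float(out.loss))  # mean cross-entropy, nats per token
```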
tacit synced commits to refs/pull/147/head at tacit/nanochat from mirror 2026-01-13 15:03:42 +00:00
bc3cf8b28e Merge branch 'master' into fix-mfu-a100
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
b33e394528 oops, actually make SSSL the default window pattern
Compare 168 commits »
tacit synced commits to refs/pull/93/merge at tacit/nanochat from mirror 2026-01-12 22:43:45 +00:00
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
Compare 4 commits »
tacit synced commits to refs/pull/429/head at tacit/nanochat from mirror 2026-01-12 22:43:45 +00:00
48aaa4b3df Download the minimum number of parquet shards to train the tokenizer reproducibly
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
b33e394528 oops, actually make SSSL the default window pattern
Compare 10 commits »
tacit synced commits to refs/pull/429/merge at tacit/nanochat from mirror 2026-01-12 22:43:45 +00:00
48aaa4b3df Download the minimum number of parquet shards to train the tokenizer reproducibly
Compare 2 commits »
tacit synced commits to refs/pull/85/merge at tacit/nanochat from mirror 2026-01-12 22:43:45 +00:00
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
Compare 4 commits »
tacit synced commits to refs/pull/409/merge at tacit/nanochat from mirror 2026-01-12 22:43:44 +00:00
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
Compare 4 commits »
tacit synced commits to refs/pull/425/merge at tacit/nanochat from mirror 2026-01-12 22:43:44 +00:00
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
Compare 4 commits »
tacit synced commits to refs/pull/407/merge at tacit/nanochat from mirror 2026-01-12 22:43:44 +00:00
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
Compare 4 commits »
tacit synced commits to refs/pull/412/merge at tacit/nanochat from mirror 2026-01-12 22:43:44 +00:00
4610a838a1 record negative result on MTP
21608ec51e allow base_loss to report the loss of any arbitrary Hugging Face model, similar to base_eval. Had to change the dataloader to be a lot better and just take a tokenizer, not load the nanochat one. Much better this way anyway
aa95fb2e03 make miniseries more generic, easier to run, and less hard-coded
Compare 4 commits »