• Joined on 2024-05-31
tacit synced commits to refs/pull/425/merge at tacit/nanochat from mirror 2026-01-14 15:33:47 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/414/merge at tacit/nanochat from mirror 2026-01-14 15:33:46 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/409/merge at tacit/nanochat from mirror 2026-01-14 15:33:45 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
Compare 4 commits »
tacit synced commits to refs/pull/412/merge at tacit/nanochat from mirror 2026-01-14 15:33:45 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 5 commits »
tacit synced commits to refs/pull/400/merge at tacit/nanochat from mirror 2026-01-14 15:33:44 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 5 commits »
tacit synced commits to refs/pull/399/merge at tacit/nanochat from mirror 2026-01-14 15:33:43 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 7 commits »
tacit synced commits to refs/pull/312/merge at tacit/nanochat from mirror 2026-01-14 15:33:43 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/296/merge at tacit/nanochat from mirror 2026-01-14 15:33:42 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/151/merge at tacit/nanochat from mirror 2026-01-14 15:33:42 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/396/merge at tacit/nanochat from mirror 2026-01-14 07:23:46 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/324/merge at tacit/nanochat from mirror 2026-01-14 07:23:45 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/433/merge at tacit/nanochat from mirror 2026-01-13 23:13:48 +00:00
23985413aa adjust the comment on the regex pattern per recent experiment, see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2 down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
tacit synced commits to refs/pull/93/merge at tacit/nanochat from mirror 2026-01-13 23:13:48 +00:00
23985413aa adjust the comment on the regex pattern per recent experiment, see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2 down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
tacit synced commits to refs/pull/434/merge at tacit/nanochat from mirror 2026-01-13 23:13:48 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/409/merge at tacit/nanochat from mirror 2026-01-13 23:13:47 +00:00
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
23985413aa adjust the comment on the regex pattern per recent experiment, see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2 down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 5 commits »
tacit synced commits to refs/pull/431/merge at tacit/nanochat from mirror 2026-01-13 23:13:47 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/412/merge at tacit/nanochat from mirror 2026-01-13 23:13:47 +00:00
23985413aa adjust the comment on the regex pattern per recent experiment, see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2 down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
tacit synced commits to refs/pull/429/merge at tacit/nanochat from mirror 2026-01-13 23:13:47 +00:00
23985413aa adjust the comment on the regex pattern per recent experiment, see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2 down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
tacit synced commits to refs/pull/432/merge at tacit/nanochat from mirror 2026-01-13 23:13:47 +00:00
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
23985413aa adjust the comment on the regex pattern per recent experiment, see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2 down from 3), leading to the best val_bpb for 32K vocabs
Compare 6 commits »
tacit synced commits to refs/pull/400/merge at tacit/nanochat from mirror 2026-01-13 23:13:46 +00:00
23985413aa adjust the comment on the regex pattern per recent experiment, see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2 down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
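The `\p{N}{1,2}` change in commit 64b48d0e5c can be illustrated in isolation. This is a minimal sketch, not the actual nanochat tokenizer code: it only shows how the digit quantifier in a pre-tokenization regex controls the chunk size that long numbers are split into before BPE. Python's `re` module does not support `\p{N}`, so `\d` stands in for it here.

```python
import re

# Digit-run quantifiers as used in GPT-4-style pre-tokenizer patterns:
# {1,2} groups digits into chunks of up to 2 (the validated setting),
# {1,3} was the previous setting (chunks of up to 3).
up_to_two = re.compile(r"\d{1,2}")
up_to_three = re.compile(r"\d{1,3}")

print(up_to_two.findall("12345"))    # ['12', '34', '5']
print(up_to_three.findall("12345"))  # ['123', '45']
```

Smaller digit chunks mean long numbers are pre-split into more, shorter pieces before BPE merges run; per the commit message, chunks of up to 2 digits gave the best val_bpb for 32K vocabularies.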