• Joined on 2024-05-31
tacit synced commits to refs/pull/425/merge at tacit/nanochat from mirror 2026-01-14 15:33:47 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/414/merge at tacit/nanochat from mirror 2026-01-14 15:33:46 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/409/merge at tacit/nanochat from mirror 2026-01-14 15:33:45 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
Compare 4 commits »
tacit synced commits to refs/pull/412/merge at tacit/nanochat from mirror 2026-01-14 15:33:45 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 5 commits »
tacit synced commits to refs/pull/400/merge at tacit/nanochat from mirror 2026-01-14 15:33:44 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 5 commits »
tacit synced commits to refs/pull/399/merge at tacit/nanochat from mirror 2026-01-14 15:33:43 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 7 commits »
tacit synced commits to refs/pull/312/merge at tacit/nanochat from mirror 2026-01-14 15:33:43 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/296/merge at tacit/nanochat from mirror 2026-01-14 15:33:42 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/151/merge at tacit/nanochat from mirror 2026-01-14 15:33:42 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/396/merge at tacit/nanochat from mirror 2026-01-14 07:23:46 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/324/merge at tacit/nanochat from mirror 2026-01-14 07:23:45 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/433/merge at tacit/nanochat from mirror 2026-01-13 23:13:48 +00:00
23985413aa adjust the comment on the regex pattern per recent experiment, see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2 down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
tacit synced commits to refs/pull/93/merge at tacit/nanochat from mirror 2026-01-13 23:13:48 +00:00
23985413aa adjust the comment on the regex pattern per recent experiment, see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2 down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
tacit synced commits to refs/pull/434/merge at tacit/nanochat from mirror 2026-01-13 23:13:48 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/409/merge at tacit/nanochat from mirror 2026-01-13 23:13:47 +00:00
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
23985413aa adjust the comment on the regex pattern per recent experiment, see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2 down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 5 commits »
tacit synced commits to refs/pull/431/merge at tacit/nanochat from mirror 2026-01-13 23:13:47 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. That is, argparse uses dashes, variables use underscores. The underscores are just a remnant of the previous Configurator object. This is the right way.
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/412/merge at tacit/nanochat from mirror 2026-01-13 23:13:47 +00:00
23985413aa adjust the comment on the regex pattern per recent experiment, see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2 down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
tacit synced commits to refs/pull/429/merge at tacit/nanochat from mirror 2026-01-13 23:13:47 +00:00
23985413aa adjust the comment on the regex pattern per recent experiment, see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2 down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
tacit synced commits to refs/pull/432/merge at tacit/nanochat from mirror 2026-01-13 23:13:47 +00:00
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
23985413aa adjust the comment on the regex pattern per recent experiment, see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2 down from 3), leading to the best val_bpb for 32K vocabs
Compare 6 commits »
tacit synced commits to refs/pull/400/merge at tacit/nanochat from mirror 2026-01-13 23:13:46 +00:00
23985413aa adjust the comment on the regex pattern per recent experiment, see dev/LOG.md
64b48d0e5c validated that \p{N}{1,2} is the correct number of digits to group up to in the regex pattern of the GPT-4 tokenizer (2 down from 3), leading to the best val_bpb for 32K vocabs
238353c998 document my struggle with fp8 integration yesterday; it's not working like I thought it would and I suffered. One day I will return to continue the fight.
Compare 4 commits »
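The `\p{N}{1,2}` change in commit 64b48d0e5c can be illustrated in isolation. This is a minimal sketch, not the actual nanochat tokenizer code: it only shows how the digit quantifier in a pre-tokenization regex controls the chunk size that long numbers are split into before BPE. Python's `re` module does not support `\p{N}`, so `\d` stands in for it here.

```python
import re

# Digit-run quantifiers as used in GPT-4-style pre-tokenizer patterns:
# {1,2} groups digits into chunks of up to 2 (the validated setting),
# {1,3} was the previous setting (chunks of up to 3).
up_to_two = re.compile(r"\d{1,2}")
up_to_three = re.compile(r"\d{1,3}")

print(up_to_two.findall("12345"))    # ['12', '34', '5']
print(up_to_three.findall("12345"))  # ['123', '45']
```

Smaller digit chunks mean long numbers are pre-split into more, shorter pieces before BPE merges run; per the commit message, chunks of up to 2 digits gave the best val_bpb for 32K vocabularies.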