Joined on 2024-05-31
tacit synced and deleted reference refs/tags/refs/pull/428/merge at tacit/nanochat from mirror 2026-01-16 00:14:01 +00:00
tacit synced and deleted reference refs/tags/refs/pull/161/merge at tacit/nanochat from mirror 2026-01-16 00:14:01 +00:00
tacit synced commits to refs/pull/93/merge at tacit/nanochat from mirror 2026-01-15 16:03:50 +00:00
6bb92403d5 changes and optimizations to muon, making it more efficient and simpler/cleaner a bit
3142ca1a28 minor helpful message
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. that is, argparse uses dashes variables use underscores. the underscores are just a remnant of the previous Configurator object. This is the right way
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
Compare 7 commits »
tacit synced commits to refs/pull/59/merge at tacit/nanochat from mirror 2026-01-15 07:54:01 +00:00
6bb92403d5 changes and optimizations to muon, making it more efficient and simpler/cleaner a bit
3142ca1a28 minor helpful message
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. that is, argparse uses dashes variables use underscores. the underscores are just a remnant of the previous Configurator object. This is the right way
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
Compare 10 commits »
tacit synced commits to refs/pull/438/merge at tacit/nanochat from mirror 2026-01-15 07:54:00 +00:00
6bb92403d5 changes and optimizations to muon, making it more efficient and simpler/cleaner a bit
3142ca1a28 minor helpful message
Compare 3 commits »
tacit synced commits to refs/pull/437/merge at tacit/nanochat from mirror 2026-01-15 07:54:00 +00:00
6bb92403d5 changes and optimizations to muon, making it more efficient and simpler/cleaner a bit
3142ca1a28 minor helpful message
Compare 3 commits »
tacit synced commits to refs/pull/436/merge at tacit/nanochat from mirror 2026-01-15 07:53:59 +00:00
6bb92403d5 changes and optimizations to muon, making it more efficient and simpler/cleaner a bit
3142ca1a28 minor helpful message
Compare 3 commits »
tacit synced commits to refs/pull/434/merge at tacit/nanochat from mirror 2026-01-15 07:53:58 +00:00
6bb92403d5 changes and optimizations to muon, making it more efficient and simpler/cleaner a bit
3142ca1a28 minor helpful message
Compare 3 commits »
tacit synced commits to refs/pull/433/merge at tacit/nanochat from mirror 2026-01-15 07:53:57 +00:00
6bb92403d5 changes and optimizations to muon, making it more efficient and simpler/cleaner a bit
3142ca1a28 minor helpful message
Compare 3 commits »
tacit synced commits to refs/pull/400/merge at tacit/nanochat from mirror 2026-01-15 07:53:56 +00:00
6bb92403d5 changes and optimizations to muon, making it more efficient and simpler/cleaner a bit
3142ca1a28 minor helpful message
Compare 3 commits »
tacit synced commits to refs/pull/431/merge at tacit/nanochat from mirror 2026-01-15 07:53:56 +00:00
6bb92403d5 changes and optimizations to muon, making it more efficient and simpler/cleaner a bit
3142ca1a28 minor helpful message
Compare 3 commits »
tacit synced commits to refs/pull/370/merge at tacit/nanochat from mirror 2026-01-15 07:53:55 +00:00
6bb92403d5 changes and optimizations to muon, making it more efficient and simpler/cleaner a bit
3142ca1a28 minor helpful message
Compare 3 commits »
tacit synced commits to master at tacit/nanochat from mirror 2026-01-15 07:53:54 +00:00
6bb92403d5 changes and optimizations to muon, making it more efficient and simpler/cleaner a bit
3142ca1a28 minor helpful message
Compare 2 commits »
tacit synced commits to refs/pull/436/head at tacit/nanochat from mirror 2026-01-14 23:43:41 +00:00
97364273e2 feat: restrict FA3 loading to Hopper+ GPUs (SM90+) to fix crashes on consumer hardware
d7fccbab82 fix: enforce (B, H, T, D) layout for SDPA fallback to support CPU strictness
Compare 2 commits »
tacit synced commits to refs/pull/436/merge at tacit/nanochat from mirror 2026-01-14 23:43:41 +00:00
97364273e2 feat: restrict FA3 loading to Hopper+ GPUs (SM90+) to fix crashes on consumer hardware
d7fccbab82 fix: enforce (B, H, T, D) layout for SDPA fallback to support CPU strictness
Compare 3 commits »
tacit synced commits to refs/pull/311/merge at tacit/nanochat from mirror 2026-01-14 23:43:41 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. that is, argparse uses dashes variables use underscores. the underscores are just a remnant of the previous Configurator object. This is the right way
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 5 commits »
tacit synced commits to refs/pull/436/head at tacit/nanochat from mirror 2026-01-14 15:33:49 +00:00
68e66be05c fix: wrap FA3 import in try-except block to support both CUDA and MPS
tacit synced commits to refs/pull/85/merge at tacit/nanochat from mirror 2026-01-14 15:33:49 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. that is, argparse uses dashes variables use underscores. the underscores are just a remnant of the previous Configurator object. This is the right way
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 8 commits »
tacit synced commits to refs/pull/436/merge at tacit/nanochat from mirror 2026-01-14 15:33:49 +00:00
68e66be05c fix: wrap FA3 import in try-except block to support both CUDA and MPS
Compare 2 commits »
tacit synced commits to refs/pull/433/merge at tacit/nanochat from mirror 2026-01-14 15:33:48 +00:00
7312ec9898 fix buggy midtrain and update all kwargs to be idiomatic. that is, argparse uses dashes variables use underscores. the underscores are just a remnant of the previous Configurator object. This is the right way
3b50b77ed3 fix base_loss to report correct loss by switching the dataloader to the new default
f92efce169 add negative result about not allowing attention across BOS tokens. A lot more code complexity for basically no gain in performance
43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
Compare 5 commits »