• Joined on 2024-05-31
tacit synced commits to refs/pull/204/merge at tacit/nanochat from mirror 2026-01-08 20:52:38 +00:00
a1ccb3dc0b remove rust compilation as rustbpe is now installed from separate package (#416)
061f83c152 delete grad_clip. appears to not be necessary at all. not only was it buggy because the clipping happened per gpu before grad synchronization, but it costs ~2% MFU, and it also doesn't even help. I tried deleting it a while ago and back then it did help. So I'm guessing that some hyperparameter tuning obviated the reason for it since then
e8c30c3b19 add notebook used for scaling laws analysis
3af4dcf6ee also add scaling_laws.sh script if it's a useful reference
Compare 25 commits
tacit synced commits to refs/pull/324/merge at tacit/nanochat from mirror 2026-01-08 20:52:38 +00:00
f5a0ea4d3f take out these gitignore dirs
4ddc803797 fix adamw slight bug. this chunk was copy pasted originally from modded-nanogpt, which still seems to have the bug
a1ccb3dc0b remove rust compilation as rustbpe is now installed from separate package (#416)
Compare 4 commits
tacit synced commits to master at tacit/nanochat from mirror 2026-01-08 20:52:37 +00:00
f5a0ea4d3f take out these gitignore dirs
4ddc803797 fix adamw slight bug. this chunk was copy pasted originally from modded-nanogpt, which still seems to have the bug
a1ccb3dc0b remove rust compilation as rustbpe is now installed from separate package (#416)
Compare 3 commits
tacit synced and deleted reference refs/tags/refs/pull/416/merge at tacit/nanochat from mirror 2026-01-08 20:52:37 +00:00
tacit synced commits to refs/pull/85/merge at tacit/nanochat from mirror 2026-01-08 12:42:51 +00:00
061f83c152 delete grad_clip. appears to not be necessary at all. not only was it buggy because the clipping happened per gpu before grad synchronization, but it costs ~2% MFU, and it also doesn't even help. I tried deleting it a while ago and back then it did help. So I'm guessing that some hyperparameter tuning obviated the reason for it since then
e8c30c3b19 add notebook used for scaling laws analysis
3af4dcf6ee also add scaling_laws.sh script if it's a useful reference
4cc605b940 quick pointer to miniseries post in readme for now
Compare 7 commits
tacit synced commits to refs/pull/412/merge at tacit/nanochat from mirror 2026-01-08 12:42:50 +00:00
061f83c152 delete grad_clip. appears to not be necessary at all. not only was it buggy because the clipping happened per gpu before grad synchronization, but it costs ~2% MFU, and it also doesn't even help. I tried deleting it a while ago and back then it did help. So I'm guessing that some hyperparameter tuning obviated the reason for it since then
e8c30c3b19 add notebook used for scaling laws analysis
3af4dcf6ee also add scaling_laws.sh script if it's a useful reference
4cc605b940 quick pointer to miniseries post in readme for now
Compare 6 commits
tacit synced commits to refs/pull/409/head at tacit/nanochat from mirror 2026-01-08 12:42:50 +00:00
489075bdbd Add support for ROCm backend in speedrun script
061f83c152 delete grad_clip. appears to not be necessary at all. not only was it buggy because the clipping happened per gpu before grad synchronization, but it costs ~2% MFU, and it also doesn't even help. I tried deleting it a while ago and back then it did help. So I'm guessing that some hyperparameter tuning obviated the reason for it since then
e8c30c3b19 add notebook used for scaling laws analysis
3af4dcf6ee also add scaling_laws.sh script if it's a useful reference
4cc605b940 quick pointer to miniseries post in readme for now
Compare 20 commits
tacit synced commits to refs/pull/409/merge at tacit/nanochat from mirror 2026-01-08 12:42:50 +00:00
489075bdbd Add support for ROCm backend in speedrun script
061f83c152 delete grad_clip. appears to not be necessary at all. not only was it buggy because the clipping happened per gpu before grad synchronization, but it costs ~2% MFU, and it also doesn't even help. I tried deleting it a while ago and back then it did help. So I'm guessing that some hyperparameter tuning obviated the reason for it since then
e8c30c3b19 add notebook used for scaling laws analysis
3af4dcf6ee also add scaling_laws.sh script if it's a useful reference
Compare 17 commits
tacit synced commits to refs/pull/405/merge at tacit/nanochat from mirror 2026-01-08 12:42:49 +00:00
061f83c152 delete grad_clip. appears to not be necessary at all. not only was it buggy because the clipping happened per gpu before grad synchronization, but it costs ~2% MFU, and it also doesn't even help. I tried deleting it a while ago and back then it did help. So I'm guessing that some hyperparameter tuning obviated the reason for it since then
e8c30c3b19 add notebook used for scaling laws analysis
3af4dcf6ee also add scaling_laws.sh script if it's a useful reference
4cc605b940 quick pointer to miniseries post in readme for now
Compare 7 commits
tacit synced commits to refs/pull/407/merge at tacit/nanochat from mirror 2026-01-08 12:42:49 +00:00
47885e743b Fix WANDB_RUN parameter passing in runcpu.sh
Compare 2 commits
tacit synced commits to refs/pull/407/head at tacit/nanochat from mirror 2026-01-08 12:42:49 +00:00
47885e743b Fix WANDB_RUN parameter passing in runcpu.sh
061f83c152 delete grad_clip. appears to not be necessary at all. not only was it buggy because the clipping happened per gpu before grad synchronization, but it costs ~2% MFU, and it also doesn't even help. I tried deleting it a while ago and back then it did help. So I'm guessing that some hyperparameter tuning obviated the reason for it since then
e8c30c3b19 add notebook used for scaling laws analysis
3af4dcf6ee also add scaling_laws.sh script if it's a useful reference
4cc605b940 quick pointer to miniseries post in readme for now
Compare 20 commits
tacit synced commits to refs/pull/370/merge at tacit/nanochat from mirror 2026-01-08 12:42:46 +00:00
061f83c152 delete grad_clip. appears to not be necessary at all. not only was it buggy because the clipping happened per gpu before grad synchronization, but it costs ~2% MFU, and it also doesn't even help. I tried deleting it a while ago and back then it did help. So I'm guessing that some hyperparameter tuning obviated the reason for it since then
e8c30c3b19 add notebook used for scaling laws analysis
3af4dcf6ee also add scaling_laws.sh script if it's a useful reference
4cc605b940 quick pointer to miniseries post in readme for now
Compare 7 commits
tacit synced commits to refs/pull/324/merge at tacit/nanochat from mirror 2026-01-08 12:42:45 +00:00
061f83c152 delete grad_clip. appears to not be necessary at all. not only was it buggy because the clipping happened per gpu before grad synchronization, but it costs ~2% MFU, and it also doesn't even help. I tried deleting it a while ago and back then it did help. So I'm guessing that some hyperparameter tuning obviated the reason for it since then
e8c30c3b19 add notebook used for scaling laws analysis
3af4dcf6ee also add scaling_laws.sh script if it's a useful reference
4cc605b940 quick pointer to miniseries post in readme for now
Compare 7 commits
tacit synced commits to refs/pull/59/merge at tacit/nanochat from mirror 2026-01-08 04:32:42 +00:00
1b5de29e71 Fix undefined variable in chat_rl after recent refactor
Compare 2 commits
tacit synced commits to refs/pull/416/merge at tacit/nanochat from mirror 2026-01-08 04:32:41 +00:00
061f83c152 delete grad_clip. appears to not be necessary at all. not only was it buggy because the clipping happened per gpu before grad synchronization, but it costs ~2% MFU, and it also doesn't even help. I tried deleting it a while ago and back then it did help. So I'm guessing that some hyperparameter tuning obviated the reason for it since then
e8c30c3b19 add notebook used for scaling laws analysis
3af4dcf6ee also add scaling_laws.sh script if it's a useful reference
4cc605b940 quick pointer to miniseries post in readme for now
Compare 7 commits
tacit synced commits to refs/pull/412/merge at tacit/nanochat from mirror 2026-01-08 04:32:40 +00:00
1b5de29e71 Fix undefined variable in chat_rl after recent refactor
Compare 2 commits
tacit synced commits to refs/pull/414/merge at tacit/nanochat from mirror 2026-01-08 04:32:40 +00:00
e8c30c3b19 add notebook used for scaling laws analysis
3af4dcf6ee also add scaling_laws.sh script if it's a useful reference
4cc605b940 quick pointer to miniseries post in readme for now
ccf4b7f9bf nudge hyperparameters of the base script with the results of the sweeps and miniseries. vocab size down to 32K. D:N ratio from 20 to 8. add miniseries script
Compare 6 commits
tacit synced commits to refs/pull/407/merge at tacit/nanochat from mirror 2026-01-08 04:32:39 +00:00
7e3a197c43 Merge 1e04f9846e44fd602ac2232db056fe95c891adb8 into 061f83c152
061f83c152 delete grad_clip. appears to not be necessary at all. not only was it buggy because the clipping happened per gpu before grad synchronization, but it costs ~2% MFU, and it also doesn't even help. I tried deleting it a while ago and back then it did help. So I'm guessing that some hyperparameter tuning obviated the reason for it since then
e8c30c3b19 add notebook used for scaling laws analysis
3af4dcf6ee also add scaling_laws.sh script if it's a useful reference
4cc605b940 quick pointer to miniseries post in readme for now
Compare 7 commits
tacit synced commits to refs/pull/151/merge at tacit/nanochat from mirror 2026-01-08 04:32:35 +00:00
061f83c152 delete grad_clip. appears to not be necessary at all. not only was it buggy because the clipping happened per gpu before grad synchronization, but it costs ~2% MFU, and it also doesn't even help. I tried deleting it a while ago and back then it did help. So I'm guessing that some hyperparameter tuning obviated the reason for it since then
e8c30c3b19 add notebook used for scaling laws analysis
3af4dcf6ee also add scaling_laws.sh script if it's a useful reference
4cc605b940 quick pointer to miniseries post in readme for now
Compare 7 commits
tacit synced commits to refs/pull/396/merge at tacit/nanochat from mirror 2026-01-08 04:32:35 +00:00
1b5de29e71 Fix undefined variable in chat_rl after recent refactor
Compare 2 commits