nanochat/scripts
Aarushi Singh ace6740bdd
feat: allow top_k=0 in web api to disable filtering (#458)
* allow top_k=0 in web api to disable filtering

* adding a comment for clear reasoning

* adding change to docstring
2026-01-30 09:21:41 -08:00
..
base_eval.py bugfix 2025-12-26 19:02:12 +08:00
base_loss.py update the CPU/MPS script to give reasonable results. The model can at least answer that Paris is the capital of France and knows that the sky is blue, for about 40 minutes of training on my macbook. Also fixed a bug that existed due to KVCache bfloat16 dtype assumption 2026-01-17 12:27:30 -08:00
base_train.py Combine AdamW and Muon into single MuonAdamW optimizer, cleaner, ty @chrisjmccormick for idea/help 2026-01-29 00:52:08 +00:00
chat_cli.py upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes ,e.g. changing max_iterations to num_iterations in sft script for consistency in naming 2025-10-20 10:15:17 -07:00
chat_eval.py Fix args in readme (#438) 2026-01-15 16:26:38 -08:00
chat_rl.py Combine AdamW and Muon into single MuonAdamW optimizer, cleaner, ty @chrisjmccormick for idea/help 2026-01-29 00:52:08 +00:00
chat_sft.py Combine AdamW and Muon into single MuonAdamW optimizer, cleaner, ty @chrisjmccormick for idea/help 2026-01-29 00:52:08 +00:00
chat_web.py feat: allow top_k=0 in web api to disable filtering (#458) 2026-01-30 09:21:41 -08:00
mid_train.py Combine AdamW and Muon into single MuonAdamW optimizer, cleaner, ty @chrisjmccormick for idea/help 2026-01-29 00:52:08 +00:00
tok_eval.py initial commit 2025-10-13 06:49:24 -07:00
tok_train.py quick fix to not OOM main speedrun script 2026-01-26 22:31:42 +00:00