nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-06-20 04:59:08 +00:00

Aarushi Singh 2ef158b4a3 Merge `135f25efb9` into `ebd4d9bbf5`		2026-01-30 01:29:08 +05:30
base_eval.py	bugfix	2025-12-26 19:02:12 +08:00
base_loss.py	update the CPU/MPS script to give reasonable results. The model can at least answer that Paris is the capital of France and knows that the sky is blue, for about 40 minutes of training on my macbook. Also fixed a bug that existed due to KVCache bfloat16 dtype assumption	2026-01-17 12:27:30 -08:00
base_train.py	Combine AdamW and Muon into single MuonAdamW optimizer, cleaner, ty @chrisjmccormick for idea/help	2026-01-29 00:52:08 +00:00
chat_cli.py	upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming	2025-10-20 10:15:17 -07:00
chat_eval.py	Fix args in readme (#438)	2026-01-15 16:26:38 -08:00
chat_rl.py	Combine AdamW and Muon into single MuonAdamW optimizer, cleaner, ty @chrisjmccormick for idea/help	2026-01-29 00:52:08 +00:00
chat_sft.py	Combine AdamW and Muon into single MuonAdamW optimizer, cleaner, ty @chrisjmccormick for idea/help	2026-01-29 00:52:08 +00:00
chat_web.py	adding change to docstring	2026-01-23 18:34:13 +05:30
mid_train.py	Combine AdamW and Muon into single MuonAdamW optimizer, cleaner, ty @chrisjmccormick for idea/help	2026-01-29 00:52:08 +00:00
tok_eval.py	initial commit	2025-10-13 06:49:24 -07:00
tok_train.py	quick fix to not OOM main speedrun script	2026-01-26 22:31:42 +00:00