nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-06-19 04:29:09 +00:00

karpathy f9a7e0f111 update the CPU/MPS script to give reasonable results. The model can at least answer that Paris is the capital of France and knows that the sky is blue, for about 40 minutes of training on my macbook. Also fixed a bug that existed due to KVCache bfloat16 dtype assumption		2026-01-17 12:27:30 -08:00
__init__.py	initial commit	2025-10-13 06:49:24 -07:00
adamw.py	fuse adamw into a single torch compiled kernel similar to muon. it's about 1.7X faster, but overall it's so tiny that it's not making a major dent	2026-01-15 23:30:44 +00:00
checkpoint_manager.py	minor helpful message	2026-01-15 03:20:21 +00:00
common.py	more GPU types from PR 147 thanks @Qubitium	2026-01-17 03:22:20 +00:00
core_eval.py	initial commit	2025-10-13 06:49:24 -07:00
dataloader.py	Reduce token waste in BOS bestfit by cropping shortest doc (#445)	2026-01-16 18:50:34 -08:00
dataset.py	initial commit	2025-10-13 06:49:24 -07:00
engine.py	update the CPU/MPS script to give reasonable results. The model can at least answer that Paris is the capital of France and knows that the sky is blue, for about 40 minutes of training on my macbook. Also fixed a bug that existed due to KVCache bfloat16 dtype assumption	2026-01-17 12:27:30 -08:00
execution.py	nit delete redundant catch/raise in execute	2025-10-29 08:10:03 -07:00
flash_attention.py	naturally i failed to include the actual code in the previous commit facepalm	2026-01-16 17:39:41 +00:00
gpt.py	implement flash attention 3 fallback to pytorch sdpa by touching as few lines of code as possible in main files and keeping all implementation to a single file. add tests. add helpful warning messages for the user.	2026-01-16 17:37:51 +00:00
logo.svg	initial commit	2025-10-13 06:49:24 -07:00
loss_eval.py	fix typos	2025-11-14 11:20:25 +01:00
muon.py	changes and optimizations to muon, making it more efficient and simpler/cleaner a bit	2026-01-15 03:20:48 +00:00
report.py	fix small bug where this would break if git stage has deleted files	2026-01-04 19:11:43 +00:00
tokenizer.py	adjust the comment on the regex pattern per recent experiment see dev/LOG.md	2026-01-13 17:50:39 +00:00
ui.html	Fix conversation scroll to bottom on some browsers + remove duplicated padding (#348)	2025-12-31 13:03:22 -08:00