nanochat/nanochat
2026-01-17 12:27:30 -08:00
__init__.py initial commit 2025-10-13 06:49:24 -07:00
adamw.py fuse adamw into a single torch compiled kernel similar to muon. it's about 1.7X faster, but overall it's so tiny that it's not making a major dent 2026-01-15 23:30:44 +00:00
checkpoint_manager.py minor helpful message 2026-01-15 03:20:21 +00:00
common.py more GPU types from PR 147 thanks @Qubitium 2026-01-17 03:22:20 +00:00
core_eval.py initial commit 2025-10-13 06:49:24 -07:00
dataloader.py Reduce token waste in BOS bestfit by cropping shortest doc (#445) 2026-01-16 18:50:34 -08:00
dataset.py initial commit 2025-10-13 06:49:24 -07:00
engine.py update the CPU/MPS script to give reasonable results. The model can at least answer that Paris is the capital of France and knows that the sky is blue, for about 40 minutes of training on my macbook. Also fixed a bug that existed due to KVCache bfloat16 dtype assumption 2026-01-17 12:27:30 -08:00
execution.py nit delete redundant catch/raise in execute 2025-10-29 08:10:03 -07:00
flash_attention.py naturally i failed to include the actual code in the previous commit facepalm 2026-01-16 17:39:41 +00:00
gpt.py implement flash attention 3 fallback to pytorch sdpa by touching as few lines of code as possible in main files and keeping all implementation to a single file. add tests. add helpful warning messages for the user. 2026-01-16 17:37:51 +00:00
logo.svg initial commit 2025-10-13 06:49:24 -07:00
loss_eval.py fix typos 2025-11-14 11:20:25 +01:00
muon.py changes and optimizations to muon, making it more efficient and simpler/cleaner a bit 2026-01-15 03:20:48 +00:00
report.py fix small bug where this would break if git stage has deleted files 2026-01-04 19:11:43 +00:00
tokenizer.py adjust the comment on the regex pattern per recent experiment see dev/LOG.md 2026-01-13 17:50:39 +00:00
ui.html Fix conversation scroll to bottom on some browsers + remove duplicated padding (#348) 2025-12-31 13:03:22 -08:00