nanochat/nanochat
2025-10-27 02:47:13 -07:00
__init__.py initial commit 2025-10-13 06:49:24 -07:00
adamw.py fix: remove unnecessary tensor allocation in DistAdamW optimizer 2025-10-20 12:03:26 +03:00
checkpoint_manager.py Fix: Handle missing d<number> model tags in find_largest_model 2025-10-14 00:24:07 +03:00
common.py add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think 2025-10-24 14:02:48 +00:00
configurator.py initial commit 2025-10-13 06:49:24 -07:00
core_eval.py initial commit 2025-10-13 06:49:24 -07:00
dataloader.py Fix Torch crash caused by pinning on CPU 2025-10-22 16:25:36 +00:00
dataset.py initial commit 2025-10-13 06:49:24 -07:00
engine.py update the kv_shape 2025-10-27 02:47:13 -07:00
execution.py upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming 2025-10-20 10:15:17 -07:00
gpt.py use enable_gqa of pytorch sdpa, allows us to delete some code, didn't realize it's available 2025-10-21 18:07:33 +00:00
logo.svg initial commit 2025-10-13 06:49:24 -07:00
loss_eval.py add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights 2025-10-16 10:04:43 -07:00
muon.py initial commit 2025-10-13 06:49:24 -07:00
report.py many small tweaks. base, eval, core work now i think 2025-10-16 15:46:18 -07:00
tokenizer.py allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough 2025-10-24 13:27:05 +00:00
ui.html fix(ui): prevent iOS Safari toolbar from covering input on initial load 2025-10-21 17:34:40 -07:00