nanochat/nanochat
2025-11-02 14:16:43 +01:00
__init__.py initial commit 2025-10-13 06:49:24 -07:00
adamw.py fix: remove unnecessary tensor allocation in DistAdamW optimizer 2025-10-20 12:03:26 +03:00
checkpoint_manager.py revert formatting changes to facilitate review 2025-11-02 14:16:43 +01:00
common.py add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think 2025-10-24 14:02:48 +00:00
configurator.py initial commit 2025-10-13 06:49:24 -07:00
core_eval.py initial commit 2025-10-13 06:49:24 -07:00
dataloader.py Fix Torch crash caused by pinning on CPU 2025-10-22 16:25:36 +00:00
dataset.py initial commit 2025-10-13 06:49:24 -07:00
engine.py update the kv_shape 2025-10-27 02:47:13 -07:00
execution.py nit delete redundant catch/raise in execute 2025-10-29 08:10:03 -07:00
gpt.py use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available 2025-10-21 18:07:33 +00:00
logo.svg initial commit 2025-10-13 06:49:24 -07:00
loss_eval.py Merge pull request #35 from bhaskar0210s/master 2025-10-29 08:06:24 -07:00
muon.py initial commit 2025-10-13 06:49:24 -07:00
report.py many small tweaks. base, eval, core work now i think 2025-10-16 15:46:18 -07:00
tokenizer.py allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough 2025-10-24 13:27:05 +00:00
ui.html fix(ui): prevent iOS Safari toolbar from covering input on initial load 2025-10-21 17:34:40 -07:00