nanochat/nanochat
Latest commit ad39db5a23 by Andrej, 2025-11-01 07:43:57 -07:00: tiny fix to comment
Update engine.py with correct error message on assert
__init__.py initial commit 2025-10-13 06:49:24 -07:00
adamw.py fix: remove unnecessary tensor allocation in DistAdamW optimizer 2025-10-20 12:03:26 +03:00
checkpoint_manager.py Fix: Handle missing d<number> model tags in find_largest_model 2025-10-14 00:24:07 +03:00
common.py add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think 2025-10-24 14:02:48 +00:00
configurator.py initial commit 2025-10-13 06:49:24 -07:00
core_eval.py initial commit 2025-10-13 06:49:24 -07:00
dataloader.py Fix Torch crash caused by pinning on CPU 2025-10-22 16:25:36 +00:00
dataset.py initial commit 2025-10-13 06:49:24 -07:00
engine.py tiny fix to comment 2025-11-01 07:43:57 -07:00
execution.py nit delete redundant catch/raise in execute 2025-10-29 08:10:03 -07:00
gpt.py use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available 2025-10-21 18:07:33 +00:00
logo.svg initial commit 2025-10-13 06:49:24 -07:00
loss_eval.py Merge pull request #35 from bhaskar0210s/master 2025-10-29 08:06:24 -07:00
muon.py initial commit 2025-10-13 06:49:24 -07:00
report.py many small tweaks. base, eval, core work now i think 2025-10-16 15:46:18 -07:00
tokenizer.py allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough 2025-10-24 13:27:05 +00:00
ui.html fix(ui): prevent iOS Safari toolbar from covering input on initial load 2025-10-21 17:34:40 -07:00