Commit Graph

18 Commits

Author SHA1 Message Date
dssjon
d366f7e07d Refactor constants in training scripts and engine to improve configurability. Replace hardcoded values with constants for KV cache growth, rotary cache multiplier, and learning rate parameters. This enhances maintainability and allows for easier adjustments in future iterations. 2025-10-15 14:58:06 -07:00
Andrej Karpathy
fae3aca951 add script to train a $1000 version of nanochat. currently it's a bit more like $800 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now 2025-10-15 20:32:22 +00:00
Andrej Karpathy
4c3590c499 fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints 2025-10-15 20:29:54 +00:00
Andrej Karpathy
03fa673b7d add basic logging to chat_web, which i think might be fun 2025-10-15 19:51:06 +00:00
Andrej Karpathy
52bfeea8bd add very basic abuse prevention limits to chat_web so it's ok to host endpoints 2025-10-15 19:42:54 +00:00
Andrej Karpathy
01fb290f53 allow multiple GPUs to do inference in a data parallel way 2025-10-15 19:12:19 +00:00
Andrej Karpathy
190d9515d0 don't evaluate the sampling evals during SFT; they are too slow. keep the multiple choice evals. delete unused imports 2025-10-15 16:42:23 +00:00
Andrej Karpathy
b8076dd367 fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation 2025-10-15 16:35:04 +00:00
Andrej
67aaca98f5 export NANOCHAT_BASE_DIR so child processes get it too. Export the cache directory so that users can use their own cache location 2025-10-14 16:01:28 -07:00
Zach Mueller
f0855cbcc7 Update speedrun.sh 2025-10-14 14:12:01 -04:00
Andrej
dd6ff9a1cc fix bug in fallback case of find_largest_model. Fix: Handle missing d<number> model tags in find_largest_model. ty 2025-10-13 14:38:34 -07:00
Mirza-Samad-Ahmed-Baig
afaa5b4c90 Fix: Handle missing d<number> model tags in find_largest_model 2025-10-14 00:24:07 +03:00
Andrej
5fd0b13886 Merge pull request #2 from epoyraz/patch-1. Update README.md 2025-10-13 10:10:15 -07:00
Enes Poyraz
6a795baf27 Update README.md. fix typos 2025-10-13 18:40:12 +02:00
Andrej
626bd3e260 Add image of the WebUI to readme 2025-10-13 08:03:00 -07:00
karpathy
da96b46565 update link to the new discussion 2025-10-13 07:42:09 -07:00
karpathy
a53833d04f add nanochat logo png 2025-10-13 06:59:59 -07:00
karpathy
3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00