dssjon
|
d366f7e07d
|
Refactor constants in training scripts and engine to improve configurability. Replace hardcoded values with constants for KV cache growth, rotary cache multiplier, and learning rate parameters. This enhances maintainability and allows for easier adjustments in future iterations.
|
2025-10-15 14:58:06 -07:00 |
|
Andrej Karpathy
|
fae3aca951
|
add script to train a $1000 version of nanochat. currently it's a bit more like $800 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now
|
2025-10-15 20:32:22 +00:00 |
|
Andrej Karpathy
|
4c3590c499
|
fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints
|
2025-10-15 20:29:54 +00:00 |
|
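To illustrate the idea in the commit above (4c3590c499), here is a minimal sketch of emitting text only once full code points are available. It is not the repository's code: `stream_text` and the byte chunks are hypothetical, and the standard library's incremental UTF-8 decoder stands in for whatever accumulation the commit actually implements.

```python
import codecs


def stream_text(byte_chunks):
    """Yield text pieces, holding back bytes that do not yet form a full code point.

    Hypothetical helper for illustration; `byte_chunks` stands in for the raw
    bytes produced by decoding one token at a time.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in byte_chunks:
        # The incremental decoder buffers an incomplete trailing sequence
        # and returns "" until the remaining bytes arrive.
        text = decoder.decode(chunk)
        if text:
            yield text


# An emoji is 4 bytes in UTF-8; split it across two chunks, as can happen
# when a tokenizer emits part of a multi-byte character per token.
emoji_bytes = "😀".encode("utf-8")
chunks = [b"hi ", emoji_bytes[:2], emoji_bytes[2:], b"!"]
pieces = list(stream_text(chunks))
```

Decoding each chunk on its own would instead produce replacement characters or raise an error for the two half-emoji chunks, which is the kind of failure the commit message describes.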
Andrej Karpathy
|
03fa673b7d
|
add basic logging to chat_web, which i think might be fun
|
2025-10-15 19:51:06 +00:00 |
|
Andrej Karpathy
|
52bfeea8bd
|
add very basic abuse prevention limits to chat_web so it's ok to host endpoints
|
2025-10-15 19:42:54 +00:00 |
|
Andrej Karpathy
|
01fb290f53
|
allow multiple GPUs to do inference in a data parallel way
|
2025-10-15 19:12:19 +00:00 |
|
Andrej Karpathy
|
190d9515d0
|
don't evaluate the sampling evals during SFT, they are too slow. keep the multiple choice evals. delete unused imports
|
2025-10-15 16:42:23 +00:00 |
|
Andrej Karpathy
|
b8076dd367
|
fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation
|
2025-10-15 16:35:04 +00:00 |
|
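As a rough illustration of the ramp direction fixed in the commit above (b8076dd367), the sketch below shows a learning rate multiplier that decays toward the end of training. The function name, the linear shape, and the `warmdown_ratio` and `final_frac` parameters are assumptions for this example, not the schedule the repository uses; see Issue #68 for the actual discussion.

```python
def lr_multiplier(step, num_steps, warmdown_ratio=0.2, final_frac=0.0):
    """Return a multiplier that stays at 1.0, then ramps DOWN to final_frac.

    Hypothetical schedule for illustration only.
    """
    warmdown_steps = round(warmdown_ratio * num_steps)
    if step <= num_steps - warmdown_steps:
        return 1.0
    # progress goes from 1.0 (start of warmdown) to 0.0 (last step),
    # so the multiplier decreases rather than increases.
    progress = (num_steps - step) / warmdown_steps
    return progress * 1.0 + (1.0 - progress) * final_frac


schedule = [lr_multiplier(s, 100) for s in range(101)]
```

A quick guard against the bug described in the commit is to check that the schedule never increases from one step to the next.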
Andrej
|
67aaca98f5
|
export NANOCHAT_BASE_DIR so child processes get it too
Export the cache directory so that users can use their own cache location
|
2025-10-14 16:01:28 -07:00 |
|
Zach Mueller
|
f0855cbcc7
|
Update speedrun.sh
|
2025-10-14 14:12:01 -04:00 |
|
Andrej
|
dd6ff9a1cc
|
fix bug in fallback case of find_largest_model
Fix: Handle missing d<number> model tags in find_largest_model
ty
|
2025-10-13 14:38:34 -07:00 |
|
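The fallback described in the commit above (dd6ff9a1cc) might look roughly like the following. Only the function name `find_largest_model` and the `d<number>` tag pattern come from the log; the signature, the `(tag, mtime)` input, and the choice of "most recently modified" as the fallback are assumptions made for this sketch.

```python
import re


def find_largest_model(candidates):
    """Pick a model tag from (tag, mtime) pairs.

    Prefers the largest depth among tags shaped like d<number>; if no tag
    matches, falls back to the most recently modified entry (an assumed
    fallback, chosen here for illustration).
    """
    if not candidates:
        raise FileNotFoundError("no model tags found")
    depth_tagged = []
    for tag, _mtime in candidates:
        match = re.fullmatch(r"d(\d+)", tag)
        if match:
            depth_tagged.append((int(match.group(1)), tag))
    if depth_tagged:
        # Compare by numeric depth, so d20 beats d12 and d9.
        return max(depth_tagged)[1]
    # Fallback case: no d<number> tags at all.
    return max(candidates, key=lambda pair: pair[1])[0]


largest = find_largest_model([("d12", 1.0), ("d20", 0.5), ("custom", 9.0)])
fallback = find_largest_model([("custom", 1.0), ("other", 2.0)])
```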
Mirza-Samad-Ahmed-Baig
|
afaa5b4c90
|
Fix: Handle missing d<number> model tags in find_largest_model
|
2025-10-14 00:24:07 +03:00 |
|
Andrej
|
5fd0b13886
|
Merge pull request #2 from epoyraz/patch-1
Update README.md
|
2025-10-13 10:10:15 -07:00 |
|
Enes Poyraz
|
6a795baf27
|
Update README.md
fix typos
|
2025-10-13 18:40:12 +02:00 |
|
Andrej
|
626bd3e260
|
Add image of the WebUI to readme
|
2025-10-13 08:03:00 -07:00 |
|
karpathy
|
da96b46565
|
update link to the new discussion
|
2025-10-13 07:42:09 -07:00 |
|
karpathy
|
a53833d04f
|
add nanochat logo png
|
2025-10-13 06:59:59 -07:00 |
|
karpathy
|
3a5e0bc50b
|
initial commit
|
2025-10-13 06:49:24 -07:00 |
|