Andrej Karpathy
|
9467d83cf2
|
fix memory leak bug in rust tokenizer ty @mitsuhiko
|
2025-10-19 23:54:31 +00:00 |
|
Andrej Karpathy
|
d6d86cbf4c
|
update readme with a link to the CPU|MPS branch
|
2025-10-16 22:03:39 +00:00 |
|
Andrej Karpathy
|
ccfe7915ac
|
mention the current d32 chat hosted on nanochat.karpathy.ai, as an example endpoint of the repo
|
2025-10-16 19:32:44 +00:00 |
|
Andrej Karpathy
|
4346536ab2
|
also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate
|
2025-10-16 01:28:37 +00:00 |
|
Andrej Karpathy
|
2846999b8f
|
allow user to click on their message to edit them. conversation after that point is wiped
|
2025-10-16 01:16:22 +00:00 |
|
Andrej Karpathy
|
92d52ecc92
|
add slash commands to webui
|
2025-10-16 01:09:53 +00:00 |
|
Andrej Karpathy
|
fae3aca951
|
add script to train a $1000 version of nanochat. currently it's a bit more like $800 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now
|
2025-10-15 20:32:22 +00:00 |
|
Andrej Karpathy
|
4c3590c499
|
fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints
|
2025-10-15 20:29:54 +00:00 |
|
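To illustrate the decoding issue described in commit 4c3590c499: a single codepoint (an emoji, for instance) can span several UTF-8 bytes, and those bytes may arrive split across separate tokens. The sketch below is not the code from that commit. It is a minimal illustration that swaps in Python's standard incremental UTF-8 decoder to hold back incomplete byte sequences until a full codepoint is available. The function name `stream_decode` and the assumption that each token arrives as a `bytes` chunk are hypothetical.

```python
import codecs


def stream_decode(token_byte_chunks):
    """Yield text pieces, emitting only once complete UTF-8 codepoints are available.

    `token_byte_chunks` is assumed (for illustration) to be an iterable of
    `bytes` objects, one per generated token.
    """
    # The incremental decoder buffers trailing bytes that do not yet form
    # a complete codepoint and returns them once the rest arrives.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in token_byte_chunks:
        text = decoder.decode(chunk)
        if text:  # an empty string means we are still mid-codepoint
            yield text
    # Flush at end of stream; a dangling partial sequence becomes U+FFFD.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


if __name__ == "__main__":
    # The emoji U+1F600 is four bytes in UTF-8, split here across two chunks.
    chunks = [b"hi ", b"\xf0\x9f", b"\x98\x80", b"!"]
    for piece in stream_decode(chunks):
        print(repr(piece))
```

Decoding each chunk on its own would instead produce replacement characters for the two halves of the emoji, which is the kind of symptom the commit message describes.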
Andrej Karpathy
|
03fa673b7d
|
add basic logging to chat_web, which i think might be fun
|
2025-10-15 19:51:06 +00:00 |
|
Andrej Karpathy
|
52bfeea8bd
|
add very basic abuse prevention limits to chat_web so it's ok to host endpoints
|
2025-10-15 19:42:54 +00:00 |
|
Andrej Karpathy
|
01fb290f53
|
allow multiple GPUs to do inference in a data parallel way
|
2025-10-15 19:12:19 +00:00 |
|
Andrej Karpathy
|
190d9515d0
|
dont evaluate the sampling evals during SFT they are too slow. keep the multiple choice evals. delete unused imports
|
2025-10-15 16:42:23 +00:00 |
|
Andrej Karpathy
|
b8076dd367
|
fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation
|
2025-10-15 16:35:04 +00:00 |
|
Andrej
|
67aaca98f5
|
export NANOCHAT_BASE_DIR so child processes get it too
Export the cache directory so that users can use their own cache location
|
2025-10-14 16:01:28 -07:00 |
|
Zach Mueller
|
f0855cbcc7
|
Update speedrun.sh
|
2025-10-14 14:12:01 -04:00 |
|
Andrej
|
dd6ff9a1cc
|
fix bug in fallback case of find_largest_model
Fix: Handle missing d<number> model tags in find_largest_model
ty
|
2025-10-13 14:38:34 -07:00 |
|
Mirza-Samad-Ahmed-Baig
|
afaa5b4c90
|
Fix: Handle missing d<number> model tags in find_largest_model
|
2025-10-14 00:24:07 +03:00 |
|
Andrej
|
5fd0b13886
|
Merge pull request #2 from epoyraz/patch-1
Update README.md
|
2025-10-13 10:10:15 -07:00 |
|
Enes Poyraz
|
6a795baf27
|
Update README.md
fix typos
|
2025-10-13 18:40:12 +02:00 |
|
Andrej
|
626bd3e260
|
Add image of the WebUI to readme
|
2025-10-13 08:03:00 -07:00 |
|
karpathy
|
da96b46565
|
update link to the new discussion
|
2025-10-13 07:42:09 -07:00 |
|
karpathy
|
a53833d04f
|
add nanochat logo png
|
2025-10-13 06:59:59 -07:00 |
|
karpathy
|
3a5e0bc50b
|
initial commit
|
2025-10-13 06:49:24 -07:00 |
|