Abylay Ospan
50b236fbcc
Add PyTorch and CUDA memory profiling systems
...
Capture PyTorch execution traces and CUDA memory snapshots. Traces
display detailed CPU and CUDA activity, including individual CUDA kernel
calls.
CUDA memory snapshots visualize all memory allocations, helping diagnose
CUDA out-of-memory errors, investigate memory leaks, or understand GPU
memory usage for educational purposes.
Enable profiling with the --enable_profiling=True flag in speedrun.sh.
See PROFILING.md for documentation and example visualizations.
2025-10-18 12:50:19 +00:00
Andrej Karpathy
d6d86cbf4c
update readme with a link to the CPU|MPS branch
2025-10-16 22:03:39 +00:00
Andrej Karpathy
ccfe7915ac
mention the current d32 chat hosted on nanochat.karpathy.ai, as an example endpoint of the repo
2025-10-16 19:32:44 +00:00
Andrej Karpathy
4346536ab2
also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate
2025-10-16 01:28:37 +00:00
Andrej Karpathy
2846999b8f
allow user to click on their message to edit it. conversation after that point is wiped
2025-10-16 01:16:22 +00:00
Andrej Karpathy
92d52ecc92
add slash commands to webui
2025-10-16 01:09:53 +00:00
Andrej Karpathy
fae3aca951
add script to train a $1000 version of nanochat. currently it's a bit more like $800 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now
2025-10-15 20:32:22 +00:00
Andrej Karpathy
4c3590c499
fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints
2025-10-15 20:29:54 +00:00
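To illustrate the decoding issue described in this commit, here is a minimal sketch of emitting text only once full codepoints are available. It is not the code from the commit: the function name `stream_text` and the use of the standard library's incremental UTF-8 decoder are assumptions made for illustration, and each "token" is modeled simply as a chunk of raw bytes.

```python
import codecs


def stream_text(byte_chunks):
    """Yield text pieces from byte chunks, holding back incomplete codepoints.

    Hypothetical helper: each chunk stands in for the bytes of one token.
    """
    # The incremental decoder buffers a trailing partial UTF-8 sequence
    # until the remaining bytes arrive in a later chunk.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in byte_chunks:
        text = decoder.decode(chunk)
        if text:  # nothing is emitted while a codepoint is still incomplete
            yield text
    # Flush whatever is left; a dangling partial sequence becomes U+FFFD.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# The emoji below is 4 bytes in UTF-8 and is split across two chunks.
chunks = [b"hi ", b"\xf0\x9f", b"\x98\x80", b"!"]
pieces = list(stream_text(chunks))
```

Decoding each chunk on its own would produce replacement characters for the split emoji; accumulating until the codepoint is complete yields it intact as a single piece.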
Andrej Karpathy
03fa673b7d
add basic logging to chat_web, which i think might be fun
2025-10-15 19:51:06 +00:00
Andrej Karpathy
52bfeea8bd
add very basic abuse prevention limits to chat_web so it's ok to host endpoints
2025-10-15 19:42:54 +00:00
Andrej Karpathy
01fb290f53
allow multiple GPUs to do inference in a data parallel way
2025-10-15 19:12:19 +00:00
Andrej Karpathy
190d9515d0
don't evaluate the sampling evals during SFT, they are too slow. keep the multiple choice evals. delete unused imports
2025-10-15 16:42:23 +00:00
Andrej Karpathy
b8076dd367
fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option, useful for experimentation
2025-10-15 16:35:04 +00:00
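As a rough illustration of the kind of bug this commit describes, the sketch below shows a learning rate multiplier that ramps down over training. The function name `get_lr_multiplier`, its parameters, and the linear shape of the schedule are assumptions, not taken from the repository; the commit only states the direction of the ramp.

```python
def get_lr_multiplier(step: int, num_steps: int) -> float:
    """Return a multiplier that decays linearly from 1.0 to 0.0.

    Hypothetical schedule: the real one in the repository may differ.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step, 0), num_steps) / num_steps
    # Returning `progress` here instead would ramp *up*, which is the
    # class of mistake the commit message refers to.
    return 1.0 - progress


multipliers = [get_lr_multiplier(s, 10) for s in range(11)]
```

A quick check that the values never increase from one step to the next is a cheap guard against reintroducing the inverted ramp.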
Andrej
67aaca98f5
export NANOCHAT_BASE_DIR so child processes get it too
...
Export the cache directory so that users can use their own cache location
2025-10-14 16:01:28 -07:00
Zach Mueller
f0855cbcc7
Update speedrun.sh
2025-10-14 14:12:01 -04:00
Andrej
dd6ff9a1cc
fix bug in fallback case of find_largest_model
...
Fix: Handle missing d<number> model tags in find_largest_model
ty
2025-10-13 14:38:34 -07:00
Mirza-Samad-Ahmed-Baig
afaa5b4c90
Fix: Handle missing d<number> model tags in find_largest_model
2025-10-14 00:24:07 +03:00
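The fallback handling mentioned in this fix can be sketched as follows. This is only an assumed shape of the logic: the helper name `pick_model_tag`, the list-of-strings input, and the choice of the alphabetically last tag as the fallback are all hypothetical, and the actual find_largest_model may choose differently.

```python
import re


def pick_model_tag(tags):
    """Pick the tag with the largest depth, e.g. "d32" over "d12".

    Hypothetical helper: falls back to the alphabetically last tag when
    no tag follows the d<number> pattern.
    """
    if not tags:
        raise FileNotFoundError("no model tags found")
    depth_tags = []
    for tag in tags:
        match = re.fullmatch(r"d(\d+)", tag)
        if match:
            # Compare depths numerically so that d12 ranks above d4.
            depth_tags.append((int(match.group(1)), tag))
    if depth_tags:
        return max(depth_tags)[1]
    # Fallback case: no d<number> tags exist at all.
    return sorted(tags)[-1]


largest = pick_model_tag(["d12", "d32", "d20"])
fallback = pick_model_tag(["baseline", "custom"])
```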
Andrej
5fd0b13886
Merge pull request #2 from epoyraz/patch-1
...
Update README.md
2025-10-13 10:10:15 -07:00
Enes Poyraz
6a795baf27
Update README.md
...
fix typos
2025-10-13 18:40:12 +02:00
Andrej
626bd3e260
Add image of the WebUI to readme
2025-10-13 08:03:00 -07:00
karpathy
da96b46565
update link to the new discussion
2025-10-13 07:42:09 -07:00
karpathy
a53833d04f
add nanochat logo png
2025-10-13 06:59:59 -07:00
karpathy
3a5e0bc50b
initial commit
2025-10-13 06:49:24 -07:00