nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-06-16 02:59:10 +00:00

Author	SHA1	Message	Date
dssjon	d366f7e07d	Refactor constants in training scripts and engine to improve configurability. Replace hardcoded values with constants for KV cache growth, rotary cache multiplier, and learning rate parameters. This enhances maintainability and allows for easier adjustments in future iterations.	2025-10-15 14:58:06 -07:00
Andrej Karpathy	fae3aca951	add script to train a $1000 version of nanochat. currently it's a bit more like $800 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now	2025-10-15 20:32:22 +00:00
Andrej Karpathy	4c3590c499	fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints	2025-10-15 20:29:54 +00:00
Andrej Karpathy	03fa673b7d	add basic logging to chat_web, which i think might be fun	2025-10-15 19:51:06 +00:00
Andrej Karpathy	52bfeea8bd	add very basic abuse prevention limits to chat_web so it's ok to host endpoints	2025-10-15 19:42:54 +00:00
Andrej Karpathy	01fb290f53	allow multiple GPUs to do inference in a data parallel way	2025-10-15 19:12:19 +00:00
Andrej Karpathy	190d9515d0	dont evaluate the sampling evals during SFT they are too slow. keep the multiple choice evals. delete unused imports	2025-10-15 16:42:23 +00:00
Andrej Karpathy	b8076dd367	fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation	2025-10-15 16:35:04 +00:00
Andrej	67aaca98f5	export NANOCHAT_BASE_DIR so child processes get it too Export the cache directory so that users can use their own cache location	2025-10-14 16:01:28 -07:00
Zach Mueller	f0855cbcc7	Update speedrun.sh	2025-10-14 14:12:01 -04:00
Andrej	dd6ff9a1cc	fix bug in fallback case of find_largest_model Fix: Handle missing d<number> model tags in find_largest_model ty	2025-10-13 14:38:34 -07:00
Mirza-Samad-Ahmed-Baig	afaa5b4c90	Fix: Handle missing d<number> model tags in find_largest_model	2025-10-14 00:24:07 +03:00
Andrej	5fd0b13886	Merge pull request #2 from epoyraz/patch-1 Update README.md	2025-10-13 10:10:15 -07:00
Enes Poyraz	6a795baf27	Update README.md fix typos	2025-10-13 18:40:12 +02:00
Andrej	626bd3e260	Add image of the WebUI to readme	2025-10-13 08:03:00 -07:00
karpathy	da96b46565	update link to the new discussion	2025-10-13 07:42:09 -07:00
karpathy	a53833d04f	add nanochat logo png	2025-10-13 06:59:59 -07:00
karpathy	3a5e0bc50b	initial commit	2025-10-13 06:49:24 -07:00

18 Commits
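
Commit 4c3590c499 describes holding back decoded output until a full codepoint is available. The sketch below illustrates that general idea only; it is not the repository's code. It swaps in Python's standard incremental UTF-8 decoder rather than accumulating token sequences by hand, and the name `stream_decode` and the byte-chunk input format are assumptions made for illustration.

```python
import codecs


def stream_decode(byte_chunks):
    """Yield text pieces, emitting only once complete UTF-8 code points are available.

    Hypothetical helper: `byte_chunks` stands in for the raw bytes of
    successive tokens, which may split a multi-byte character.
    """
    # The incremental decoder buffers a trailing partial sequence internally
    # instead of raising or emitting a replacement character mid-stream.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in byte_chunks:
        text = decoder.decode(chunk)
        if text:  # empty string means the code point is still incomplete
            yield text
    # Flush whatever is left once the stream ends.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# A 4-byte emoji split across two chunks, as can happen with byte-level tokens.
emoji = "\U0001F642".encode("utf-8")
chunks = [b"hi ", emoji[:2], emoji[2:], b"!"]
pieces = list(stream_decode(chunks))
```

Here nothing is emitted for the chunk that ends halfway through the emoji; the character appears only once its remaining bytes arrive.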