Commit Graph

31 Commits

Author SHA1 Message Date
Artemis Git Integration
4d9d10abb0 feat(benchmark): add performance benchmark script for KV-cache optimizations with CLI args, GPU memory tracking, and statistical measurement across iterations 2025-11-03 10:06:02 +00:00
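
A minimal sketch of what such a benchmark script might look like, assuming a PyTorch setup; the `--iterations`/`--num-tokens` flags and the placeholder workload stand in for the real script's options and model call, and are not taken from the repo.

```python
import argparse
import statistics
import time

import torch


def benchmark(fn, iterations, warmup=2):
    # Warm up first so compilation/allocator effects don't skew the timings.
    for _ in range(warmup):
        fn()
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats()
        torch.cuda.synchronize()
    times = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
        times.append(time.perf_counter() - t0)
    peak_mb = torch.cuda.max_memory_allocated() / 1e6 if torch.cuda.is_available() else 0.0
    std = statistics.stdev(times) if len(times) > 1 else 0.0
    return statistics.mean(times), std, peak_mb


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--iterations", type=int, default=10)
    parser.add_argument("--num-tokens", type=int, default=256)
    args = parser.parse_args()

    def workload():
        # Placeholder; the real script would time the model's generate() call here.
        return torch.randn(args.num_tokens, 768).sum()

    mean_s, std_s, peak_mb = benchmark(workload, args.iterations)
    print(f"mean {mean_s * 1e3:.2f} ms ± {std_s * 1e3:.2f} ms | peak GPU mem {peak_mb:.1f} MB")
```
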
Dianababaei
333919d764
Merge pull request #4 from Dianababaei/feat/kv-cached-generation-loop-o-t-optimization
refactor: Update GPT generate method and modify GPTConfig class parameters
2025-11-03 13:35:41 +03:30
Artemis Git Integration
b78bc3fd9f perf: optimize generation loop from O(T²) to O(T) using KV-cache
Refactor to process one token per iteration instead of reprocessing the entire sequence. Reorder loop to sample → yield → forward with single-token input, enabling the fast path in attention (…)
2025-11-03 10:04:43 +00:00
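
A rough sketch of the loop-shape change the commit above describes, assuming a model whose forward pass accepts an optional cache; the `kv_cache=` keyword and the greedy argmax sampling are illustrative, not nanochat's actual interface.

```python
import torch


@torch.no_grad()
def generate_naive(model, ids, max_new_tokens):
    # Reprocesses the whole sequence every step: O(T) tokens of work per step,
    # O(T^2) token forwards in total.
    for _ in range(max_new_tokens):
        logits = model(ids)                               # (B, T, vocab)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
        yield next_id


@torch.no_grad()
def generate_cached(model, ids, max_new_tokens, kv_cache):
    # Prefill once, then feed a single token per step: O(T) token forwards in
    # total, since keys/values for earlier positions come from the cache.
    logits = model(ids, kv_cache=kv_cache)                # prefill the prompt
    next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
    for _ in range(max_new_tokens):
        yield next_id                                     # sample -> yield -> forward
        logits = model(next_id, kv_cache=kv_cache)        # single-token input
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
```
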
Dianababaei
8927ec79c8
Merge pull request #3 from Dianababaei/feat/gpt-prefill-phase-kv-caching
Add 6 lines to GPT class to expand model capabilities and configuration
2025-11-03 13:33:16 +03:30
Artemis Git Integration
1131c37a62 feat(gpt): implement prefill phase for efficient prompt processing with KV-caching
Add prefill phase that processes entire prompt in single forward pass before generation loop, extracting logits only for last token position and populating KV-cache
2025-11-03 10:01:59 +00:00
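
A small hedged sketch of the prefill idea described above; the `kv_cache=` keyword is an assumed interface, not necessarily the repo's.

```python
import torch


@torch.no_grad()
def prefill(model, prompt_ids, kv_cache):
    # One forward pass over the full prompt populates the cache for every prompt
    # position as a side effect; only the logits at the final position are needed
    # to choose the first generated token.
    logits = model(prompt_ids, kv_cache=kv_cache)   # (B, T, vocab_size)
    return logits[:, -1, :]                         # (B, vocab_size): last position only
```
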
Dianababaei
d0383978df
Merge pull request #2 from Dianababaei/feat/gpt-initialize-kvcache-efficient-generation
Add Rotary Position Embeddings (RoPE) support to GPT model with configurable flag
2025-11-03 13:30:46 +03:30
Artemis Git Integration
dd1f606c52 feat(gpt): initialize KVCache for efficient generation with MQA support
Add KVCache pre-allocation in generate() method to enable efficient key-value caching during token generation, avoiding dynamic reallocation overhead
2025-11-03 10:00:19 +00:00
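
A sketch of what pre-allocating such a cache could look like; the buffer layout and the `insert`/`advance` split are illustrative rather than the repo's actual KVCache, and with multi-query attention (MQA) `n_kv_heads` can be 1 (or a small group count) rather than the full query-head count.

```python
import torch


class KVCache:
    # Buffers are allocated once for the maximum sequence length, so appending
    # new keys/values during generation never reallocates or copies the cache.
    def __init__(self, batch, n_layers, n_kv_heads, max_seq_len, head_dim,
                 device="cpu", dtype=torch.float32):
        shape = (n_layers, batch, n_kv_heads, max_seq_len, head_dim)
        self.k = torch.zeros(shape, device=device, dtype=dtype)
        self.v = torch.zeros(shape, device=device, dtype=dtype)
        self.pos = 0  # number of positions already written

    def insert(self, layer, k_new, v_new):
        # k_new, v_new: (batch, n_kv_heads, t_new, head_dim); with MQA,
        # n_kv_heads is typically much smaller than the number of query heads.
        t_new = k_new.size(2)
        end = self.pos + t_new
        self.k[layer, :, :, self.pos:end] = k_new
        self.v[layer, :, :, self.pos:end] = v_new
        return self.k[layer, :, :, :end], self.v[layer, :, :, :end]

    def advance(self, t_new):
        # Called once per generation step, after all layers have written.
        self.pos += t_new
```
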
Dianababaei
d44a3e090f
Merge pull request #1 from Dianababaei/feat/gpt-add-kvcache-import
Update GPT model configuration and initialization parameters in GPTConfig class
2025-11-03 13:26:17 +03:30
Artemis Git Integration
1703f181b9 feat(gpt): add KVCache import from engine module for efficient autoregressive generation 2025-11-03 09:55:48 +00:00
Andrej Karpathy
d6d86cbf4c update readme with a link to the CPU|MPS branch 2025-10-16 22:03:39 +00:00
Andrej Karpathy
ccfe7915ac mention the current d32 chat hosted on nanochat.karpathy.ai, as an example endpoint of the repo 2025-10-16 19:32:44 +00:00
Andrej Karpathy
4346536ab2 also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate 2025-10-16 01:28:37 +00:00
Andrej Karpathy
2846999b8f allow user to click on their message to edit them. conversation after that point is wiped 2025-10-16 01:16:22 +00:00
Andrej Karpathy
92d52ecc92 add slash commands to webui 2025-10-16 01:09:53 +00:00
Andrej Karpathy
fae3aca951 add script to train a $1000 version of nanochat. currently it's a bit more like $800 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now 2025-10-15 20:32:22 +00:00
Andrej Karpathy
4c3590c499 fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints 2025-10-15 20:29:54 +00:00
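
The fix above boils down to not emitting text until a complete UTF-8 codepoint has been assembled. One way to get that behaviour in Python is an incremental decoder; the real tokenizer code may do this differently, so treat this as an illustration only.

```python
import codecs


def stream_decode(byte_chunks):
    # Accumulate raw bytes from successive tokens and only yield text once full
    # UTF-8 codepoints are available (emoji and many non-Latin scripts span
    # multiple tokens/bytes).
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in byte_chunks:
        text = decoder.decode(chunk)
        if text:                      # empty until a complete codepoint arrives
            yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# An emoji split across two byte chunks decodes only once all four bytes arrive:
print("".join(stream_decode([b"\xf0\x9f", b"\x98\x80"])))  # -> 😀
```
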
Andrej Karpathy
03fa673b7d add basic logging to chat_web, which i think might be fun 2025-10-15 19:51:06 +00:00
Andrej Karpathy
52bfeea8bd add very basic abuse prevention limits to chat_web so it's ok to host endpoints 2025-10-15 19:42:54 +00:00
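
The commit doesn't spell out the limits, but a per-client sliding-window counter is one common shape for this kind of basic abuse prevention; the window size and request cap below are made up for illustration, not the repo's actual policy.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    # Allow at most `max_requests` per client IP within a sliding `window_s` window.
    def __init__(self, max_requests=30, window_s=60.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip):
        now = time.monotonic()
        q = self.hits[ip]
        while q and now - q[0] > self.window_s:
            q.popleft()                 # drop requests that fell out of the window
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True
```
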
Andrej Karpathy
01fb290f53 allow multiple GPUs to do inference in a data parallel way 2025-10-15 19:12:19 +00:00
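
A hedged sketch of the data-parallel idea: each process loads its own full copy of the model and serves an interleaved slice of the requests, with no gradient sync needed at inference time. `RANK`/`WORLD_SIZE` are the usual torchrun environment variables; this is not the repo's serving code.

```python
import os


def shard_for_rank(prompts):
    # Each rank handles prompts[rank], prompts[rank + world_size], ... so the
    # work splits evenly across GPUs without any communication.
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    return prompts[rank::world_size]
```
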
Andrej Karpathy
190d9515d0 don't evaluate the sampling evals during SFT, they are too slow. keep the multiple choice evals. delete unused imports 2025-10-15 16:42:23 +00:00
Andrej Karpathy
b8076dd367 fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation 2025-10-15 16:35:04 +00:00
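
For context on "ramping up instead of ramping down": a decay multiplier like the hypothetical one below should move from 1.0 toward 0 as training ends, and flipping the progress term reverses the direction, which is the kind of bug the commit describes. The fraction and shape are illustrative, not nanochat's actual schedule.

```python
def lr_multiplier(step, total_steps, final_frac=0.2):
    # Hold the base learning rate, then ramp linearly *down* to zero over the
    # last `final_frac` of training. Returning `progress` instead of
    # `1.0 - progress` would ramp *up* instead.
    decay_start = int(total_steps * (1.0 - final_frac))
    if step < decay_start:
        return 1.0
    progress = (step - decay_start) / max(1, total_steps - decay_start)
    return 1.0 - progress
```
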
Andrej
67aaca98f5
export NANOCHAT_BASE_DIR so child processes get it too
Export the cache directory so that users can use their own cache location
2025-10-14 16:01:28 -07:00
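
A tiny illustration of why the export matters: values placed in the process environment are inherited by child processes, whereas an un-exported shell variable (or a plain Python variable) is not. The path below is just an example.

```python
import os
import subprocess
import sys

# Put the variable into this process's environment so that any child process
# spawned afterwards (training scripts, torchrun workers, ...) inherits it.
os.environ["NANOCHAT_BASE_DIR"] = "/tmp/nanochat-cache"  # example path only

out = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('NANOCHAT_BASE_DIR'))"],
    capture_output=True, text=True,
)
print(out.stdout.strip())  # -> /tmp/nanochat-cache
```
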
Zach Mueller
f0855cbcc7
Update speedrun.sh 2025-10-14 14:12:01 -04:00
Andrej
dd6ff9a1cc
fix bug in fallback case of find_largest_model
Fix: Handle missing d<number> model tags in find_largest_model
ty
2025-10-13 14:38:34 -07:00
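
A guess at the shape of the fix above, purely for illustration: prefer the deepest d&lt;number&gt; tag when one exists, and fall back gracefully when the tags don't follow that pattern. The function body is hypothetical, not the repo's actual implementation.

```python
import re


def find_largest_model(model_tags):
    # Prefer the deepest d<number> tag (e.g. "d20", "d32"); if no tag matches
    # that pattern, fall back to the last listed tag instead of crashing.
    depths = [(int(m.group(1)), tag)
              for tag in model_tags
              if (m := re.fullmatch(r"d(\d+)", tag))]
    if depths:
        return max(depths)[1]
    return model_tags[-1] if model_tags else None
```
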
Mirza-Samad-Ahmed-Baig
afaa5b4c90 Fix: Handle missing d<number> model tags in find_largest_model 2025-10-14 00:24:07 +03:00
Andrej
5fd0b13886
Merge pull request #2 from epoyraz/patch-1
Update README.md
2025-10-13 10:10:15 -07:00
Enes Poyraz
6a795baf27
Update README.md
fix typos
2025-10-13 18:40:12 +02:00
Andrej
626bd3e260
Add image of the WebUI to readme 2025-10-13 08:03:00 -07:00
karpathy
da96b46565 update link to the new discussion 2025-10-13 07:42:09 -07:00
karpathy
a53833d04f add nanochat logo png 2025-10-13 06:59:59 -07:00
karpathy
3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00