Commit Graph

38 Commits

Author SHA1 Message Date
Dianababaei
0af8c8af68
Merge pull request #9 from Dianababaei/feat/enable-torch-compile-chat-sft-fixed-shapes
Update chat SFT training script configuration and parameters
2025-11-05 19:39:05 +03:30
Artemis Git Integration
5cd79225c4 feat(train): enable torch.compile for chat_sft with fixed shapes for 30-50% speedup 2025-11-05 16:07:54 +00:00
Dianababaei
072d49ab3c
Merge pull request #8 from Dianababaei/feat/chat-sft-fixed-length-padding-torch-compile
Add configurable max_seq_len parameter to sft_data_generator function
2025-11-05 19:36:36 +03:30
Artemis Git Integration
d8be015b20 feat(chat_sft): add fixed-length padding for torch.compile compatibility
Replace variable-length padding with fixed 2048-token padding to create constant batch shapes, enabling efficient torch.compile in subsequent training steps
2025-11-05 16:04:26 +00:00
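
The fixed-length padding described in the commit above can be illustrated with a minimal sketch. It is not the repository's code: the function name `pad_to_fixed_length`, the `pad_id` value, and the choice to truncate over-long sequences are assumptions made for illustration. Only the 2048-token length and the configurable `max_seq_len` come from the commit messages.

```python
def pad_to_fixed_length(sequences, max_seq_len=2048, pad_id=0):
    """Pad (or truncate) every token list to exactly max_seq_len tokens.

    pad_id=0 and the truncation behaviour are assumptions, not taken
    from the repository.
    """
    batch = []
    for seq in sequences:
        seq = list(seq[:max_seq_len])              # truncate if too long
        padding = [pad_id] * (max_seq_len - len(seq))
        batch.append(seq + padding)                # constant row length
    return batch


# Every row has the same length, so the batch shape never changes
# from step to step.
batch = pad_to_fixed_length([[5, 6, 7], [8]], max_seq_len=5)
```

With a constant batch shape, a compiler that specializes on tensor shapes does not need to recompile for each new batch. In real training the padded positions would typically also be masked out of the loss.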
Dianababaei
878d8bbdfa
Merge pull request #6 from Dianababaei/docs/update-generate-docstring-kv-cache-optimization
Update GPT class generate method signature in nanochat/gpt.py
2025-11-03 16:07:15 +03:30
Artemis Git Integration
15a782453f docs: update generate() docstring to reflect KV cache optimization 2025-11-03 12:30:21 +00:00
Dianababaei
3a3cd20690
Merge pull request #5 from Dianababaei/feat/kv-cache-benchmark-script
Add benchmark script for measuring optimization performance
2025-11-03 13:37:28 +03:30
Artemis Git Integration
4d9d10abb0 feat(benchmark): add performance benchmark script for KV-cache optimizations with CLI args, GPU memory tracking, and statistical measurement across iterations 2025-11-03 10:06:02 +00:00
Dianababaei
333919d764
Merge pull request #4 from Dianababaei/feat/kv-cached-generation-loop-o-t-optimization
refactor: Update GPT generate method and modify GPTConfig class parameters
2025-11-03 13:35:41 +03:30
Artemis Git Integration
b78bc3fd9f perf: optimize generation loop from O(T²) to O(T) using KV-cache
Refactor to process one token per iteration instead of reprocessing entire sequence. Reorder loop to sample → yield → forward with single token input, enabling fast path in attention (
2025-11-03 10:04:43 +00:00
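
The difference between reprocessing the whole sequence and feeding one token per step can be sketched with a toy stand-in for the model. Everything here is hypothetical (`ToyModel`, `generate_naive`, `generate_cached`, and the fake next-token rule); it only counts how many token positions are pushed through `forward`, and is not the code in nanochat/gpt.py.

```python
class ToyModel:
    """Stand-in for a transformer that counts processed token positions."""

    def __init__(self):
        self.positions_processed = 0
        self.cache = []  # plays the role of the KV-cache

    def forward(self, tokens, use_cache):
        self.positions_processed += len(tokens)
        if use_cache:
            self.cache.extend(tokens)   # remember earlier positions
            context = self.cache
        else:
            context = tokens            # caller must resend everything
        # Deterministic fake "next token" computed from the full context.
        return (sum(context) + len(context)) % 50


def generate_naive(model, prompt, max_new_tokens):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = model.forward(tokens, use_cache=False)  # whole sequence
        tokens.append(next_token)
        yield next_token


def generate_cached(model, prompt, max_new_tokens):
    # Prefill: one pass over the prompt populates the cache.
    next_token = model.forward(list(prompt), use_cache=True)
    for _ in range(max_new_tokens):
        yield next_token
        # Forward with a single-token input; the cache supplies the rest.
        next_token = model.forward([next_token], use_cache=True)


naive_model, cached_model = ToyModel(), ToyModel()
naive_out = list(generate_naive(naive_model, [1, 2, 3], 4))
cached_out = list(generate_cached(cached_model, [1, 2, 3], 4))
```

Both loops produce the same tokens, but for a 3-token prompt and 4 new tokens the naive loop processes 18 positions while the cached loop processes 7. The naive count grows quadratically with sequence length and the cached count grows linearly.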
Dianababaei
8927ec79c8
Merge pull request #3 from Dianababaei/feat/gpt-prefill-phase-kv-caching
Add 6 lines to GPT class to expand model capabilities and configuration
2025-11-03 13:33:16 +03:30
Artemis Git Integration
1131c37a62 feat(gpt): implement prefill phase for efficient prompt processing with KV-caching
Add prefill phase that processes entire prompt in single forward pass before generation loop, extracting logits only for last token position and populating KV-cache
2025-11-03 10:01:59 +00:00
Dianababaei
d0383978df
Merge pull request #2 from Dianababaei/feat/gpt-initialize-kvcache-efficient-generation
Add Rotary Position Embeddings (RoPE) support to GPT model with configurable flag
2025-11-03 13:30:46 +03:30
Artemis Git Integration
dd1f606c52 feat(gpt): initialize KVCache for efficient generation with MQA support
Add KVCache pre-allocation in generate() method to enable efficient key-value caching during token generation, avoiding dynamic reallocation overhead
2025-11-03 10:00:19 +00:00
Dianababaei
d44a3e090f
Merge pull request #1 from Dianababaei/feat/gpt-add-kvcache-import
Update GPT model configuration and initialization parameters in GPTConfig class
2025-11-03 13:26:17 +03:30
Artemis Git Integration
1703f181b9 feat(gpt): add KVCache import from engine module for efficient autoregressive generation 2025-11-03 09:55:48 +00:00
Andrej Karpathy
d6d86cbf4c update readme with a link to the CPU|MPS branch 2025-10-16 22:03:39 +00:00
Andrej Karpathy
ccfe7915ac mention the current d32 chat hosted on nanochat.karpathy.ai, as an example endpoint of the repo 2025-10-16 19:32:44 +00:00
Andrej Karpathy
4346536ab2 also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate 2025-10-16 01:28:37 +00:00
Andrej Karpathy
2846999b8f allow user to click on their message to edit them. conversation after that point is wiped 2025-10-16 01:16:22 +00:00
Andrej Karpathy
92d52ecc92 add slash commands to webui 2025-10-16 01:09:53 +00:00
Andrej Karpathy
fae3aca951 add script to train a 000 version of nanochat. currently it's a bit more like 00 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now 2025-10-15 20:32:22 +00:00
Andrej Karpathy
4c3590c499 fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints 2025-10-15 20:29:54 +00:00
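
The idea of emitting text only once a full codepoint has arrived can be sketched with Python's standard incremental UTF-8 decoder. This is a swapped-in technique for illustration, not the approach the commit takes (which accumulates token sequences); the function name `stream_text` and the byte-chunk input are assumptions.

```python
import codecs


def stream_text(byte_chunks):
    """Yield decoded text, holding back bytes of incomplete codepoints."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in byte_chunks:
        text = decoder.decode(chunk)  # returns "" while a codepoint is partial
        if text:
            yield text


# An emoji is four bytes in UTF-8; here it arrives split across two chunks.
emoji = "\N{GRINNING FACE}".encode("utf-8")
chunks = [b"hi ", emoji[:2], emoji[2:], b"!"]
pieces = list(stream_text(chunks))
```

Decoding each chunk on its own would fail or produce replacement characters for the two middle chunks, whereas the incremental decoder buffers the first half and emits the emoji only when the second half arrives.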
Andrej Karpathy
03fa673b7d add basic logging to chat_web, which i think might be fun 2025-10-15 19:51:06 +00:00
Andrej Karpathy
52bfeea8bd add very basic abuse prevention limits to chat_web so it's ok to host endpoints 2025-10-15 19:42:54 +00:00
Andrej Karpathy
01fb290f53 allow multiple GPUs to do inference in a data parallel way 2025-10-15 19:12:19 +00:00
Andrej Karpathy
190d9515d0 don't evaluate the sampling evals during SFT, they are too slow. keep the multiple choice evals. delete unused imports 2025-10-15 16:42:23 +00:00
Andrej Karpathy
b8076dd367 fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation 2025-10-15 16:35:04 +00:00
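
A learning rate multiplier that ramps down, as opposed to the ramp-up bug the commit describes, might look like the sketch below. The linear shape, the function name `lr_multiplier`, and its parameters are assumptions for illustration; the actual schedule is in the repository and Issue #68.

```python
def lr_multiplier(step, num_steps):
    """Linear decay from 1.0 at step 0 towards 0.0 at num_steps.

    A ramp-up bug would correspond to returning step / num_steps instead.
    """
    return 1.0 - step / num_steps


# The multiplier shrinks as training progresses.
schedule = [lr_multiplier(s, 100) for s in (0, 50, 100)]
```

A quick check that the first value is the largest is a cheap way to catch an inverted schedule before starting a long run.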
Andrej
67aaca98f5
export NANOCHAT_BASE_DIR so child processes get it too
Export the cache directory so that users can use their own cache location
2025-10-14 16:01:28 -07:00
Zach Mueller
f0855cbcc7
Update speedrun.sh 2025-10-14 14:12:01 -04:00
Andrej
dd6ff9a1cc
fix bug in fallback case of find_largest_model
Fix: Handle missing d<number> model tags in find_largest_model
ty
2025-10-13 14:38:34 -07:00
Mirza-Samad-Ahmed-Baig
afaa5b4c90 Fix: Handle missing d<number> model tags in find_largest_model 2025-10-14 00:24:07 +03:00
Andrej
5fd0b13886
Merge pull request #2 from epoyraz/patch-1
Update README.md
2025-10-13 10:10:15 -07:00
Enes Poyraz
6a795baf27
Update README.md
fix typos
2025-10-13 18:40:12 +02:00
Andrej
626bd3e260
Add image of the WebUI to readme 2025-10-13 08:03:00 -07:00
karpathy
da96b46565 update link to the new discussion 2025-10-13 07:42:09 -07:00
karpathy
a53833d04f add nanochat logo png 2025-10-13 06:59:59 -07:00
karpathy
3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00