diana
a6efa53b92
optimisations fixed
2025-11-05 22:07:29 +03:30
Dianababaei
890d1af779
Merge pull request #19 from Dianababaei/test/auto-discovery-comprehensive-test-suite
...
Add automatic batch size discovery with comprehensive testing infrastructure for GPU memory optimization
2025-11-05 20:25:22 +03:30
Artemis Git Integration
ffdbb9c247
test: add comprehensive test suite for auto-batch-size discovery with unit and integration tests, pytest framework, stability validation, and updated documentation
2025-11-05 16:52:29 +00:00
Dianababaei
04e66eacfa
Merge pull request #18 from Dianababaei/feat/auto-batch-size-discovery-integration
...
Add batch sampling functionality for auto-discovery of optimal batch sizes across training scripts
2025-11-05 20:20:47 +03:30
Artemis Git Integration
09f5420fab
feat: add auto-batch-size discovery to base_train, mid_train, and chat_sft with fallback defaults and manual override support
2025-11-05 16:50:27 +00:00
Dianababaei
fa14cba28e
Merge pull request #17 from Dianababaei/feat/train-batch-sample-functions-memory-testing
...
Add batch sampling function factory for auto-discovery across training scripts
2025-11-05 20:19:35 +03:30
Artemis Git Integration
a8aad26041
feat(train): add batch sample functions for memory testing in auto-discovery
...
Add create_batch_sample_fn closures to base_train.py, mid_train.py, and chat_sft.py that generate realistic test batches matching training data formats for accurate memory testing
2025-11-05 16:48:55 +00:00
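The closure factory named in this commit could look roughly like the sketch below. This is a hypothetical pure-Python illustration of the pattern only: the real `create_batch_sample_fn` in base_train.py / mid_train.py / chat_sft.py returns tensors matching each script's actual data format, not lists of ints.

```python
import random

def create_batch_sample_fn(seq_len, vocab_size, seed=0):
    """Return a closure that builds a dummy batch of a given shape.

    Sketch only: token ids are plain Python lists here; the real closures
    produce batches in the same format the training step consumes, so the
    memory probe measures realistic allocations.
    """
    rng = random.Random(seed)

    def sample_batch(batch_size):
        # One row of `seq_len` random token ids per example in the batch.
        return [[rng.randrange(vocab_size) for _ in range(seq_len)]
                for _ in range(batch_size)]

    return sample_batch

# The auto-discovery search can then call sample_fn(bs) for any candidate bs.
sample_fn = create_batch_sample_fn(seq_len=8, vocab_size=100)
batch = sample_fn(4)
```

The closure captures `seq_len` and `vocab_size` once per script, so the discovery module only needs to vary the batch size.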
Dianababaei
38801c983d
Merge pull request #16 from Dianababaei/feat/auto-batch-size-discovery-config
...
Refactor training scripts: update base model training, chat SFT implementation, and intermediate stage parameters
2025-11-05 20:18:26 +03:30
Artemis Git Integration
cba76ef8ef
feat(config): add auto batch size discovery with configurable parameters and CLI overrides
...
Replace hardcoded device_batch_size with auto_batch_size, batch_size_margin, batch_size_cache, and device_batch_size variables across training scripts
2025-11-05 16:47:32 +00:00
Dianababaei
747f3a82ef
Merge pull request #7 from Dianababaei/feat/auto-batch-size-discovery
...
Add auto batch size optimization module with memory-aware batch size discovery
2025-11-05 20:04:42 +03:30
Dianababaei
9d525655e2
Merge pull request #15 from Dianababaei/test/comprehensive-sampling-edge-cases-73bf1317
...
Update nanochat engine module implementation
2025-11-05 20:03:39 +03:30
Artemis Git Integration
8c8f08955a
test: add comprehensive edge case test suite for sampling with deterministic and stochastic validation
2025-11-05 16:32:21 +00:00
Dianababaei
737165ce44
Merge pull request #14 from Dianababaei/refactor/engine-remove-token-broadcasting-first-iteration
...
refactor(engine): Remove 2 unnecessary lines from Engine class implementation
2025-11-05 20:01:54 +03:30
Artemis Git Integration
bacfe0f453
refactor(engine): remove token broadcasting in first iteration
...
Remove deprecated token broadcasting logic as prefill now generates num_samples independently sampled tokens after task #47.
BREAKING CHANGE: Requires task #47 completion
2025-11-05 16:31:19 +00:00
Dianababaei
ad2f5c8c2f
Merge pull request #13 from Dianababaei/feat/engine-independent-token-sampling-prefill-multi-sample
...
Update Engine class implementation in nanochat/engine.py (lines 188-190)
2025-11-05 19:58:51 +03:30
Artemis Git Integration
eadcbc2d8f
feat(engine): enable independent token sampling in prefill for multi-sample generation
...
Repeat logits across batch dimension before sampling to generate distinct tokens per sample using torch.multinomial independently per row
2025-11-05 16:28:22 +00:00
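The per-row sampling described in this commit can be illustrated with a small sketch. The function name `sample_per_row` is hypothetical, and pure-Python `random.choices` stands in for `torch.multinomial`; the point shown is the same: one independent draw per sample row instead of one shared draw broadcast to all rows.

```python
import math
import random

def sample_per_row(logits_row, num_samples, rng):
    """Hypothetical sketch of independent per-sample token sampling.

    The engine repeats the last-position logits across the batch dimension
    and samples each row independently; here we softmax one logits row and
    draw `num_samples` tokens, one per sample, so different samples can
    start with different tokens.
    """
    # Numerically stable softmax over the logits row.
    m = max(logits_row)
    exps = [math.exp(x - m) for x in logits_row]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Independent draw per sample (per row), not one broadcast draw.
    return [rng.choices(range(len(probs)), weights=probs, k=1)[0]
            for _ in range(num_samples)]
```

With near-deterministic logits every row still agrees, but with a flat distribution the rows diverge, which is exactly what the diversity test in PR #12 checks for.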
Dianababaei
73bf1317ff
Merge pull request #12 from Dianababaei/test/engine-multi-sample-token-diversity-validation
...
Update chat engine implementation in nanochat/engine.py
2025-11-05 19:57:54 +03:30
Artemis Git Integration
c63107f51c
test(engine): add multi-sample token diversity validation test
...
Add test to verify multi-sample generation produces diverse tokens, validating broadcasting bug fix. Uses 10 samples with stochastic sampling to check 5-8 unique tokens are generated.
2025-11-05 16:27:02 +00:00
Dianababaei
717a2d443f
Merge pull request #11 from Dianababaei/test/torch-compile-validation-logging
...
Add validation reporting and update learning rate scheduler multiplier logic
2025-11-05 19:50:37 +03:30
Artemis Git Integration
47935c69d5
test: add torch.compile performance validation logging with multi-GPU compatibility checks
2025-11-05 16:19:59 +00:00
Dianababaei
49d29417f1
Merge pull request #10 from Dianababaei/refactor/chat-sft-use-orig-model-for-eval-and-checkpointing
...
Update chat supervised fine-tuning script with improved training configuration and enhanced pipeline functionality
2025-11-05 19:42:13 +03:30
Artemis Git Integration
a381fc406d
refactor(chat_sft): use uncompiled model for eval and checkpointing to prevent recompilation
...
Use orig_model instead of compiled model for engine init, MMLU/ARC-Easy eval, and checkpoint saving to avoid recompilation on variable-length inputs
2025-11-05 16:09:43 +00:00
Dianababaei
0af8c8af68
Merge pull request #9 from Dianababaei/feat/enable-torch-compile-chat-sft-fixed-shapes
...
Update chat SFT training script configuration and parameters
2025-11-05 19:39:05 +03:30
Artemis Git Integration
5cd79225c4
feat(train): enable torch.compile for chat_sft with fixed shapes for 30-50% speedup
2025-11-05 16:07:54 +00:00
Dianababaei
072d49ab3c
Merge pull request #8 from Dianababaei/feat/chat-sft-fixed-length-padding-torch-compile
...
Add configurable max_seq_len parameter to sft_data_generator function
2025-11-05 19:36:36 +03:30
Artemis Git Integration
d8be015b20
feat(chat_sft): add fixed-length padding for torch.compile compatibility
...
Replace variable-length padding with fixed 2048-token padding to create constant batch shapes, enabling efficient torch.compile in subsequent training steps
2025-11-05 16:04:26 +00:00
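The fixed-length padding this commit describes amounts to forcing every batch row to one constant length. A minimal sketch, assuming a pad id of 0 (the actual pad id would come from the tokenizer) and the 2048-token length from the commit message; the function name is hypothetical:

```python
PAD_TOKEN_ID = 0      # assumption: the real pad id comes from the tokenizer
MAX_SEQ_LEN = 2048    # fixed length named in the commit above

def pad_to_fixed_length(token_ids, max_seq_len=MAX_SEQ_LEN, pad_id=PAD_TOKEN_ID):
    """Pad (or truncate) a sequence so every batch has a constant shape.

    Constant shapes let torch.compile trace the training step once instead
    of recompiling for each new sequence length.
    """
    token_ids = token_ids[:max_seq_len]                # clip overlong rows
    padding = [pad_id] * (max_seq_len - len(token_ids))
    return token_ids + padding
```

The cost is wasted compute on pad positions; the follow-up commit (#9) bets that avoiding recompilation more than pays for it.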
Artemis Git Integration
507b230565
feat(training): implement automatic batch size discovery module
...
Add core auto batch size discovery to eliminate manual tuning and maximize GPU utilization across 1-8 GPUs with exponential/binary search, DDP coordination, JSON caching, and OOM recovery
2025-11-05 15:59:49 +00:00
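The exponential/binary search with OOM recovery mentioned in this commit can be sketched as follows. Everything here is illustrative: `discover_batch_size` and `try_batch` are hypothetical names, `MemoryError` stands in for CUDA OOM, and the DDP coordination and JSON caching the commit mentions are omitted.

```python
def discover_batch_size(try_batch, start=1, margin=0.0):
    """Sketch of auto batch size discovery via exponential + binary search.

    `try_batch(bs)` should run one probe step at batch size `bs` and raise
    MemoryError (standing in for CUDA OOM) if it does not fit.
    """
    # Phase 1: exponential growth until the first failure.
    bs = start
    last_ok = 0
    while True:
        try:
            try_batch(bs)
            last_ok = bs
            bs *= 2
        except MemoryError:
            break
    if last_ok == 0:
        raise RuntimeError("even the smallest batch size does not fit")
    # Phase 2: binary search between the last success and the first failure.
    lo, hi = last_ok, bs
    while hi - lo > 1:
        mid = (lo + hi) // 2
        try:
            try_batch(mid)
            lo = mid
        except MemoryError:
            hi = mid
    # Optional safety margin to leave memory headroom at train time.
    return max(start, int(lo * (1.0 - margin)))

# Demo with a fake probe that "OOMs" above batch size 23.
def fake_try_batch(bs):
    if bs > 23:
        raise MemoryError("simulated OOM")

best = discover_batch_size(fake_try_batch)
```

The `margin` knob corresponds to the `batch_size_margin` config variable added in PR #16, backing off from the exact OOM boundary.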
Dianababaei
878d8bbdfa
Merge pull request #6 from Dianababaei/docs/update-generate-docstring-kv-cache-optimization
...
Update GPT class generate method signature in nanochat/gpt.py
2025-11-03 16:07:15 +03:30
Artemis Git Integration
15a782453f
docs: update generate() docstring to reflect KV cache optimization
2025-11-03 12:30:21 +00:00
Dianababaei
3a3cd20690
Merge pull request #5 from Dianababaei/feat/kv-cache-benchmark-script
...
Add benchmark script for measuring optimization performance
2025-11-03 13:37:28 +03:30
Artemis Git Integration
4d9d10abb0
feat(benchmark): add performance benchmark script for KV-cache optimizations with CLI args, GPU memory tracking, and statistical measurement across iterations
2025-11-03 10:06:02 +00:00
Dianababaei
333919d764
Merge pull request #4 from Dianababaei/feat/kv-cached-generation-loop-o-t-optimization
...
refactor: Update GPT generate method and modify GPTConfig class parameters
2025-11-03 13:35:41 +03:30
Artemis Git Integration
b78bc3fd9f
perf: optimize generation loop from O(T²) to O(T) using KV-cache
...
Refactor to process one token per iteration instead of reprocessing the entire sequence. Reorder the loop to sample → yield → forward with single-token input, enabling the fast path in attention.
2025-11-03 10:04:43 +00:00
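The reordered loop from this commit can be shown with a toy sketch. `generate` and `toy_forward` are hypothetical stand-ins: the real model returns logits and writes keys/values into a KV-cache, while here `forward(tokens, cache)` just appends to a list and returns the next token id, so the control flow (prefill once, then one token per step) is runnable.

```python
def generate(forward, prompt, num_steps):
    """Toy sketch of the O(T) decode loop: prefill once, then one token
    per forward call, in sample -> yield -> forward order."""
    cache = []
    # Prefill: process the whole prompt in a single call.
    token = forward(prompt, cache)
    for _ in range(num_steps):
        yield token
        # Feed only the newest token; earlier positions live in the cache,
        # so total work is O(T) rather than O(T^2).
        token = forward([token], cache)

def toy_forward(tokens, cache):
    # Stand-in for the model: "process" tokens into the cache and return
    # a deterministic next token (the current cache length).
    cache.extend(tokens)
    return len(cache)

tokens = list(generate(toy_forward, [10, 11, 12], num_steps=3))
```

Without the cache, each step would reprocess the whole prefix, which is the O(T²) behavior this commit removes.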
Dianababaei
8927ec79c8
Merge pull request #3 from Dianababaei/feat/gpt-prefill-phase-kv-caching
...
Add 6 lines to GPT class to expand model capabilities and configuration
2025-11-03 13:33:16 +03:30
Artemis Git Integration
1131c37a62
feat(gpt): implement prefill phase for efficient prompt processing with KV-caching
...
Add prefill phase that processes entire prompt in single forward pass before generation loop, extracting logits only for last token position and populating KV-cache
2025-11-03 10:01:59 +00:00
Dianababaei
d0383978df
Merge pull request #2 from Dianababaei/feat/gpt-initialize-kvcache-efficient-generation
...
Add Rotary Position Embeddings (RoPE) support to GPT model with configurable flag
2025-11-03 13:30:46 +03:30
Artemis Git Integration
dd1f606c52
feat(gpt): initialize KVCache for efficient generation with MQA support
...
Add KVCache pre-allocation in generate() method to enable efficient key-value caching during token generation, avoiding dynamic reallocation overhead
2025-11-03 10:00:19 +00:00
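The pre-allocation idea in this commit can be sketched minimally. `KVCache` is the real class name from the engine module, but the internals below are hypothetical: plain lists of zeros stand in for key/value tensors, and a write cursor replaces tensor indexing.

```python
class KVCache:
    """Hypothetical sketch of a pre-allocated key/value cache.

    Fixed-capacity buffers are allocated once up front, so each decode
    step writes into slot `pos` instead of growing the cache dynamically
    and paying reallocation overhead.
    """
    def __init__(self, max_seq_len, head_dim):
        self.keys = [[0.0] * head_dim for _ in range(max_seq_len)]
        self.values = [[0.0] * head_dim for _ in range(max_seq_len)]
        self.pos = 0  # next free slot

    def insert(self, k, v):
        # Overwrite the pre-allocated slot rather than appending.
        self.keys[self.pos] = k
        self.values[self.pos] = v
        self.pos += 1
```

`generate()` would size the cache to prompt length plus the maximum number of new tokens, so no allocation happens inside the decode loop.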
Dianababaei
d44a3e090f
Merge pull request #1 from Dianababaei/feat/gpt-add-kvcache-import
...
Update GPT model configuration and initialization parameters in GPTConfig class
2025-11-03 13:26:17 +03:30
Artemis Git Integration
1703f181b9
feat(gpt): add KVCache import from engine module for efficient autoregressive generation
2025-11-03 09:55:48 +00:00
Andrej Karpathy
d6d86cbf4c
update readme with a link to the CPU|MPS branch
2025-10-16 22:03:39 +00:00
Andrej Karpathy
ccfe7915ac
mention the current d32 chat hosted on nanochat.karpathy.ai, as an example endpoint of the repo
2025-10-16 19:32:44 +00:00
Andrej Karpathy
4346536ab2
also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate
2025-10-16 01:28:37 +00:00
Andrej Karpathy
2846999b8f
allow user to click on their message to edit it. conversation after that point is wiped
2025-10-16 01:16:22 +00:00
Andrej Karpathy
92d52ecc92
add slash commands to webui
2025-10-16 01:09:53 +00:00
Andrej Karpathy
fae3aca951
add script to train a $1000 version of nanochat. currently it's a bit more like $800 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now
2025-10-15 20:32:22 +00:00
Andrej Karpathy
4c3590c499
fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints
2025-10-15 20:29:54 +00:00
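The accumulate-then-emit fix this commit describes is what an incremental UTF-8 decoder does. A minimal sketch (the function name `stream_decode` is hypothetical; the stdlib `codecs` incremental decoder does the buffering):

```python
import codecs

def stream_decode(byte_chunks):
    """Sketch of the fix: buffer bytes and only emit complete codepoints.

    A token's bytes may end mid-codepoint (emoji and many non-Latin
    scripts are multi-byte in UTF-8); the incremental decoder holds the
    partial sequence until the remaining bytes arrive instead of emitting
    replacement characters.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    out = []
    for chunk in byte_chunks:
        text = decoder.decode(chunk)   # returns "" while a codepoint is incomplete
        if text:
            out.append(text)
    return out

# The 4 bytes of one emoji arrive split across two "tokens";
# nothing is emitted until the codepoint is complete.
pieces = stream_decode([b"hi", b"\xf0\x9f", b"\x98\x80"])
```

Decoding each token's bytes independently with `errors="replace"` would have produced U+FFFD garbage at exactly these split points.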
Andrej Karpathy
03fa673b7d
add basic logging to chat_web, which i think might be fun
2025-10-15 19:51:06 +00:00
Andrej Karpathy
52bfeea8bd
add very basic abuse prevention limits to chat_web so it's ok to host endpoints
2025-10-15 19:42:54 +00:00
Andrej Karpathy
01fb290f53
allow multiple GPUs to do inference in a data parallel way
2025-10-15 19:12:19 +00:00
Andrej Karpathy
190d9515d0
don't evaluate the sampling evals during SFT, they are too slow. keep the multiple choice evals. delete unused imports
2025-10-15 16:42:23 +00:00