nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2025-12-07 12:52:16 +00:00

Author	SHA1	Message	Date
diana	a6efa53b92	optimisations fixed	2025-11-05 22:07:29 +03:30
Artemis Git Integration	09f5420fab	feat: add auto-batch-size discovery to base_train, mid_train, and chat_sft with fallback defaults and manual override support	2025-11-05 16:50:27 +00:00
Artemis Git Integration	a8aad26041	feat(train): add batch sample functions for memory testing in auto-discovery. Add create_batch_sample_fn closures to base_train.py, mid_train.py, and chat_sft.py that generate realistic test batches matching training data formats for accurate memory	2025-11-05 16:48:55 +00:00
Artemis Git Integration	cba76ef8ef	feat(config): add auto batch size discovery with configurable parameters and CLI overrides. Replace hardcoded device_batch_size with auto_batch_size, batch_size_margin, batch_size_cache, and device_batch_size variables across training scripts	2025-11-05 16:47:32 +00:00
Artemis Git Integration	47935c69d5	test: add torch.compile performance validation logging with multi-GPU compatibility checks	2025-11-05 16:19:59 +00:00
Artemis Git Integration	a381fc406d	refactor(chat_sft): use uncompiled model for eval and checkpointing to prevent recompilation. Use orig_model instead of compiled model for engine init, MMLU/ARC-Easy eval, and checkpoint saving to avoid recompilation on variable-length inputs	2025-11-05 16:09:43 +00:00
Artemis Git Integration	5cd79225c4	feat(train): enable torch.compile for chat_sft with fixed shapes for 30-50% speedup	2025-11-05 16:07:54 +00:00
Artemis Git Integration	d8be015b20	feat(chat_sft): add fixed-length padding for torch.compile compatibility. Replace variable-length padding with fixed 2048-token padding to create constant batch shapes, enabling efficient torch.compile in subsequent training steps	2025-11-05 16:04:26 +00:00
Andrej Karpathy	190d9515d0	dont evaluate the sampling evals during SFT they are too slow. keep the multiple choice evals. delete unused imports	2025-10-15 16:42:23 +00:00
karpathy	3a5e0bc50b	initial commit	2025-10-13 06:49:24 -07:00

10 Commits
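
Note on commit d8be015b20: the message describes replacing variable-length padding with fixed 2048-token padding so that every batch has the same shape. The following is a minimal sketch of that idea in plain Python, not the code from the commit. The function name `pad_to_fixed_length`, the pad token id `0`, and the choice to truncate overlong rows are all assumptions made for illustration; only the 2048-token length comes from the commit message.

```python
from typing import List, Tuple

MAX_LEN = 2048  # fixed length named in the commit message
PAD_ID = 0      # hypothetical pad token id (assumption, not from the repo)


def pad_to_fixed_length(
    rows: List[List[int]],
    max_len: int = MAX_LEN,
    pad_id: int = PAD_ID,
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad (or truncate) every row to exactly max_len tokens.

    Returns the padded token ids and a 0/1 mask marking real tokens,
    so padded positions can be excluded from the loss.
    """
    padded, mask = [], []
    for row in rows:
        row = row[:max_len]            # assumption: overlong rows are truncated
        n_pad = max_len - len(row)     # number of pad tokens to append
        padded.append(row + [pad_id] * n_pad)
        mask.append([1] * len(row) + [0] * n_pad)
    return padded, mask


# Usage: two rows of different lengths end up with identical shapes.
ids, mask = pad_to_fixed_length([[11, 12, 13], [21]], max_len=5)
print(ids)   # [[11, 12, 13, 0, 0], [21, 0, 0, 0, 0]]
print(mask)  # [[1, 1, 1, 0, 0], [1, 0, 0, 0, 0]]
```

With variable-length padding, each batch is padded only to its longest row, so the batch shape changes from step to step, which can trigger recompilation under torch.compile. Padding to a constant length trades some wasted compute on pad tokens for a shape that stays the same at every step.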