nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-04-03 22:25:27 +00:00

Author	SHA1	Message	Date
Jason Kneen	b81d789992	Pass device batch size to base_loss script Added the --device_batch_size argument to the base_loss evaluation command in runmac_overnight.sh to ensure batch size is configurable during evaluation.	2025-10-22 09:29:46 +01:00
Jason Kneen	1225ddf00e	Add macOS memory-optimized training and documentation Introduces automatic memory detection and batch size optimization for Apple Silicon Macs in runcpu.sh and runmac_overnight.sh scripts. Adds a comprehensive README_MACOS.md with usage instructions, performance profiles, environment variable overrides, troubleshooting, and expected training times. Updates scripts to allow manual overrides and improve usability for various Mac configurations. Also switched python to arm64 for 2-3x improvement	2025-10-22 07:35:26 +01:00
Jason Kneen	5a3d8b6b5e	Update nanochat/gpt.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-10-22 02:37:32 +01:00
Jason Kneen	3e184d343e	Improve Mac/MPS compatibility and device handling Added dev/runmac_overnight.sh for optimized Mac training. Updated device-specific logic throughout dataloader, GPT, Muon optimizer, and training scripts to avoid CUDA-only features on MPS/CPU (e.g., torch.compile, pin_memory, non_blocking, bfloat16). Relaxed torch version constraints in pyproject.toml and removed Linux/CUDA-specific PyTorch config for better macOS support.	2025-10-22 01:55:38 +01:00
Andrej Karpathy	50bea28ef9	also add readme mention of the cpu mps changes	2025-10-21 17:24:48 +00:00
Andrej Karpathy	5bdc99abfb	merge and resolve conflict	2025-10-21 17:19:10 +00:00
Andrej Karpathy	dfcb1c16f1	Merge branch 'master' into cpu-mps-dev	2025-10-21 17:15:53 +00:00
Andrej Karpathy	bb71c64579	fix silly issue in dataloader, this version is much faster and more portable to mps too	2025-10-21 17:12:50 +00:00
karpathy	bb786c5560	i shouldn't have committed the lock file, i missed that. revert to the flagship build which is linux. sorry to pollute the repo history...	2025-10-21 10:07:40 -07:00
Andrej	c9ea7a91e2	Add customization instructions to README Added a section on customization for nanochat.	2025-10-21 08:57:10 -07:00
Andrej Karpathy	03cddd9878	actually let's not brick code on git pull. change error to warning	2025-10-21 15:13:25 +00:00
Andrej Karpathy	fe5aed940b	add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully it's ok	2025-10-21 15:04:58 +00:00
karpathy	2e9669e03a	upgrading all other files to be able to use cpu/mps as well as cuda. various other minor changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming	2025-10-20 10:15:17 -07:00
Andrej	a09ac812ed	toml changes for cpu only install	2025-10-20 07:53:15 -07:00
burtenshaw	0abb0fa2e3	add both sides of the source check	2025-10-20 10:44:07 +02:00
burtenshaw	c7ae920a77	add check for linux on cpu	2025-10-20 06:51:52 +02:00
Andrej	0f007889dd	Add MIT License as a file to the project	2025-10-19 17:22:19 -07:00
Andrej	5a879f4947	export NANOCHAT_BASE_DIR so child processes get it too	2025-10-19 17:07:56 -07:00
Andrej Karpathy	c1d2ed1c13	use orig_model in sampling, silly of me to miss this	2025-10-20 00:05:09 +00:00
Andrej Karpathy	2bc521a6de	use orig_model in sampling, silly of me to miss this	2025-10-20 00:04:15 +00:00
Andrej Karpathy	9467d83cf2	fix memory leak bug in rust tokenizer ty @mitsuhiko	2025-10-19 23:54:31 +00:00
Tancrède Lepoint	b1443dc98c	export NANOCHAT_BASE_DIR so child processes get it too	2025-10-19 14:05:40 -04:00
Andrej	cf2baf9933	fix typo Co-authored-by: Tancrède Lepoint <tlepoint@users.noreply.github.com>	2025-10-17 08:35:41 -07:00
karpathy	e4f9b9c64d	revert to previous pyproject.toml	2025-10-17 08:08:16 -07:00
Andrej	e883b1d597	Merge pull request #99 from burtenshaw/cpu-mps-dev-ben Add mps and cpu dependency management	2025-10-17 07:24:38 -07:00
burtenshaw	23b6351c1c	add groups and source selection	2025-10-17 12:20:18 +02:00
karpathy	ae02650afe	update the midtraining script too	2025-10-16 16:33:17 -07:00
karpathy	df600b6ed5	many small tweaks. base, eval, core work now i think	2025-10-16 15:46:18 -07:00
Andrej Karpathy	d6d86cbf4c	update readme with a link to the CPU|MPS branch	2025-10-16 22:03:39 +00:00
Andrej Karpathy	ccfe7915ac	mention the current d32 chat hosted on nanochat.karpathy.ai, as an example endpoint of the repo	2025-10-16 19:32:44 +00:00
karpathy	786119d593	add autodetect of device and related stuff. getting weird warnings/errors still, so wip	2025-10-16 10:26:19 -07:00
karpathy	279b74312c	adjust comment/guidance on device type	2025-10-16 10:06:39 -07:00
karpathy	306bc380ab	add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights	2025-10-16 10:04:43 -07:00
Andrej Karpathy	722da4f543	trying to add basic cpu support, will try mps too	2025-10-16 16:14:38 +00:00
Andrej Karpathy	4346536ab2	also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate	2025-10-16 01:28:37 +00:00
Andrej Karpathy	2846999b8f	allow user to click on their message to edit them. conversation after that point is wiped	2025-10-16 01:16:22 +00:00
Andrej Karpathy	92d52ecc92	add slash commands to webui	2025-10-16 01:09:53 +00:00
Andrej Karpathy	fae3aca951	add script to train a $1000 version of nanochat. currently it's a bit more like $800 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now	2025-10-15 20:32:22 +00:00
Andrej Karpathy	4c3590c499	fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints	2025-10-15 20:29:54 +00:00
Andrej Karpathy	03fa673b7d	add basic logging to chat_web, which i think might be fun	2025-10-15 19:51:06 +00:00
Andrej Karpathy	52bfeea8bd	add very basic abuse prevention limits to chat_web so it's ok to host endpoints	2025-10-15 19:42:54 +00:00
Andrej Karpathy	01fb290f53	allow multiple GPUs to do inference in a data parallel way	2025-10-15 19:12:19 +00:00
Andrej Karpathy	190d9515d0	don't evaluate the sampling evals during SFT, they are too slow. keep the multiple choice evals. delete unused imports	2025-10-15 16:42:23 +00:00
Andrej Karpathy	b8076dd367	fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation	2025-10-15 16:35:04 +00:00
Andrej	67aaca98f5	export NANOCHAT_BASE_DIR so child processes get it too Export the cache directory so that users can use their own cache location	2025-10-14 16:01:28 -07:00
Zach Mueller	f0855cbcc7	Update speedrun.sh	2025-10-14 14:12:01 -04:00
Andrej	dd6ff9a1cc	fix bug in fallback case of find_largest_model Fix: Handle missing d<number> model tags in find_largest_model ty	2025-10-13 14:38:34 -07:00
Mirza-Samad-Ahmed-Baig	afaa5b4c90	Fix: Handle missing d<number> model tags in find_largest_model	2025-10-14 00:24:07 +03:00
Andrej	5fd0b13886	Merge pull request #2 from epoyraz/patch-1 Update README.md	2025-10-13 10:10:15 -07:00
Enes Poyraz	6a795baf27	Update README.md fix typos	2025-10-13 18:40:12 +02:00


54 Commits