Commit Graph

32 Commits

Author SHA1 Message Date
Lawrence R Kincheloe III
1bbba0f0d3
Merge branch 'master' into rocm-support 2025-10-16 18:47:37 -05:00
Andrej Karpathy
d6d86cbf4c update readme with a link to the CPU|MPS branch 2025-10-16 22:03:39 +00:00
Andrej Karpathy
ccfe7915ac mention the current d32 chat hosted on nanochat.karpathy.ai, as an example endpoint of the repo 2025-10-16 19:32:44 +00:00
Andrej Karpathy
4346536ab2 also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate 2025-10-16 01:28:37 +00:00
Andrej Karpathy
2846999b8f allow user to click on their message to edit it. conversation after that point is wiped 2025-10-16 01:16:22 +00:00
Andrej Karpathy
92d52ecc92 add slash commands to webui 2025-10-16 01:09:53 +00:00
Andrej Karpathy
fae3aca951 add script to train a $1000 version of nanochat. currently it's a bit more like $800 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now 2025-10-15 20:32:22 +00:00
Andrej Karpathy
4c3590c499 fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints 2025-10-15 20:29:54 +00:00
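The accumulate-then-emit behavior described in this commit can be sketched with Python's incremental UTF-8 decoder (a minimal illustration of the technique, not the repo's actual decode path; `StreamingDecoder` and `feed` are hypothetical names):

```python
import codecs

class StreamingDecoder:
    """Accumulate raw token bytes and only emit complete codepoints."""
    def __init__(self):
        # the incremental decoder buffers incomplete multi-byte sequences
        self.decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

    def feed(self, token_bytes: bytes) -> str:
        # returns "" while a codepoint is still incomplete
        return self.decoder.decode(token_bytes)

sd = StreamingDecoder()
# the 4 bytes of one emoji split across two "tokens"
first = sd.feed(b"\xf0\x9f")   # incomplete: nothing emitted yet
second = sd.feed(b"\x8c\x8d")  # completes the codepoint
```

Emitting byte-by-byte without this buffering would produce replacement characters for emoji and most non-ASCII scripts, which is the bug the commit fixes.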
Andrej Karpathy
03fa673b7d add basic logging to chat_web, which i think might be fun 2025-10-15 19:51:06 +00:00
Andrej Karpathy
52bfeea8bd add very basic abuse prevention limits to chat_web so it's ok to host endpoints 2025-10-15 19:42:54 +00:00
Andrej Karpathy
01fb290f53 allow multiple GPUs to do inference in a data parallel way 2025-10-15 19:12:19 +00:00
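One simple way to split inference across GPUs, as this commit describes, is a strided shard per rank (an illustrative sketch; `shard_for_rank` is a hypothetical helper and the repo's actual partitioning may differ):

```python
def shard_for_rank(prompts, rank, world_size):
    # each rank handles every world_size-th prompt; outputs can be
    # gathered afterwards in the same strided order
    return prompts[rank::world_size]

prompts = ["p0", "p1", "p2", "p3", "p4"]
work = [shard_for_rank(prompts, r, 2) for r in range(2)]
```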
Andrej Karpathy
190d9515d0 don't evaluate the sampling evals during SFT, they are too slow. keep the multiple choice evals. delete unused imports 2025-10-15 16:42:23 +00:00
Andrej Karpathy
b8076dd367 fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add a --dry_run option, useful for experimentation 2025-10-15 16:35:04 +00:00
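The intended ramp-down can be sketched as a linear decay multiplier (a hand-written illustration of the correct direction, not the repo's exact schedule; see Issue #68 for the actual bug):

```python
def get_lr_multiplier(it, num_iterations, final_frac=0.0):
    # ramp the multiplier *down* linearly from 1.0 to final_frac;
    # the bug had the multiplier growing with the iteration count instead
    progress = it / num_iterations
    return 1.0 - progress * (1.0 - final_frac)
```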
Andrej
67aaca98f5
export NANOCHAT_BASE_DIR so child processes get it too
Export the cache directory so that users can use their own cache location
2025-10-14 16:01:28 -07:00
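The effect of the export is that child processes inherit the variable, as sketched below (the path shown is a placeholder for illustration, not necessarily the repo's default cache location):

```python
import os
import subprocess
import sys

# export so that child processes (e.g. spawned workers) see it too
os.environ["NANOCHAT_BASE_DIR"] = "/tmp/nanochat-cache"  # placeholder path

child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['NANOCHAT_BASE_DIR'])"],
    capture_output=True, text=True,
)
inherited = child.stdout.strip()
```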
Zach Mueller
f0855cbcc7
Update speedrun.sh 2025-10-14 14:12:01 -04:00
google-labs-jules[bot]
f5349ffaea fix: Re-add PYTORCH_CUDA_ALLOC_CONF to training scripts
This commit re-adds the `PYTORCH_CUDA_ALLOC_CONF` environment variable to the training scripts. This setting helps prevent memory fragmentation and is beneficial for both CUDA and ROCm environments. The setting was inadvertently removed during a previous refactoring.
2025-10-14 15:20:54 +00:00
Lawrence R Kincheloe III
31db19ae77
Merge pull request #1 from LokiMetaSmith/fix/pytorch-memory-fragmentation
Set PYTORCH_CUDA_ALLOC_CONF to prevent memory fragmentation
2025-10-14 02:11:58 -05:00
Lawrence R Kincheloe III
b3f662a924
Merge branch 'rocm-support' into fix/pytorch-memory-fragmentation 2025-10-14 02:11:50 -05:00
google-labs-jules[bot]
b09f7fc29b Set PYTORCH_CUDA_ALLOC_CONF to prevent memory fragmentation
This change adds the `PYTORCH_CUDA_ALLOC_CONF` environment variable to the main `speedrun.sh` execution script.

Setting `expandable_segments:True` is recommended by PyTorch to manage memory more efficiently and prevent fragmentation, addressing a `UserWarning` observed during execution.
2025-10-14 06:57:18 +00:00
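In Python terms, the setting amounts to exporting the variable before PyTorch first touches its caching allocator (a sketch; the repo sets it in `speedrun.sh` rather than in Python code):

```python
import os

# must be set before torch initializes the CUDA/HIP caching allocator;
# expandable_segments lets the allocator grow existing segments instead
# of fragmenting memory into fixed-size blocks
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
```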
google-labs-jules[bot]
5a785854d1 feat: Add HSA_OVERRIDE_GFX_VERSION for newer AMD GPUs
This commit adds the `HSA_OVERRIDE_GFX_VERSION` environment variable to the `speedrun.sh` script. This is a workaround to enable support for newer AMD GPU architectures (e.g., gfx1151) that are not yet officially supported in the pre-compiled PyTorch ROCm builds.

This change also includes an update to the `README.md` to explain this workaround to users.
2025-10-14 06:48:34 +00:00
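The workaround is an environment override of the GFX version the runtime reports; the value below is an assumption for illustration only (the correct value depends on the GPU, and the README update mentioned above documents the specifics):

```python
import os

# masquerade as a supported gfx target so prebuilt ROCm kernels load;
# "11.0.0" is an illustrative value, not necessarily right for gfx1151
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "11.0.0"
```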
google-labs-jules[bot]
19fa71d6e5 fix: Resolve HIP error and improve device detection
This commit fixes a `torch.AcceleratorError: HIP error: invalid device function` that occurred during weight initialization on ROCm devices. It also improves the device detection logic to correctly identify and prioritize the ROCm backend.

The key changes are:
- Patched `nanochat/gpt.py` to initialize weights on the CPU before moving them to the target device, which avoids the HIP kernel error.
- Simplified and corrected the device detection logic in `nanochat/common.py` to ensure the ROCm backend is properly selected when available.
2025-10-14 06:07:13 +00:00
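The CPU-first initialization pattern can be sketched as below (a minimal stand-in using `nn.Linear`, not the repo's actual GPT module; the point is that the init kernels run on CPU and only the finished weights move to the device):

```python
import torch
import torch.nn as nn

def build_model(device: str) -> nn.Module:
    # construct and initialize on CPU first, then move; avoids running
    # the initialization kernels on a HIP device that lacks them
    model = nn.Linear(16, 16)  # stand-in for the real model
    with torch.no_grad():
        nn.init.normal_(model.weight, std=0.02)
        nn.init.zeros_(model.bias)
    return model.to(device)

m = build_model("cpu")
```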
google-labs-jules[bot]
054d903cae fix: Address runtime errors and improve configuration
This commit addresses several runtime errors encountered during the execution of the `speedrun.sh` script and improves the overall configuration of the project.

The key changes are:
- Patched `nanochat/configurator.py` to be more robust by handling flag-like arguments and ignoring unknown arguments. This resolves the `AssertionError`.
- Fixed the argument handling for `chat_eval.py` in `speedrun.sh` to prevent argument parsing errors.
- Updated `pyproject.toml` to correctly define optional dependencies for development.
2025-10-14 05:47:26 +00:00
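The more forgiving argument handling can be sketched as a parser that skips bare flags and ignores unknown keys instead of asserting (an illustrative sketch; `apply_overrides` is a hypothetical name, not the repo's actual `configurator.py` code):

```python
def apply_overrides(config: dict, argv: list) -> dict:
    # tolerate flag-like and unknown arguments instead of asserting
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            continue  # skip bare flags and positional args
        key, val = arg[2:].split("=", 1)
        if key in config:
            config[key] = type(config[key])(val)  # coerce to existing type
        # unknown keys are silently ignored
    return config

cfg = apply_overrides({"lr": 0.1, "steps": 100},
                      ["--lr=0.2", "--mystery=9", "--dry_run"])
```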
google-labs-jules[bot]
f20d9d4d3c docs: Update README with computing environment details 2025-10-14 05:18:12 +00:00
google-labs-jules[bot]
08c628cb83 feat: Add ROCm and device-agnostic support
This change adds support for ROCm and makes the codebase device-agnostic, allowing it to run on different hardware backends including ROCm, CUDA, and CPU.

The key changes are:
- Modified `pyproject.toml` to use ROCm-compatible PyTorch wheels and added the `pytorch-triton-rocm` dependency.
- Refactored `nanochat/common.py` to dynamically detect the available hardware and set the device and distributed backend accordingly.
- Updated all training, evaluation, and inference scripts to be device-agnostic, removing hardcoded CUDA references.
- Adapted `speedrun.sh` for single-device execution by replacing `torchrun` with `python`.
- Updated `nanochat/report.py` to provide more generic GPU information.
2025-10-14 05:07:30 +00:00
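The detection order described above can be sketched as a pure function over capability flags (a simplified illustration; the real `nanochat/common.py` queries torch directly, and ROCm builds of PyTorch expose the GPU through the `cuda` device type with `torch.version.hip` set):

```python
def detect_device(hip_version, cuda_available, mps_available):
    # prioritize ROCm (which PyTorch presents as the 'cuda' device type),
    # then CUDA, then Apple MPS, falling back to CPU
    if hip_version is not None or cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```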
Andrej
dd6ff9a1cc
fix bug in fallback case of find_largest_model
Fix: Handle missing d<number> model tags in find_largest_model
ty
2025-10-13 14:38:34 -07:00
Mirza-Samad-Ahmed-Baig
afaa5b4c90 Fix: Handle missing d<number> model tags in find_largest_model 2025-10-14 00:24:07 +03:00
Andrej
5fd0b13886
Merge pull request #2 from epoyraz/patch-1
Update README.md
2025-10-13 10:10:15 -07:00
Enes Poyraz
6a795baf27
Update README.md
fix typos
2025-10-13 18:40:12 +02:00
Andrej
626bd3e260
Add image of the WebUI to readme 2025-10-13 08:03:00 -07:00
karpathy
da96b46565 update link to the new discussion 2025-10-13 07:42:09 -07:00
karpathy
a53833d04f add nanochat logo png 2025-10-13 06:59:59 -07:00
karpathy
3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00