Commit Graph

54 Commits

Author SHA1 Message Date
Jason Kneen
b81d789992 Pass device batch size to base_loss script
Added the --device_batch_size argument to the base_loss evaluation command in runmac_overnight.sh to ensure batch size is configurable during evaluation.
2025-10-22 09:29:46 +01:00
Jason Kneen
1225ddf00e Add macOS memory-optimized training and documentation
Introduces automatic memory detection and batch size optimization for Apple Silicon Macs in runcpu.sh and runmac_overnight.sh scripts. Adds a comprehensive README_MACOS.md with usage instructions, performance profiles, environment variable overrides, troubleshooting, and expected training times. Updates scripts to allow manual overrides and improve usability for various Mac configurations. Also switched Python to arm64 for a 2-3x improvement.
2025-10-22 07:35:26 +01:00
Jason Kneen
5a3d8b6b5e
Update nanochat/gpt.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-22 02:37:32 +01:00
Jason Kneen
3e184d343e Improve Mac/MPS compatibility and device handling
Added dev/runmac_overnight.sh for optimized Mac training. Updated device-specific logic throughout dataloader, GPT, Muon optimizer, and training scripts to avoid CUDA-only features on MPS/CPU (e.g., torch.compile, pin_memory, non_blocking, bfloat16). Relaxed torch version constraints in pyproject.toml and removed Linux/CUDA-specific PyTorch config for better macOS support.
2025-10-22 01:55:38 +01:00
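The device gating described in the commit above could look roughly like the sketch below. It is only an illustration: the function name `device_settings`, the dictionary keys, and the float32 fallback are assumptions, not code taken from the repository.

```python
def device_settings(device_type):
    """Return feature toggles for a device type ("cuda", "mps" or "cpu").

    Hypothetical helper: the commit only says that torch.compile, pin_memory,
    non_blocking and bfloat16 are avoided off CUDA. The float32 fallback
    below is an assumption.
    """
    is_cuda = device_type == "cuda"
    return {
        "use_compile": is_cuda,    # skip torch.compile on MPS/CPU
        "pin_memory": is_cuda,     # pinned host memory only helps CUDA transfers
        "non_blocking": is_cuda,   # async host-to-device copies are a CUDA feature
        "dtype": "bfloat16" if is_cuda else "float32",
    }

mps = device_settings("mps")
cuda = device_settings("cuda")
```

Keeping the toggles in one place means the dataloader, model, and optimizer code can each read a flag instead of repeating the device check.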
Andrej Karpathy
50bea28ef9 also add readme mention of the cpu mps changes 2025-10-21 17:24:48 +00:00
Andrej Karpathy
5bdc99abfb merge and resolve conflict 2025-10-21 17:19:10 +00:00
Andrej Karpathy
dfcb1c16f1 Merge branch 'master' into cpu-mps-dev 2025-10-21 17:15:53 +00:00
Andrej Karpathy
bb71c64579 fix silly issue in dataloader, this version is much faster and more portable to mps too 2025-10-21 17:12:50 +00:00
karpathy
bb786c5560 i shouldnt have committed the lock file, i missed that. revert to the flagship build which is linux. sorry to pollute the repo history... 2025-10-21 10:07:40 -07:00
Andrej
c9ea7a91e2
Add customization instructions to README
Added a section on customization for nanochat.
2025-10-21 08:57:10 -07:00
Andrej Karpathy
03cddd9878 actually let's not brick code on git pull. change error to warning 2025-10-21 15:13:25 +00:00
Andrej Karpathy
fe5aed940b add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully its ok 2025-10-21 15:04:58 +00:00
karpathy
2e9669e03a upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming 2025-10-20 10:15:17 -07:00
Andrej
a09ac812ed
toml changes for cpu only install 2025-10-20 07:53:15 -07:00
burtenshaw
0abb0fa2e3 add both sides of the source check 2025-10-20 10:44:07 +02:00
burtenshaw
c7ae920a77 add check for linux on cpu 2025-10-20 06:51:52 +02:00
Andrej
0f007889dd
Add MIT License as a file to the project 2025-10-19 17:22:19 -07:00
Andrej
5a879f4947
export NANOCHAT_BASE_DIR so child processes get it too 2025-10-19 17:07:56 -07:00
Andrej Karpathy
c1d2ed1c13 use orig_model in sampling, silly of me to miss this 2025-10-20 00:05:09 +00:00
Andrej Karpathy
2bc521a6de use orig_model in sampling, silly of me to miss this 2025-10-20 00:04:15 +00:00
Andrej Karpathy
9467d83cf2 fix memory leak bug in rust tokenizer ty @mitsuhiko 2025-10-19 23:54:31 +00:00
Tancrède Lepoint
b1443dc98c export NANOCHAT_BASE_DIR so child processes get it too 2025-10-19 14:05:40 -04:00
Andrej
cf2baf9933
fix typo
Co-authored-by: Tancrède Lepoint <tlepoint@users.noreply.github.com>
2025-10-17 08:35:41 -07:00
karpathy
e4f9b9c64d revert to previous pyproject.toml 2025-10-17 08:08:16 -07:00
Andrej
e883b1d597
Merge pull request #99 from burtenshaw/cpu-mps-dev-ben
Add mps and cpu dependency management
2025-10-17 07:24:38 -07:00
burtenshaw
23b6351c1c add groups and source selection 2025-10-17 12:20:18 +02:00
karpathy
ae02650afe update the midtraining script too 2025-10-16 16:33:17 -07:00
karpathy
df600b6ed5 many small tweaks. base, eval, core work now i think 2025-10-16 15:46:18 -07:00
Andrej Karpathy
d6d86cbf4c update readme with a link to the CPU|MPS branch 2025-10-16 22:03:39 +00:00
Andrej Karpathy
ccfe7915ac mention the current d32 chat hosted on nanochat.karpathy.ai, as an example endpoint of the repo 2025-10-16 19:32:44 +00:00
karpathy
786119d593 add autodetect of device and related stuff. getting weird warnings/errors still, so wip 2025-10-16 10:26:19 -07:00
karpathy
279b74312c adjust comment/guidance on device type 2025-10-16 10:06:39 -07:00
karpathy
306bc380ab add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights 2025-10-16 10:04:43 -07:00
Andrej Karpathy
722da4f543 trying to add basic cpu support, will try mps too 2025-10-16 16:14:38 +00:00
Andrej Karpathy
4346536ab2 also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate 2025-10-16 01:28:37 +00:00
Andrej Karpathy
2846999b8f allow user to click on their message to edit them. conversation after that point is wiped 2025-10-16 01:16:22 +00:00
Andrej Karpathy
92d52ecc92 add slash commands to webui 2025-10-16 01:09:53 +00:00
Andrej Karpathy
fae3aca951 add script to train a 000 version of nanochat. currently it's a bit more like 00 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now 2025-10-15 20:32:22 +00:00
Andrej Karpathy
4c3590c499 fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints 2025-10-15 20:29:54 +00:00
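The accumulate-then-emit idea in the commit above can be illustrated with Python's standard incremental UTF-8 decoder. This is a sketch of the general technique, not the repository's implementation: `stream_text` is a hypothetical name, and the byte chunks stand in for the bytes of individual tokens.

```python
import codecs

def stream_text(byte_chunks):
    """Yield text only once complete UTF-8 codepoints are available."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in byte_chunks:
        # The decoder buffers a trailing partial codepoint and returns ""
        # until the remaining bytes arrive.
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush at end of stream; raises if the stream ended mid-codepoint.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail

emoji = "🙂".encode("utf-8")  # four bytes, split across two chunks below
chunks = [b"hi ", emoji[:2], emoji[2:], b"!"]
pieces = list(stream_text(chunks))
```

Nothing is emitted for the chunk that holds only the first half of the emoji; the full character appears once the second half arrives.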
Andrej Karpathy
03fa673b7d add basic logging to chat_web, which i think might be fun 2025-10-15 19:51:06 +00:00
Andrej Karpathy
52bfeea8bd add very basic abuse prevention limits to chat_web so it's ok to host endpoints 2025-10-15 19:42:54 +00:00
Andrej Karpathy
01fb290f53 allow multiple GPUs to do inference in a data parallel way 2025-10-15 19:12:19 +00:00
Andrej Karpathy
190d9515d0 dont evaluate the sampling evals during SFT they are too slow. keep the multiple choice evals. delete unused imports 2025-10-15 16:42:23 +00:00
Andrej Karpathy
b8076dd367 fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation 2025-10-15 16:35:04 +00:00
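For illustration, the difference between a ramp-down and a ramp-up multiplier can be sketched as below. The linear shape and the names `lr_multiplier` and `lr_multiplier_ramp_up` are assumptions for this example; the actual schedule is discussed in Issue #68 and may differ.

```python
def lr_multiplier(step, num_iterations):
    """Linear ramp-down (assumed shape): 1.0 at step 0, falling towards 0.0."""
    return 1.0 - step / num_iterations

def lr_multiplier_ramp_up(step, num_iterations):
    """The kind of mistake the commit describes: grows from 0.0 instead."""
    return step / num_iterations

start = lr_multiplier(0, 100)
middle = lr_multiplier(50, 100)
wrong_start = lr_multiplier_ramp_up(0, 100)
```

Checking the multiplier at step 0 is a quick way to tell the two apart: the ramp-down starts at the full learning rate, while the ramp-up starts at zero.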
Andrej
67aaca98f5
export NANOCHAT_BASE_DIR so child processes get it too
Export the cache directory so that users can use their own cache location
2025-10-14 16:01:28 -07:00
Zach Mueller
f0855cbcc7
Update speedrun.sh 2025-10-14 14:12:01 -04:00
Andrej
dd6ff9a1cc
fix bug in fallback case of find_largest_model
Fix: Handle missing d<number> model tags in find_largest_model
ty
2025-10-13 14:38:34 -07:00
Mirza-Samad-Ahmed-Baig
afaa5b4c90 Fix: Handle missing d<number> model tags in find_largest_model 2025-10-14 00:24:07 +03:00
Andrej
5fd0b13886
Merge pull request #2 from epoyraz/patch-1
Update README.md
2025-10-13 10:10:15 -07:00
Enes Poyraz
6a795baf27
Update README.md
fix typos
2025-10-13 18:40:12 +02:00