Jason Kneen
e83d633179
Add training continuation script and update macOS guide
...
Introduces continue_training.sh to automatically resume interrupted training stages by detecting existing checkpoints and proceeding as needed. Updates README_MACOS.md with instructions and troubleshooting for using the new script, including manual continuation steps and improved guidance for memory, architecture, and performance issues.
2025-10-22 09:37:31 +01:00
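The resume-by-checkpoint-detection idea in this commit could be sketched as follows. The stage names and the `*_done` marker files here are hypothetical illustrations, not taken from the actual `continue_training.sh` (which presumably does the equivalent with shell `test -f` checks):

```python
import os

# hypothetical stage order; the real script's stage names may differ
STAGES = ["base", "mid", "sft"]

def next_stage(checkpoint_dir: str):
    """Return the first stage without a completed checkpoint, or None if all done."""
    for stage in STAGES:
        marker = os.path.join(checkpoint_dir, f"{stage}_done")
        if not os.path.exists(marker):
            return stage  # resume training from this stage
    return None  # everything finished; nothing to resume
```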
Jason Kneen
b81d789992
Pass device batch size to base_loss script
...
Added the --device_batch_size argument to the base_loss evaluation command in runmac_overnight.sh to ensure batch size is configurable during evaluation.
2025-10-22 09:29:46 +01:00
Jason Kneen
1225ddf00e
Add macOS memory-optimized training and documentation
...
Introduces automatic memory detection and batch size optimization for Apple Silicon Macs in the runcpu.sh and runmac_overnight.sh scripts. Adds a comprehensive README_MACOS.md with usage instructions, performance profiles, environment variable overrides, troubleshooting, and expected training times. Updates scripts to allow manual overrides and improve usability for various Mac configurations. Also switched Python to arm64 for a 2-3x performance improvement.
2025-10-22 07:35:26 +01:00
Jason Kneen
5a3d8b6b5e
Update nanochat/gpt.py
...
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-22 02:37:32 +01:00
Jason Kneen
3e184d343e
Improve Mac/MPS compatibility and device handling
...
Added dev/runmac_overnight.sh for optimized Mac training. Updated device-specific logic throughout dataloader, GPT, Muon optimizer, and training scripts to avoid CUDA-only features on MPS/CPU (e.g., torch.compile, pin_memory, non_blocking, bfloat16). Relaxed torch version constraints in pyproject.toml and removed Linux/CUDA-specific PyTorch config for better macOS support.
2025-10-22 01:55:38 +01:00
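Gating CUDA-only features by device type, as this commit describes, might look like the sketch below. The function name and the exact set of flags are illustrative assumptions, not the commit's actual code:

```python
def copy_kwargs(device_type: str) -> dict:
    """Dataloader transfer options, gated by device type.

    pin_memory and non_blocking only speed up host-to-device copies on
    CUDA; on MPS or CPU they are no-ops at best and can emit warnings,
    so they are disabled there.
    """
    use_cuda = device_type == "cuda"
    return {"pin_memory": use_cuda, "non_blocking": use_cuda}
```

The same pattern extends to torch.compile and bfloat16 autocast: decide once from the device type, then thread the resulting flags through the training scripts.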
Andrej Karpathy
50bea28ef9
also add readme mention of the cpu mps changes
2025-10-21 17:24:48 +00:00
Andrej Karpathy
5bdc99abfb
merge and resolve conflict
2025-10-21 17:19:10 +00:00
Andrej Karpathy
dfcb1c16f1
Merge branch 'master' into cpu-mps-dev
2025-10-21 17:15:53 +00:00
Andrej Karpathy
bb71c64579
fix silly issue in dataloader, this version is much faster and more portable to mps too
2025-10-21 17:12:50 +00:00
karpathy
bb786c5560
i shouldn't have committed the lock file, i missed that. revert to the flagship build, which is linux. sorry to pollute the repo history...
2025-10-21 10:07:40 -07:00
Andrej
c9ea7a91e2
Add customization instructions to README
...
Added a section on customization for nanochat.
2025-10-21 08:57:10 -07:00
Andrej Karpathy
03cddd9878
actually let's not brick code on git pull. change error to warning
2025-10-21 15:13:25 +00:00
Andrej Karpathy
fe5aed940b
add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully it's ok
2025-10-21 15:04:58 +00:00
karpathy
2e9669e03a
upgrading all other files to be able to use cpu/mps as well as cuda. various other minor changes, e.g. changing max_iterations to num_iterations in the sft script for consistency in naming
2025-10-20 10:15:17 -07:00
Andrej
a09ac812ed
toml changes for cpu only install
2025-10-20 07:53:15 -07:00
burtenshaw
0abb0fa2e3
add both sides of the source check
2025-10-20 10:44:07 +02:00
burtenshaw
c7ae920a77
add check for linux on cpu
2025-10-20 06:51:52 +02:00
Andrej
0f007889dd
Add MIT License as a file to the project
2025-10-19 17:22:19 -07:00
Andrej
5a879f4947
export NANOCHAT_BASE_DIR so child processes get it too
2025-10-19 17:07:56 -07:00
Andrej Karpathy
c1d2ed1c13
use orig_model in sampling, silly of me to miss this
2025-10-20 00:05:09 +00:00
Andrej Karpathy
2bc521a6de
use orig_model in sampling, silly of me to miss this
2025-10-20 00:04:15 +00:00
Andrej Karpathy
9467d83cf2
fix memory leak bug in rust tokenizer ty @mitsuhiko
2025-10-19 23:54:31 +00:00
Tancrède Lepoint
b1443dc98c
export NANOCHAT_BASE_DIR so child processes get it too
2025-10-19 14:05:40 -04:00
Andrej
cf2baf9933
fix typo
...
Co-authored-by: Tancrède Lepoint <tlepoint@users.noreply.github.com>
2025-10-17 08:35:41 -07:00
karpathy
e4f9b9c64d
revert to previous pyproject.toml
2025-10-17 08:08:16 -07:00
Andrej
e883b1d597
Merge pull request #99 from burtenshaw/cpu-mps-dev-ben
...
Add mps and cpu dependency management
2025-10-17 07:24:38 -07:00
burtenshaw
23b6351c1c
add groups and source selection
2025-10-17 12:20:18 +02:00
karpathy
ae02650afe
update the midtraining script too
2025-10-16 16:33:17 -07:00
karpathy
df600b6ed5
many small tweaks. base, eval, core work now i think
2025-10-16 15:46:18 -07:00
Andrej Karpathy
d6d86cbf4c
update readme with a link to the CPU|MPS branch
2025-10-16 22:03:39 +00:00
Andrej Karpathy
ccfe7915ac
mention the current d32 chat hosted on nanochat.karpathy.ai, as an example endpoint of the repo
2025-10-16 19:32:44 +00:00
karpathy
786119d593
add autodetect of device and related stuff. getting weird warnings/errors still, so wip
2025-10-16 10:26:19 -07:00
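The device autodetect added here typically follows a fixed preference order: CUDA, then Apple MPS, then CPU. A minimal sketch of that decision order, with the availability flags injected so it runs without torch (in real code they would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`):

```python
def autodetect_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick the best available device type, preferring CUDA over MPS over CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```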
karpathy
279b74312c
adjust comment/guidance on device type
2025-10-16 10:06:39 -07:00
karpathy
306bc380ab
add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered what I think is a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights
2025-10-16 10:04:43 -07:00
Andrej Karpathy
722da4f543
trying to add basic cpu support, will try mps too
2025-10-16 16:14:38 +00:00
Andrej Karpathy
4346536ab2
also allow regenerating an assistant message by clicking it, and make sure to feed a good seed to generate
2025-10-16 01:28:37 +00:00
Andrej Karpathy
2846999b8f
allow users to click on their messages to edit them. the conversation after that point is wiped
2025-10-16 01:16:22 +00:00
Andrej Karpathy
92d52ecc92
add slash commands to webui
2025-10-16 01:09:53 +00:00
Andrej Karpathy
fae3aca951
add script to train a 000 version of nanochat. currently it's a bit more like 00, and this would probably run in around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now
2025-10-15 20:32:22 +00:00
Andrej Karpathy
4c3590c499
fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign-language text. basically we have to accumulate token bytes/text and only emit when we get full codepoints
2025-10-15 20:29:54 +00:00
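The accumulate-until-complete behavior this fix describes is exactly what the standard library's incremental UTF-8 decoder provides, so it makes a good illustration (this is a sketch of the technique, not the repo's actual tokenizer code):

```python
import codecs

# an incremental decoder buffers incomplete multi-byte sequences and
# only emits text once full codepoints are available
decoder = codecs.getincrementaldecoder("utf-8")()

# the slightly-smiling-face emoji is four bytes (f0 9f 99 82); suppose a
# tokenizer emits it split across two tokens
first = decoder.decode(b"\xf0\x9f")   # incomplete sequence -> emits nothing
second = decoder.decode(b"\x99\x82")  # completes the codepoint -> emits it
```

A streaming decoder that emits per-token text directly with `bytes.decode` would raise (or corrupt output) on the first half; buffering until the codepoint completes avoids that.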
Andrej Karpathy
03fa673b7d
add basic logging to chat_web, which i think might be fun
2025-10-15 19:51:06 +00:00
Andrej Karpathy
52bfeea8bd
add very basic abuse prevention limits to chat_web so it's ok to host endpoints
2025-10-15 19:42:54 +00:00
Andrej Karpathy
01fb290f53
allow multiple GPUs to do inference in a data parallel way
2025-10-15 19:12:19 +00:00
Andrej Karpathy
190d9515d0
don't evaluate the sampling evals during SFT; they are too slow. keep the multiple-choice evals. delete unused imports
2025-10-15 16:42:23 +00:00
Andrej Karpathy
b8076dd367
fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add a --dry_run option, useful for experimentation
2025-10-15 16:35:04 +00:00
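A generic linear-warmdown multiplier illustrates the class of bug this commit fixes (the schedule shape and `warmdown_frac` parameter here are assumptions, not nanochat's exact schedule):

```python
def lr_multiplier(step: int, num_iterations: int, warmdown_frac: float = 0.2) -> float:
    """Ramp the LR multiplier DOWN from 1 to 0 over the final warmdown_frac
    of training; constant 1.0 before that.

    The bug class fixed in the commit: computing the progress fraction with
    the wrong sign (e.g. step-based instead of remaining-based) makes the
    multiplier ramp UP toward the end of training instead of down.
    """
    warmdown_start = int(num_iterations * (1 - warmdown_frac))
    if step < warmdown_start:
        return 1.0
    return (num_iterations - step) / (num_iterations - warmdown_start)
```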
Andrej
67aaca98f5
export NANOCHAT_BASE_DIR so child processes get it too
...
Export the cache directory so that users can use their own cache location
2025-10-14 16:01:28 -07:00
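The point of the `export` here is that plain shell variable assignments are not inherited by child processes, while exported ones are. The Python analogue, with a hypothetical path standing in for the real cache location:

```python
import os
import subprocess
import sys

# setting a variable in os.environ is the Python analogue of shell `export`:
# child processes spawned afterwards inherit the parent's environment
os.environ["NANOCHAT_BASE_DIR"] = "/tmp/nanochat_cache"  # hypothetical path

child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['NANOCHAT_BASE_DIR'])"],
    capture_output=True, text=True,
)
# the child sees the exported value: child.stdout.strip() == "/tmp/nanochat_cache"
```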
Zach Mueller
f0855cbcc7
Update speedrun.sh
2025-10-14 14:12:01 -04:00
Andrej
dd6ff9a1cc
fix bug in fallback case of find_largest_model
...
Fix: Handle missing d<number> model tags in find_largest_model
ty
2025-10-13 14:38:34 -07:00
Mirza-Samad-Ahmed-Baig
afaa5b4c90
Fix: Handle missing d<number> model tags in find_largest_model
2025-10-14 00:24:07 +03:00
Andrej
5fd0b13886
Merge pull request #2 from epoyraz/patch-1
...
Update README.md
2025-10-13 10:10:15 -07:00