Jason Kneen
5a3d8b6b5e
Update nanochat/gpt.py
...
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-22 02:37:32 +01:00
Jason Kneen
3e184d343e
Improve Mac/MPS compatibility and device handling
...
Added dev/runmac_overnight.sh for optimized Mac training. Updated device-specific logic throughout dataloader, GPT, Muon optimizer, and training scripts to avoid CUDA-only features on MPS/CPU (e.g., torch.compile, pin_memory, non_blocking, bfloat16). Relaxed torch version constraints in pyproject.toml and removed Linux/CUDA-specific PyTorch config for better macOS support.
2025-10-22 01:55:38 +01:00
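The device gating this commit describes (enabling torch.compile, pin_memory, non_blocking, and bfloat16 only on CUDA) can be sketched roughly as follows. This is an illustrative, torch-free sketch of the decision logic; the function name and flag names are hypothetical, not the repo's actual API:

```python
# Hypothetical sketch of CUDA-only feature gating for MPS/CPU compatibility.
# Each flag corresponds to a feature the commit disables off-CUDA.

def device_settings(device_type: str) -> dict:
    """Return feature flags for a given device type: 'cuda', 'mps', or 'cpu'."""
    is_cuda = device_type == "cuda"
    return {
        "use_compile": is_cuda,    # torch.compile is unreliable on MPS/CPU
        "pin_memory": is_cuda,     # pinned host memory only helps CUDA transfers
        "non_blocking": is_cuda,   # async host-to-device copies need pinned memory
        "dtype": "bfloat16" if is_cuda else "float32",
    }
```

The dataloader and training scripts would then branch on these flags instead of hard-coding CUDA behavior.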
Andrej Karpathy
50bea28ef9
also add readme mention of the cpu mps changes
2025-10-21 17:24:48 +00:00
Andrej Karpathy
5bdc99abfb
merge and resolve conflict
2025-10-21 17:19:10 +00:00
Andrej Karpathy
dfcb1c16f1
Merge branch 'master' into cpu-mps-dev
2025-10-21 17:15:53 +00:00
Andrej Karpathy
bb71c64579
fix silly issue in dataloader, this version is much faster and more portable to mps too
2025-10-21 17:12:50 +00:00
karpathy
bb786c5560
i shouldn't have committed the lock file, i missed that. revert to the flagship build, which is linux. sorry to pollute the repo history...
2025-10-21 10:07:40 -07:00
Andrej
c9ea7a91e2
Add customization instructions to README
...
Added a section on customization for nanochat.
2025-10-21 08:57:10 -07:00
Andrej Karpathy
03cddd9878
actually let's not brick code on git pull. change the error to a warning
2025-10-21 15:13:25 +00:00
Andrej Karpathy
fe5aed940b
add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully it's ok
2025-10-21 15:04:58 +00:00
karpathy
2e9669e03a
upgrading all other files to be able to use cpu/mps as well as cuda. various other minor changes, e.g. changing max_iterations to num_iterations in the sft script for consistency in naming
2025-10-20 10:15:17 -07:00
Andrej
a09ac812ed
toml changes for cpu only install
2025-10-20 07:53:15 -07:00
burtenshaw
0abb0fa2e3
add both sides of the source check
2025-10-20 10:44:07 +02:00
burtenshaw
c7ae920a77
add check for linux on cpu
2025-10-20 06:51:52 +02:00
Andrej
0f007889dd
Add MIT License as a file to the project
2025-10-19 17:22:19 -07:00
Andrej
5a879f4947
export NANOCHAT_BASE_DIR so child processes get it too
2025-10-19 17:07:56 -07:00
Andrej Karpathy
c1d2ed1c13
use orig_model in sampling, silly of me to miss this
2025-10-20 00:05:09 +00:00
Andrej Karpathy
2bc521a6de
use orig_model in sampling, silly of me to miss this
2025-10-20 00:04:15 +00:00
Andrej Karpathy
9467d83cf2
fix memory leak bug in rust tokenizer ty @mitsuhiko
2025-10-19 23:54:31 +00:00
Tancrède Lepoint
b1443dc98c
export NANOCHAT_BASE_DIR so child processes get it too
2025-10-19 14:05:40 -04:00
Andrej
cf2baf9933
fix typo
...
Co-authored-by: Tancrède Lepoint <tlepoint@users.noreply.github.com>
2025-10-17 08:35:41 -07:00
karpathy
e4f9b9c64d
revert to previous pyproject.toml
2025-10-17 08:08:16 -07:00
Andrej
e883b1d597
Merge pull request #99 from burtenshaw/cpu-mps-dev-ben
...
Add mps and cpu dependency management
2025-10-17 07:24:38 -07:00
burtenshaw
23b6351c1c
add groups and source selection
2025-10-17 12:20:18 +02:00
karpathy
ae02650afe
update the midtraining script too
2025-10-16 16:33:17 -07:00
karpathy
df600b6ed5
many small tweaks. base, eval, core work now i think
2025-10-16 15:46:18 -07:00
Andrej Karpathy
d6d86cbf4c
update readme with a link to the CPU|MPS branch
2025-10-16 22:03:39 +00:00
Andrej Karpathy
ccfe7915ac
mention the current d32 chat hosted on nanochat.karpathy.ai, as an example endpoint of the repo
2025-10-16 19:32:44 +00:00
karpathy
786119d593
add autodetect of device and related stuff. getting weird warnings/errors still, so wip
2025-10-16 10:26:19 -07:00
karpathy
279b74312c
adjust comment/guidance on device type
2025-10-16 10:06:39 -07:00
karpathy
306bc380ab
add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered what I think is a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights
2025-10-16 10:04:43 -07:00
Andrej Karpathy
722da4f543
trying to add basic cpu support, will try mps too
2025-10-16 16:14:38 +00:00
Andrej Karpathy
4346536ab2
also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate
2025-10-16 01:28:37 +00:00
Andrej Karpathy
2846999b8f
allow the user to click on their messages to edit them. conversation after that point is wiped
2025-10-16 01:16:22 +00:00
Andrej Karpathy
92d52ecc92
add slash commands to webui
2025-10-16 01:09:53 +00:00
Andrej Karpathy
fae3aca951
add script to train a 000 version of nanochat. currently it's a bit more like 00 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now
2025-10-15 20:32:22 +00:00
Andrej Karpathy
4c3590c499
fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints
2025-10-15 20:29:54 +00:00
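The buffering behavior this fix describes, accumulating bytes across token boundaries and emitting text only once a complete codepoint is available, can be sketched with the standard library's incremental UTF-8 decoder. A minimal sketch, not the repo's actual decoding code:

```python
import codecs

def stream_decode(byte_chunks):
    """Yield text as soon as complete UTF-8 codepoints are available,
    buffering partial multi-byte sequences across chunk boundaries."""
    dec = codecs.getincrementaldecoder("utf-8")()
    for chunk in byte_chunks:
        text = dec.decode(chunk)  # returns "" while a codepoint is incomplete
        if text:
            yield text
    tail = dec.decode(b"", final=True)  # flush any remaining buffered bytes
    if tail:
        yield tail
```

A naive per-chunk `bytes.decode("utf-8")` would raise on the first half of an emoji; the incremental decoder holds those bytes until the second half arrives.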
Andrej Karpathy
03fa673b7d
add basic logging to chat_web, which i think might be fun
2025-10-15 19:51:06 +00:00
Andrej Karpathy
52bfeea8bd
add very basic abuse prevention limits to chat_web so it's ok to host endpoints
2025-10-15 19:42:54 +00:00
Andrej Karpathy
01fb290f53
allow multiple GPUs to do inference in a data parallel way
2025-10-15 19:12:19 +00:00
Andrej Karpathy
190d9515d0
don't evaluate the sampling evals during SFT, they are too slow. keep the multiple choice evals. delete unused imports
2025-10-15 16:42:23 +00:00
Andrej Karpathy
b8076dd367
fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68 . also add --dry_run option useful for experimentation
2025-10-15 16:35:04 +00:00
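The bug this commit fixes, a final-phase multiplier that ramped up instead of down, can be illustrated with a generic piecewise-linear schedule. The function and its parameters are a hypothetical sketch, not nanochat's exact schedule:

```python
def lr_multiplier(step, num_iterations, warmup_frac=0.0, warmdown_frac=0.2):
    """Piecewise-linear LR multiplier: optional warmup, flat middle,
    then a linear ramp *down* to zero over the final warmdown fraction."""
    warmup_steps = int(warmup_frac * num_iterations)
    warmdown_steps = int(warmdown_frac * num_iterations)
    if warmup_steps and step < warmup_steps:
        return (step + 1) / warmup_steps
    if step >= num_iterations - warmdown_steps:
        # the fix: this must decrease toward 0 as step grows, not increase
        return (num_iterations - step) / warmdown_steps
    return 1.0
```

The buggy version effectively used `(step - start) / warmdown_steps` in the final phase, growing the learning rate just when it should decay (see Issue #68).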
Andrej
67aaca98f5
export NANOCHAT_BASE_DIR so child processes get it too
...
Export the cache directory so that users can use their own cache location
2025-10-14 16:01:28 -07:00
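The point of this change is that setting the variable via `os.environ` (an "export", in shell terms) makes it visible to subprocesses, which inherit the parent's environment. A minimal sketch of the mechanism; the path is illustrative:

```python
import os
import subprocess
import sys

# Writing to os.environ exports the variable to any child process
# spawned afterwards, unlike a plain module-level Python variable.
os.environ["NANOCHAT_BASE_DIR"] = "/tmp/nanochat"

out = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['NANOCHAT_BASE_DIR'])"],
    capture_output=True, text=True,
).stdout.strip()
```

This is why the training scripts that shell out to other scripts need the export: a bare assignment in one Python process would never reach the children.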
Zach Mueller
f0855cbcc7
Update speedrun.sh
2025-10-14 14:12:01 -04:00
Andrej
dd6ff9a1cc
fix bug in fallback case of find_largest_model
...
Fix: Handle missing d<number> model tags in find_largest_model
ty
2025-10-13 14:38:34 -07:00
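The fallback fix described here, tolerating checkpoint names that lack a d<number> depth tag instead of crashing on them, can be sketched as follows. Function name matches the commit; the body is a hypothetical reconstruction:

```python
import re

def find_largest_model(model_tags):
    """Pick the tag with the largest depth from names like 'd20', 'd32'.
    Tags that don't match the d<number> pattern are skipped rather than
    crashing the fallback path."""
    best_tag, best_depth = None, -1
    for tag in model_tags:
        m = re.fullmatch(r"d(\d+)", tag)
        if m is None:
            continue  # e.g. a stray directory without a depth tag
        depth = int(m.group(1))
        if depth > best_depth:
            best_tag, best_depth = tag, depth
    return best_tag
```

Returning `None` when no tag matches lets the caller report a clean "no models found" error instead of an unhandled exception.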
Mirza-Samad-Ahmed-Baig
afaa5b4c90
Fix: Handle missing d<number> model tags in find_largest_model
2025-10-14 00:24:07 +03:00
Andrej
5fd0b13886
Merge pull request #2 from epoyraz/patch-1
...
Update README.md
2025-10-13 10:10:15 -07:00
Enes Poyraz
6a795baf27
Update README.md
...
fix typos
2025-10-13 18:40:12 +02:00
Andrej
626bd3e260
Add image of the WebUI to readme
2025-10-13 08:03:00 -07:00
karpathy
da96b46565
update link to the new discussion
2025-10-13 07:42:09 -07:00