Commit Graph

41 Commits

Author SHA1 Message Date
Sermet Pekin
539f42bf89 check with 3.9...3.13 2025-10-20 21:35:34 +03:00
Sermet Pekin
c3d633665d check with 3.9...3.13 2025-10-20 21:33:03 +03:00
Sermet Pekin
428ccb9eb1 multi platform gf wf windows encoding problem 2025-10-20 21:03:03 +03:00
Sermet Pekin
e8b86df766 multi platform gf wf 2025-10-20 20:55:08 +03:00
Sermet Pekin
3debc92022 multi platform gf wf 2025-10-20 20:50:49 +03:00
Sermet Pekin
8d75e112b6 gh wf multi platform 2025-10-20 20:44:39 +03:00
Sermet Pekin
116b0e75fc mac for gh 2025-10-20 20:32:56 +03:00
Sermet Pekin
aac58b51dc for now 2025-10-20 19:39:00 +03:00
Sermet Pekin
a3f5986f19 feat: Add macOS compatibility fixes
- Change PyTorch dependency from CUDA to CPU version for macOS support
- Update Rust edition from 2024 to 2021 for stable Cargo compatibility
- Relax PyTorch version requirement from >=2.8.0 to >=2.0.0
- Update dependency lock file with compatible versions
2025-10-20 19:02:26 +03:00
Sermet Pekin
0768f67290 Add 'dev' branch to workflow triggers 2025-10-20 11:45:14 +03:00
Sermet Pekin
cdb5a455ee fallback to cpu on compute_init function 2025-10-20 11:43:47 +03:00
Sermet Pekin
11e46b6439 Add transformers dependency to pyproject.toml 2025-10-20 11:41:27 +03:00
Sermet Pekin
e238750824 Add GitHub Actions workflow for testing Python code
GH workflow that
- installs with uv
- tests with pytest
2025-10-20 11:40:26 +03:00
Andrej
0f007889dd Add MIT License as a file to the project 2025-10-19 17:22:19 -07:00
Andrej
5a879f4947 export NANOCHAT_BASE_DIR so child processes get it too 2025-10-19 17:07:56 -07:00
Andrej Karpathy
c1d2ed1c13 use orig_model in sampling, silly of me to miss this 2025-10-20 00:05:09 +00:00
Andrej Karpathy
2bc521a6de use orig_model in sampling, silly of me to miss this 2025-10-20 00:04:15 +00:00
Andrej Karpathy
9467d83cf2 fix memory leak bug in rust tokenizer ty @mitsuhiko 2025-10-19 23:54:31 +00:00
Tancrède Lepoint
b1443dc98c export NANOCHAT_BASE_DIR so child processes get it too 2025-10-19 14:05:40 -04:00
Andrej Karpathy
d6d86cbf4c update readme with a link to the CPU|MPS branch 2025-10-16 22:03:39 +00:00
Andrej Karpathy
ccfe7915ac mention the current d32 chat hosted on nanochat.karpathy.ai, as an example endpoint of the repo 2025-10-16 19:32:44 +00:00
Andrej Karpathy
4346536ab2 also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate 2025-10-16 01:28:37 +00:00
Andrej Karpathy
2846999b8f allow user to click on their message to edit them. conversation after that point is wiped 2025-10-16 01:16:22 +00:00
Andrej Karpathy
92d52ecc92 add slash commands to webui 2025-10-16 01:09:53 +00:00
Andrej Karpathy
fae3aca951 add script to train a $1000 version of nanochat. currently it's a bit more like $800 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now 2025-10-15 20:32:22 +00:00
Andrej Karpathy
4c3590c499 fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints 2025-10-15 20:29:54 +00:00
Andrej Karpathy
03fa673b7d add basic logging to chat_web, which i think might be fun 2025-10-15 19:51:06 +00:00
Andrej Karpathy
52bfeea8bd add very basic abuse prevention limits to chat_web so it's ok to host endpoints 2025-10-15 19:42:54 +00:00
Andrej Karpathy
01fb290f53 allow multiple GPUs to do inference in a data parallel way 2025-10-15 19:12:19 +00:00
Andrej Karpathy
190d9515d0 dont evaluate the sampling evals during SFT they are too slow. keep the multiple choice evals. delete unused imports 2025-10-15 16:42:23 +00:00
Andrej Karpathy
b8076dd367 fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation 2025-10-15 16:35:04 +00:00
Andrej
67aaca98f5 export NANOCHAT_BASE_DIR so child processes get it too
Export the cache directory so that users can use their own cache location
2025-10-14 16:01:28 -07:00
Zach Mueller
f0855cbcc7 Update speedrun.sh 2025-10-14 14:12:01 -04:00
Andrej
dd6ff9a1cc fix bug in fallback case of find_largest_model
Fix: Handle missing d<number> model tags in find_largest_model
ty
2025-10-13 14:38:34 -07:00
Mirza-Samad-Ahmed-Baig
afaa5b4c90 Fix: Handle missing d<number> model tags in find_largest_model 2025-10-14 00:24:07 +03:00
Andrej
5fd0b13886 Merge pull request #2 from epoyraz/patch-1
Update README.md
2025-10-13 10:10:15 -07:00
Enes Poyraz
6a795baf27 Update README.md
fix typos
2025-10-13 18:40:12 +02:00
Andrej
626bd3e260 Add image of the WebUI to readme 2025-10-13 08:03:00 -07:00
karpathy
da96b46565 update link to the new discussion 2025-10-13 07:42:09 -07:00
karpathy
a53833d04f add nanochat logo png 2025-10-13 06:59:59 -07:00
karpathy
3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00