Marius Wachtler
fca2b8cd07
harden eval: prevent the calc tool from accessing globals and locals
...
Passing empty globals and locals to eval() prevents simple
malicious cases where the user gets the model to output something like
```<global variable/func> or "a".count("a")```
e.g.
```signal.raise_signal(9) or "a".count("a")```, which would kill the process.
One could also possibly get it to output secrets, etc.
I think making it 100% secure would require parsing the AST and executing only safe nodes, but this should make it much more robust.
2025-10-24 14:41:12 -05:00
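The hardening described in commit fca2b8cd07 above can be sketched roughly as follows. This is a minimal illustration and not the repo's actual calculator code: the helper names `restricted_eval` and `try_eval` are hypothetical, and mapping `__builtins__` to an empty dict is an assumption made here, since Python re-inserts the real builtins into a globals dict that lacks that key.

```python
import signal  # a module-level global the evaluated expression must not reach


def restricted_eval(expr: str):
    """Evaluate expr without access to this module's globals, locals, or builtins.

    Hypothetical helper, for illustration only.
    """
    # Setting "__builtins__" explicitly stops Python from adding the real
    # builtins module to the globals dict passed to eval().
    return eval(expr, {"__builtins__": {}}, {})


def try_eval(expr: str):
    """Return the evaluation result, or None if the expression fails."""
    try:
        return restricted_eval(expr)
    except Exception:
        return None


# A benign calculator-style expression still works: methods on literals
# do not need any global names.
benign = try_eval('"strawberry".count("r")')

# The example from the commit message fails with NameError, because
# `signal` is not visible inside the restricted namespaces.
malicious = try_eval('signal.raise_signal(9) or "a".count("a")')

print(benign, malicious)
```

As the commit message notes, this is not a complete sandbox: attribute access on literals is still possible, so a stricter version would likely need to walk the AST and allow only known-safe nodes.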
Andrej Karpathy
05a051dbe9
fix tokenization bug, there should be no space before first letter. sigh
2025-10-24 15:06:06 +00:00
Andrej Karpathy
8892470f29
add the SpellingBee task so that nanochat can count r in strawberry etc. Along the way we had to add a bunch of new functionality, e.g. extend the calculator to support Python's count function. Possibly the current TaskMixture uses way too many synthetic examples of SpellingBee, because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here, I think
2025-10-24 14:02:48 +00:00
Andrej Karpathy
81597cd616
move the lr schedule args up in base_train so they are tunable in configurator
2025-10-24 13:27:31 +00:00
Andrej Karpathy
cc3636b01c
allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough
2025-10-24 13:27:05 +00:00
Andrej Karpathy
5eeb2b6ef9
experiment: looking to 'hire' a nanochat repo czar to help the repo, mentioning in readme
2025-10-22 16:55:54 +00:00
Andrej Karpathy
2dda5c4c8d
Merge branch 'ulanch-fix/ios-safari-input-overlap'
2025-10-22 16:26:35 +00:00
Andrej Karpathy
80b203ea59
also bump run1000.sh to new uv sync
2025-10-22 16:25:36 +00:00
Luke Stanley
917c858136
Updates lockfile with CPU package support without overwriting other architectures
2025-10-22 16:25:36 +00:00
Luke Stanley
db1d5b595d
Git ignore eval_bundle
2025-10-22 16:25:36 +00:00
Luke Stanley
dd9387b362
Fix GPU-less CPU use on Linux with specific Torch indexes
2025-10-22 16:25:36 +00:00
Luke Stanley
32571664b1
Fix Torch crash caused by pinning on CPU
2025-10-22 16:25:36 +00:00
Andrej Karpathy
51e70f0d3c
Merge branch 'lukestanley-fix-cpu-support-with-extras'
2025-10-22 16:11:15 +00:00
Andrej Karpathy
48387cd895
also bump run1000.sh to new uv sync
2025-10-22 16:08:31 +00:00
ulanch
796f84527f
fix(ui): prevent iOS Safari toolbar from covering input on initial load
2025-10-21 17:34:40 -07:00
Luke Stanley
7a52f9bfbb
Updates lockfile with CPU package support without overwriting other architectures
2025-10-21 23:14:34 +00:00
Luke Stanley
760af62e11
Git ignore eval_bundle
2025-10-21 23:14:34 +00:00
Luke Stanley
901b075605
Fix GPU-less CPU use on Linux with specific Torch indexes
2025-10-21 23:14:16 +00:00
Luke Stanley
defd1246aa
Fix Torch crash caused by pinning on CPU
2025-10-21 20:28:10 +00:00
Andrej
2e938530ce
delete spurious torch.empty allocation in adamw
...
fix: remove unnecessary tensor allocation in DistAdamW optimizer
2025-10-21 11:35:17 -07:00
Andrej Karpathy
a088b7a6ec
use enable_gqa of pytorch sdpa, which allows us to delete some code, didn't realize it's available
2025-10-21 18:07:33 +00:00
Andrej Karpathy
94ee507054
quick fix base eval due to fewshot requirement
2025-10-21 17:56:08 +00:00
Andrej
33e8a27f91
Merge karpathy/cpu-mps-dev, adding the ability to run on CPU, on MPS, or on CUDA, with autodetect. Gnarly PR, nonzero chance I broke something.
...
add cpu|mps support
2025-10-21 10:26:04 -07:00
Andrej Karpathy
50bea28ef9
also add readme mention of the cpu mps changes
2025-10-21 17:24:48 +00:00
Andrej Karpathy
5bdc99abfb
merge and resolve conflict
2025-10-21 17:19:10 +00:00
Andrej Karpathy
dfcb1c16f1
Merge branch 'master' into cpu-mps-dev
2025-10-21 17:15:53 +00:00
Andrej Karpathy
bb71c64579
fix silly issue in dataloader, this version is much faster and more portable to mps too
2025-10-21 17:12:50 +00:00
karpathy
bb786c5560
I shouldn't have committed the lock file, I missed that. revert to the flagship build, which is linux. sorry to pollute the repo history...
2025-10-21 10:07:40 -07:00
Andrej
c9ea7a91e2
Add customization instructions to README
...
Added a section on customization for nanochat.
2025-10-21 08:57:10 -07:00
Andrej Karpathy
03cddd9878
actually let's not brick code on git pull. change error to warning
2025-10-21 15:13:25 +00:00
Andrej Karpathy
fe5aed940b
add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully it's ok
2025-10-21 15:04:58 +00:00
karpathy
2e9669e03a
upgrading all other files to be able to use cpu/mps as well as cuda. various other minor changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming
2025-10-20 10:15:17 -07:00
Andrej
a09ac812ed
toml changes for cpu only install
2025-10-20 07:53:15 -07:00
Sermet Pekin
49cd02f283
fix: remove unnecessary tensor allocation in DistAdamW optimizer
...
fix: remove unnecessary tensor allocation in DistAdamW optimizer
2025-10-20 12:03:26 +03:00
burtenshaw
0abb0fa2e3
add both sides of the source check
2025-10-20 10:44:07 +02:00
burtenshaw
c7ae920a77
add check for linux on cpu
2025-10-20 06:51:52 +02:00
Andrej
0f007889dd
Add MIT License as a file to the project
2025-10-19 17:22:19 -07:00
Andrej
5a879f4947
export NANOCHAT_BASE_DIR so child processes get it too
2025-10-19 17:07:56 -07:00
Andrej Karpathy
c1d2ed1c13
use orig_model in sampling, silly of me to miss this
2025-10-20 00:05:09 +00:00
Andrej Karpathy
2bc521a6de
use orig_model in sampling, silly of me to miss this
2025-10-20 00:04:15 +00:00
Andrej Karpathy
9467d83cf2
fix memory leak bug in rust tokenizer ty @mitsuhiko
2025-10-19 23:54:31 +00:00
Tancrède Lepoint
b1443dc98c
export NANOCHAT_BASE_DIR so child processes get it too
2025-10-19 14:05:40 -04:00
Andrej
cf2baf9933
fix typo
...
Co-authored-by: Tancrède Lepoint <tlepoint@users.noreply.github.com>
2025-10-17 08:35:41 -07:00
karpathy
e4f9b9c64d
revert to previous pyproject.toml
2025-10-17 08:08:16 -07:00
Andrej
e883b1d597
Merge pull request #99 from burtenshaw/cpu-mps-dev-ben
...
Add mps and cpu dependency management
2025-10-17 07:24:38 -07:00
burtenshaw
23b6351c1c
add groups and source selection
2025-10-17 12:20:18 +02:00
karpathy
ae02650afe
update the midtraining script too
2025-10-16 16:33:17 -07:00
karpathy
df600b6ed5
many small tweaks. base, eval, core work now I think
2025-10-16 15:46:18 -07:00
Andrej Karpathy
d6d86cbf4c
update readme with a link to the CPU|MPS branch
2025-10-16 22:03:39 +00:00
Andrej Karpathy
ccfe7915ac
mention the current d32 chat hosted on nanochat.karpathy.ai, as an example endpoint of the repo
2025-10-16 19:32:44 +00:00