Andrej
d1558c7873
handle bf16 on MPS by casting to fp32 during load checkpoint
2025-11-04 09:42:50 -08:00
Andrej
df25293087
Add explicit UTF-8 encoding on open
2025-11-04 09:38:18 -08:00
Dipesh Babu
7a40ee77b4
fix: cast bf16 to fp32 on MPS (like CPU) to avoid dtype issues
2025-11-03 16:00:56 -05:00
svlandeg
2ce62ec076
ensure consistency of quotes within each statement
2025-11-03 21:52:02 +01:00
svlandeg
e22fc6f2fa
few more explicit UTF-8 encodings
2025-11-03 21:46:39 +01:00
svlandeg
c72b8b2309
add explicit UTF-8 encoding
2025-11-03 21:27:12 +01:00
Andrej
a83646e098
fix(eval): use UTF-8 when reading CORE JSONL and writing CSV
2025-11-03 06:38:33 -08:00
Andrej
8681922328
fix lstrip bug, make it removeprefix, TIL.
2025-11-03 06:37:48 -08:00
Dipesh Babu
226953b841
fix: open JSONL and results CSV with UTF-8 encoding for portability
2025-11-03 01:20:56 -05:00
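A minimal sketch of the portability point behind the UTF-8 commits: without an explicit `encoding`, `open()` falls back to a platform-dependent default, so non-ASCII text can fail to round-trip on some systems. The file names and row contents below are hypothetical, not taken from the repository.

```python
import csv
import json
import os
import tempfile

# Hypothetical record containing a non-ASCII character.
rows = [{"task": "café", "score": 0.5}]

with tempfile.TemporaryDirectory() as tmp:
    jsonl_path = os.path.join(tmp, "data.jsonl")
    csv_path = os.path.join(tmp, "results.csv")

    # Write JSONL with an explicit encoding instead of the platform default.
    with open(jsonl_path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

    # Read it back with the same explicit encoding.
    with open(jsonl_path, "r", encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f]

    # Write the results CSV, again pinning the encoding.
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["task", "score"])
        writer.writeheader()
        writer.writerows(loaded)

    with open(csv_path, "r", encoding="utf-8", newline="") as f:
        csv_rows = list(csv.DictReader(f))
```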
Josh Odom
f1e15f5f4d
Fixing subtle bug: lstrip removes all matching characters, including potentially required ones. Use removeprefix instead.
2025-11-02 23:40:37 -06:00
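The difference described in this commit can be illustrated with a short sketch. The path string is a hypothetical example, not the value from the repository; `str.removeprefix` requires Python 3.9 or newer.

```python
# Hypothetical path whose remainder starts with characters from the prefix.
path = "data/dataset.txt"

# lstrip treats its argument as a SET of characters and keeps stripping
# while the leading character is any of 'd', 'a', 't', '/'.
stripped = path.lstrip("data/")

# removeprefix removes the exact leading substring, at most once.
removed = path.removeprefix("data/")

print(stripped)  # set.txt
print(removed)   # dataset.txt
```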
Andrej
b6da6982f6
fix nanochat logo: the t was placed too far to the right
2025-11-02 08:17:00 -08:00
Andrej
c2c4f77e22
oops small bugfix to run1000.sh missing kwarg
2025-11-02 08:14:41 -08:00
Andrej
d1ac0b2d07
when loading models on CPU, convert tensors from bfloat16 to float
2025-11-02 07:58:56 -08:00
svlandeg
5bfcd31b73
revert more formatting changes
2025-11-02 14:17:10 +01:00
svlandeg
036a3c5881
revert formatting changes to facilitate review
2025-11-02 14:16:43 +01:00
Jing Zhang
ba4f40bf58
Update run1000.sh to add missing --run=$WANDB_RUN
2025-11-01 21:27:00 -07:00
Manuel Saelices
d54c9cbf8c
CPU Support, as bfloat16 params break inference
2025-11-01 23:38:50 +01:00
Andrej Karpathy
cf587acb1a
move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts
2025-11-01 16:04:38 +00:00
Andrej Karpathy
7d2c4a3d95
delete pandas dep in base_eval use csv instead
2025-11-01 15:28:30 +00:00
Andrej
ad39db5a23
tiny fix to comment
...
Update engine.py with correct error message on assert
2025-11-01 07:43:57 -07:00
Andrej
630f54ae5a
use empty locals and globals in call to eval() in engine tool use
...
harden eval: prevent the calc tool from accessing globals and locals
2025-11-01 07:22:59 -07:00
Andrej Karpathy
f15732524a
make deepwiki link better
2025-11-01 14:13:29 +00:00
Andrej
dfc88334b6
fix tok/sec calculation bug when grad accum steps > 1
...
Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1
2025-10-30 08:36:32 -07:00
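A rough sketch of the arithmetic behind this fix, assuming one optimizer step processes `grad_accum_steps` micro-batches on each rank. The function and parameter names are hypothetical and not taken from `base_train` or `mid_train`.

```python
def tokens_per_second(device_batch_size, seq_len, world_size,
                      grad_accum_steps, step_seconds):
    """Throughput for one optimizer step (hypothetical helper).

    With gradient accumulation, a single step runs several
    forward/backward passes, so every micro-batch must be counted.
    """
    tokens_per_micro_step = device_batch_size * seq_len * world_size
    tokens_per_step = tokens_per_micro_step * grad_accum_steps
    return tokens_per_step / step_seconds


# Counting only one micro-batch would under-report by grad_accum_steps.
rate = tokens_per_second(32, 2048, 8, grad_accum_steps=4, step_seconds=2.0)
```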
Andrej
eb11bb0e2e
remove numpy as dep
...
Remove explicit numpy dependency
2025-10-30 08:28:14 -07:00
Andrej
1ccbaf4416
nit delete redundant catch/raise in execute
...
Remove redundant exception handling in chdir
2025-10-29 08:10:03 -07:00
Andrej
29ff38d94b
Merge pull request #35 from bhaskar0210s/master
...
fix: return inf instead of crashing when evaluate_bpb has zero total_bytes
2025-10-29 08:06:24 -07:00
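A minimal sketch of the guard this commit describes, assuming bits per byte is computed as the summed loss in nats divided by ln(2) times the byte count. The helper name and signature are hypothetical; only the "return inf when total_bytes is zero" behavior comes from the commit message.

```python
import math


def bits_per_byte(total_nats, total_bytes):
    """Hypothetical helper: convert a summed loss in nats to bits per byte."""
    if total_bytes == 0:
        # Nothing was counted; return inf rather than dividing by zero.
        return float("inf")
    return total_nats / (math.log(2) * total_bytes)
```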
svlandeg
b996131570
Merge branch 'master' into logo/kerning-update
2025-10-29 11:45:40 +01:00
svlandeg
3fa974f93c
few more reverts
2025-10-29 11:45:02 +01:00
svlandeg
cbd560a83d
revert formatting changes to minimize diff and merge conflicts
2025-10-29 11:42:56 +01:00
Andrej
a1de1f46ad
Merge pull request #156 from tlepoint/fix/export-base-dir
...
Export the base dir variable in runcpu.sh
2025-10-28 15:19:08 -07:00
Andrej
ee00f523d0
fixing all the typos to make the pull requests stop
...
Batch of typo fixes
2025-10-28 13:36:07 -07:00
Ajeesh Sunil
5e0987a431
numpy isn't acting as a dependency for nanochat, so isn't it better to remove numpy from the dependencies list
2025-10-28 20:05:38 +00:00
svlandeg
8c9b004c99
typo fixes in scripts
2025-10-28 20:17:31 +01:00
svlandeg
0a3ce7b0ff
typo fixes in readme
2025-10-28 20:11:00 +01:00
Andrej Karpathy
fdda5826e3
Merge branch 'haowei01-fix_kv_cache_due_to_resize'
2025-10-28 16:54:30 +00:00
Andrej Karpathy
baf0b3fdda
also add a test that failed before the fix and passes now with the fix for kv cache resize
2025-10-28 16:54:17 +00:00
Andrej Karpathy
f1db6b4712
delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm
2025-10-28 16:51:41 +00:00
Andrej Karpathy
9415931f85
delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm
2025-10-28 15:17:43 +00:00
Haowei Zhang
2b9c085559
update the kv_shape
2025-10-27 02:47:13 -07:00
Haowei Zhang
b062b422ac
Fix kv cache, given that resize destroys the logical structure
2025-10-27 02:23:08 -07:00
water-vapor
a9de4b1038
Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1
2025-10-26 01:43:49 -05:00
Andrej Karpathy
c75fe54aa7
readme tweak, link to new discussion and add file structure
2025-10-25 19:39:16 +00:00
Marius Wachtler
fca2b8cd07
harden eval: prevent the calc tool from accessing globals and locals
...
By passing empty globals and locals dicts to eval() we can prevent simple
malicious cases where the user gets the model to output something like
```<global variable/func> or "a".count("a")```,
e.g.
```signal.raise_signal(9) or "a".count("a")```, which would kill the process,
or one could maybe get it to output secrets etc.
I think to make it 100% secure one would need to parse the AST and only execute secure nodes, but this should make it much more robust.
2025-10-24 14:41:12 -05:00
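A small sketch of the idea in this commit, not the code from `engine.py`. The `calc` helper and the `SECRET` global are hypothetical. Note one assumption beyond the commit message: if the globals dict lacks a `__builtins__` key, Python inserts the real builtins automatically, so this sketch sets it to an empty dict as well. As the commit body says, this is hardening and not a complete sandbox.

```python
SECRET = "api-key"  # hypothetical module-level global we do not want exposed


def calc(expr):
    """Hypothetical calculator tool: evaluate expr with no visible names."""
    # Empty locals, and globals that expose neither module names nor builtins.
    return eval(expr, {"__builtins__": {}}, {})


# Methods on literals still work, e.g. counting letters.
result = calc('"strawberry".count("r")')

# Module-level names are no longer reachable from the expression.
try:
    calc("SECRET")
    leaked = True
except NameError:
    leaked = False
```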
Andrej Karpathy
05a051dbe9
fix tokenization bug, there should be no space before first letter. sigh
2025-10-24 15:06:06 +00:00
Andrej Karpathy
8892470f29
add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think
2025-10-24 14:02:48 +00:00
Andrej Karpathy
81597cd616
move the lr schedule args up in base_train so they are tunable in configurator
2025-10-24 13:27:31 +00:00
Andrej Karpathy
cc3636b01c
allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough
2025-10-24 13:27:05 +00:00
Tancrède Lepoint
d5cda11ab8
Export the base dir variable
2025-10-22 18:15:02 -04:00
Andrej Karpathy
5eeb2b6ef9
experiment: looking to 'hire' a nanochat repo czar to help the repo, mentioning in readme
2025-10-22 16:55:54 +00:00
Andrej Karpathy
2dda5c4c8d
Merge branch 'ulanch-fix/ios-safari-input-overlap'
2025-10-22 16:26:35 +00:00