Sofie Van Landeghem
|
c6f5bd67db
|
revert change of base to sft for quick inline test
|
2025-11-14 12:20:03 +01:00 |
|
howardgao@outlook.com
|
b399e43168
|
fix engine test bug
|
2025-11-06 08:56:45 +08:00 |
|
Andrej Karpathy
|
c6b7ab7440
|
grad clip logging and printing and cosmetics
|
2025-11-05 21:08:30 +00:00 |
|
Andrej
|
885a4f25e7
|
Replace fcntl with filelock for Windows compatibility
|
2025-11-04 16:35:39 -08:00 |
|
Andrej
|
3a2ae631c4
|
Merge branch 'master' into master
|
2025-11-04 16:35:02 -08:00 |
|
Andrej
|
12d995f58c
|
Add NPROC_PER_NODE var to speedrun.sh and run1000.sh
|
2025-11-04 16:26:33 -08:00 |
|
svlandeg
|
f1683c5b16
|
set nproc_per_node as var in speedrun and run1000 scripts
|
2025-11-04 21:36:10 +01:00 |
|
Andrej
|
d1558c7873
|
handle bf16 on MPS by casting to fp32 during load checkpoint
|
2025-11-04 09:42:50 -08:00 |
|
Andrej
|
df25293087
|
Add explicit UTF-8 encoding on open
|
2025-11-04 09:38:18 -08:00 |
|
Yasser Makram
|
1e89af9862
|
Replace fcntl with filelock for Windows compatibility
|
2025-11-04 07:22:34 +00:00 |
|
Dipesh Babu
|
7a40ee77b4
|
fix: cast bf16 to fp32 on MPS (like CPU) to avoid dtype issues
|
2025-11-03 16:00:56 -05:00 |
|
svlandeg
|
2ce62ec076
|
ensure consistency of quotes within each statement
|
2025-11-03 21:52:02 +01:00 |
|
svlandeg
|
e22fc6f2fa
|
few more explicit UTF-8 encodings
|
2025-11-03 21:46:39 +01:00 |
|
svlandeg
|
c72b8b2309
|
add explicit UTF-8 encoding
|
2025-11-03 21:27:12 +01:00 |
|
Andrej
|
a83646e098
|
fix(eval): use UTF-8 when reading CORE JSONL and writing CSV
|
2025-11-03 06:38:33 -08:00 |
|
Andrej
|
8681922328
|
fix lstrip bug, make it removeprefix, TIL.
|
2025-11-03 06:37:48 -08:00 |
|
Dipesh Babu
|
226953b841
|
fix: open JSONL and results CSV with UTF-8 encoding for portability
|
2025-11-03 01:20:56 -05:00 |
|
Josh Odom
|
f1e15f5f4d
|
Fixing subtle bug: lstrip removes all matching characters, including potentially required ones. Use removeprefix instead.
|
2025-11-02 23:40:37 -06:00 |
|
Andrej
|
b6da6982f6
|
fix nanochat logo: the t was placed too far to the right
|
2025-11-02 08:17:00 -08:00 |
|
Andrej
|
c2c4f77e22
|
oops small bugfix to run1000.sh missing kwarg
|
2025-11-02 08:14:41 -08:00 |
|
Andrej
|
d1ac0b2d07
|
when loading models on CPU, convert tensors from bfloat16 to float
|
2025-11-02 07:58:56 -08:00 |
|
svlandeg
|
5bfcd31b73
|
revert more formatting changes
|
2025-11-02 14:17:10 +01:00 |
|
svlandeg
|
036a3c5881
|
revert formatting changes to facilitate review
|
2025-11-02 14:16:43 +01:00 |
|
Jing Zhang
|
ba4f40bf58
|
Update run1000.sh to add missing --run=$WANDB_RUN
|
2025-11-01 21:27:00 -07:00 |
|
Manuel Saelices
|
d54c9cbf8c
|
CPU Support, as bfloat16 params breaks inference
|
2025-11-01 23:38:50 +01:00 |
|
Andrej Karpathy
|
cf587acb1a
|
move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts
|
2025-11-01 16:04:38 +00:00 |
|
Andrej Karpathy
|
7d2c4a3d95
|
delete pandas dep in base_eval use csv instead
|
2025-11-01 15:28:30 +00:00 |
|
Andrej
|
ad39db5a23
|
tiny fix to comment
Update engine.py with correct error message on assert
|
2025-11-01 07:43:57 -07:00 |
|
Andrej
|
630f54ae5a
|
use empty locals and globals in call to eval() in engine tool use
harden eval: prevent the calc tool from accessing globals and locals
|
2025-11-01 07:22:59 -07:00 |
|
Andrej Karpathy
|
f15732524a
|
make deepwiki link better
|
2025-11-01 14:13:29 +00:00 |
|
Andrej
|
dfc88334b6
|
fix tok/sec calculation bug when grad accum steps > 1
Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1
|
2025-10-30 08:36:32 -07:00 |
|
Andrej
|
eb11bb0e2e
|
remove numpy as dep
Remove explicit numpy dependency
|
2025-10-30 08:28:14 -07:00 |
|
Andrej
|
1ccbaf4416
|
nit delete redundant catch/raise in execute
Remove redundant exception handling in chdir
|
2025-10-29 08:10:03 -07:00 |
|
Andrej
|
29ff38d94b
|
Merge pull request #35 from bhaskar0210s/master
fix: return inf instead of crashing when evaluate_bpb has zero total_bytes
|
2025-10-29 08:06:24 -07:00 |
|
svlandeg
|
b996131570
|
Merge branch 'master' into logo/kerning-update
|
2025-10-29 11:45:40 +01:00 |
|
svlandeg
|
3fa974f93c
|
few more reverts
|
2025-10-29 11:45:02 +01:00 |
|
svlandeg
|
cbd560a83d
|
revert formatting changes to minimize diff and merge conflicts
|
2025-10-29 11:42:56 +01:00 |
|
Andrej
|
a1de1f46ad
|
Merge pull request #156 from tlepoint/fix/export-base-dir
Export the base dir variable in runcpu.sh
|
2025-10-28 15:19:08 -07:00 |
|
Andrej
|
ee00f523d0
|
fixing all the typos to make the pull requests stop
Batch of typo fixes
|
2025-10-28 13:36:07 -07:00 |
|
Ajeesh Sunil
|
5e0987a431
|
numpy isnt acting as a dependency for nanochat, so isnt it better to remove numpy from dependencies list
|
2025-10-28 20:05:38 +00:00 |
|
svlandeg
|
8c9b004c99
|
typo fixes in scripts
|
2025-10-28 20:17:31 +01:00 |
|
svlandeg
|
0a3ce7b0ff
|
typo fixes in readme
|
2025-10-28 20:11:00 +01:00 |
|
Andrej Karpathy
|
fdda5826e3
|
Merge branch 'haowei01-fix_kv_cache_due_to_resize'
|
2025-10-28 16:54:30 +00:00 |
|
Andrej Karpathy
|
baf0b3fdda
|
also add a test that failed before the fix and passes now with the fix for kv cache resize
|
2025-10-28 16:54:17 +00:00 |
|
Andrej Karpathy
|
f1db6b4712
|
delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm
|
2025-10-28 16:51:41 +00:00 |
|
Andrej Karpathy
|
9415931f85
|
delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm
|
2025-10-28 15:17:43 +00:00 |
|
Haowei Zhang
|
2b9c085559
|
update the kv_shape
|
2025-10-27 02:47:13 -07:00 |
|
Haowei Zhang
|
b062b422ac
|
Fix kv cache, given resize will destroys the logical structure
|
2025-10-27 02:23:08 -07:00 |
|
water-vapor
|
a9de4b1038
|
Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1
|
2025-10-26 01:43:49 -05:00 |
|
Andrej Karpathy
|
c75fe54aa7
|
readme tweak, link to new discussion and add file structure
|
2025-10-25 19:39:16 +00:00 |
|
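Commits f1e15f5f4d and 8681922328 above replace `lstrip` with `removeprefix`. The sketch below illustrates why, assuming a hypothetical key name and prefix chosen for illustration and not taken from the repository: `str.lstrip` treats its argument as a set of characters to strip, whereas `str.removeprefix` (Python 3.9+) removes one exact leading substring.

```python
def strip_prefix_buggy(name: str, prefix: str) -> str:
    # str.lstrip interprets `prefix` as a set of characters and keeps
    # stripping from the left while the next character is in that set.
    return name.lstrip(prefix)


def strip_prefix_fixed(name: str, prefix: str) -> str:
    # str.removeprefix removes the exact prefix once, if it is present,
    # and otherwise returns the string unchanged.
    return name.removeprefix(prefix)


# Hypothetical example values (assumptions, not from the repository).
key = "_orig_mod.model.weight"
prefix = "_orig_mod."

print(strip_prefix_buggy(key, prefix))  # el.weight
print(strip_prefix_fixed(key, prefix))  # model.weight
```

In the buggy version the leading `m`, `o`, and `d` of `model` are also stripped, because each of them appears in the character set derived from the prefix.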