Andrej
|
eb11bb0e2e
|
remove numpy as dep
Remove explicit numpy dependency
|
2025-10-30 08:28:14 -07:00 |
|
Andrej
|
1ccbaf4416
|
nit delete redundant catch/raise in execute
Remove redundant exception handling in chdir
|
2025-10-29 08:10:03 -07:00 |
|
Andrej
|
29ff38d94b
|
Merge pull request #35 from bhaskar0210s/master
fix: return inf instead of crashing when evaluate_bpb has zero total_bytes
|
2025-10-29 08:06:24 -07:00 |
|
Andrej
|
a1de1f46ad
|
Merge pull request #156 from tlepoint/fix/export-base-dir
Export the base dir variable in runcpu.sh
|
2025-10-28 15:19:08 -07:00 |
|
Andrej
|
ee00f523d0
|
fixing all the typos to make the pull requests stop
Batch of typo fixes
|
2025-10-28 13:36:07 -07:00 |
|
Ajeesh Sunil
|
5e0987a431
|
numpy isnt acting as a dependency for nanochat, so isnt it better to remove numpy from dependencies list
|
2025-10-28 20:05:38 +00:00 |
|
svlandeg
|
8c9b004c99
|
typo fixes in scripts
|
2025-10-28 20:17:31 +01:00 |
|
svlandeg
|
0a3ce7b0ff
|
typo fixes in readme
|
2025-10-28 20:11:00 +01:00 |
|
Andrej Karpathy
|
fdda5826e3
|
Merge branch 'haowei01-fix_kv_cache_due_to_resize'
|
2025-10-28 16:54:30 +00:00 |
|
Andrej Karpathy
|
baf0b3fdda
|
also add a test that failed before the fix and passes now with the fix for kv cache resize
|
2025-10-28 16:54:17 +00:00 |
|
Andrej Karpathy
|
f1db6b4712
|
delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm
|
2025-10-28 16:51:41 +00:00 |
|
Andrej Karpathy
|
9415931f85
|
delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm
|
2025-10-28 15:17:43 +00:00 |
|
Haowei Zhang
|
2b9c085559
|
update the kv_shape
|
2025-10-27 02:47:13 -07:00 |
|
Haowei Zhang
|
b062b422ac
|
Fix kv cache, given resize will destroy the logical structure
|
2025-10-27 02:23:08 -07:00 |
|
Andrej Karpathy
|
c75fe54aa7
|
readme tweak, link to new discussion and add file structure
|
2025-10-25 19:39:16 +00:00 |
|
Andrej Karpathy
|
05a051dbe9
|
fix tokenization bug, there should be no space before first letter. sigh
|
2025-10-24 15:06:06 +00:00 |
|
Andrej Karpathy
|
8892470f29
|
add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think
|
2025-10-24 14:02:48 +00:00 |
|
Andrej Karpathy
|
81597cd616
|
move the lr schedule args up in base_train so they are tunable in configurator
|
2025-10-24 13:27:31 +00:00 |
|
Andrej Karpathy
|
cc3636b01c
|
allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough
|
2025-10-24 13:27:05 +00:00 |
|
Tancrède Lepoint
|
d5cda11ab8
|
Export the base dir variable
|
2025-10-22 18:15:02 -04:00 |
|
Andrej Karpathy
|
5eeb2b6ef9
|
experiment: looking to 'hire' a nanochat repo czar to help the repo, mentioning in readme
|
2025-10-22 16:55:54 +00:00 |
|
Andrej Karpathy
|
2dda5c4c8d
|
Merge branch 'ulanch-fix/ios-safari-input-overlap'
|
2025-10-22 16:26:35 +00:00 |
|
Andrej Karpathy
|
80b203ea59
|
also bump run1000.sh to new uv sync
|
2025-10-22 16:25:36 +00:00 |
|
Luke Stanley
|
917c858136
|
Updates lockfile with CPU package support without overwriting other architectures
|
2025-10-22 16:25:36 +00:00 |
|
Luke Stanley
|
db1d5b595d
|
Git ignore eval_bundle
|
2025-10-22 16:25:36 +00:00 |
|
Luke Stanley
|
dd9387b362
|
Fix GPU-less CPU use on Linux with specific Torch indexes
|
2025-10-22 16:25:36 +00:00 |
|
Luke Stanley
|
32571664b1
|
Fix Torch crash caused by pinning on CPU
|
2025-10-22 16:25:36 +00:00 |
|
Andrej Karpathy
|
51e70f0d3c
|
Merge branch 'lukestanley-fix-cpu-support-with-extras'
|
2025-10-22 16:11:15 +00:00 |
|
Andrej Karpathy
|
48387cd895
|
also bump run1000.sh to new uv sync
|
2025-10-22 16:08:31 +00:00 |
|
ulanch
|
796f84527f
|
fix(ui): prevent iOS Safari toolbar from covering input on initial load
|
2025-10-21 17:34:40 -07:00 |
|
Luke Stanley
|
7a52f9bfbb
|
Updates lockfile with CPU package support without overwriting other architectures
|
2025-10-21 23:14:34 +00:00 |
|
Luke Stanley
|
760af62e11
|
Git ignore eval_bundle
|
2025-10-21 23:14:34 +00:00 |
|
Luke Stanley
|
901b075605
|
Fix GPU-less CPU use on Linux with specific Torch indexes
|
2025-10-21 23:14:16 +00:00 |
|
Luke Stanley
|
defd1246aa
|
Fix Torch crash caused by pinning on CPU
|
2025-10-21 20:28:10 +00:00 |
|
Andrej
|
2e938530ce
|
delete spurious torch.empty allocation in adamw
fix: remove unnecessary tensor allocation in DistAdamW optimizer
|
2025-10-21 11:35:17 -07:00 |
|
Andrej Karpathy
|
a088b7a6ec
|
use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available
|
2025-10-21 18:07:33 +00:00 |
|
Andrej Karpathy
|
94ee507054
|
quick fix base eval due to fewshot requirement
|
2025-10-21 17:56:08 +00:00 |
|
Andrej
|
33e8a27f91
|
Merge karpathy/cpu-mps-dev, adding the ability to run on CPU, on MPS, or on CUDA, with autodetect. Gnarly PR, nonzero chance I broke something.
add cpu|mps support
|
2025-10-21 10:26:04 -07:00 |
|
Andrej Karpathy
|
50bea28ef9
|
also add readme mention of the cpu mps changes
|
2025-10-21 17:24:48 +00:00 |
|
Andrej Karpathy
|
5bdc99abfb
|
merge and resolve conflict
|
2025-10-21 17:19:10 +00:00 |
|
Andrej Karpathy
|
dfcb1c16f1
|
Merge branch 'master' into cpu-mps-dev
|
2025-10-21 17:15:53 +00:00 |
|
Andrej Karpathy
|
bb71c64579
|
fix silly issue in dataloader, this version is much faster and more portable to mps too
|
2025-10-21 17:12:50 +00:00 |
|
karpathy
|
bb786c5560
|
i shouldnt have committed the lock file, i missed that. revert to the flagship build which is linux. sorry to pollute the repo history...
|
2025-10-21 10:07:40 -07:00 |
|
Andrej
|
c9ea7a91e2
|
Add customization instructions to README
Added a section on customization for nanochat.
|
2025-10-21 08:57:10 -07:00 |
|
Andrej Karpathy
|
03cddd9878
|
actually let's not brick code on git pull. change error to warning
|
2025-10-21 15:13:25 +00:00 |
|
Andrej Karpathy
|
fe5aed940b
|
add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully its ok
|
2025-10-21 15:04:58 +00:00 |
|
karpathy
|
2e9669e03a
|
upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming
|
2025-10-20 10:15:17 -07:00 |
|
Andrej
|
a09ac812ed
|
toml changes for cpu only install
|
2025-10-20 07:53:15 -07:00 |
|
Sermet Pekin
|
49cd02f283
|
fix: remove unnecessary tensor allocation in DistAdamW optimizer
|
2025-10-20 12:03:26 +03:00 |
|
burtenshaw
|
0abb0fa2e3
|
add both sides of the source check
|
2025-10-20 10:44:07 +02:00 |
|