Shizhe Diao | 2a6276bfcb | restore speedrun.sh | 2025-10-22 22:36:12 -07:00
Shizhe Diao | 29b94f35ec | track speedrun.sh | 2025-10-22 22:33:19 -07:00
Shizhe Diao | 1d34a19b87 | remove pretrain.sh and midtrain.sh | 2025-10-22 22:30:21 -07:00
Shizhe Diao | dd8310c3d4 | clean comments | 2025-10-22 22:22:28 -07:00
Shizhe Diao | f384c16ba5 | Update comments | 2025-10-22 22:19:20 -07:00
Shizhe Diao | 55fed15421 | remove redundant configs in base_eval.py | 2025-10-22 22:09:24 -07:00
Shizhe Diao | 3525e6d5b7 | remove redundancy | 2025-10-22 22:05:37 -07:00
Shizhe Diao | de3ef20e20 | rename files | 2025-10-22 22:00:26 -07:00
Shizhe Diao | 66a92fc293 | rename | 2025-10-22 21:57:31 -07:00
Shizhe Diao | 4b62a8b00c | add nemotron data processing script | 2025-10-22 21:57:06 -07:00
Shizhe Diao | fc534f5f41 | add script for nemotron recipe | 2025-10-22 21:57:06 -07:00
Shizhe Diao | cf3b8ca20e | fixed a bug in base_eval.py | 2025-10-22 21:57:02 -07:00
Shizhe Diao | b939be0372 | update script | 2025-10-22 21:56:32 -07:00
Shizhe Diao | f3f069519d | improve tokenizer and report in midtrain and sft | 2025-10-22 21:56:27 -07:00
Shizhe Diao | 169022fec0 | fixed a bug in base_eval | 2025-10-22 21:55:08 -07:00
Shizhe Diao | 78611b9983 | upload midtrain_sft_submit.sh | 2025-10-22 21:53:28 -07:00
Shizhe Diao | 370de99bbf | get report right | 2025-10-22 21:53:27 -07:00
Shizhe Diao | e872e798c4 | improve script | 2025-10-22 21:53:26 -07:00
Shizhe Diao | defcef6587 | edit report generation | 2025-10-22 21:53:25 -07:00
Shizhe Diao | fc23c1aa71 | use the same tokenizer | 2025-10-22 21:53:24 -07:00
Shizhe Diao | cee6a17d9e | support nemotron posttraining data | 2025-10-22 21:53:23 -07:00
Shizhe Diao | 7690b82d4b | support nemotron posttraining data in mid-train and sft | 2025-10-22 21:53:16 -07:00
Shizhe Diao | 646647c776 | support custom tokenizer by adding tokenizer_name | 2025-10-22 21:49:59 -07:00
Shizhe Diao | 2085e6637a | support custom training data, train tokenizer | 2025-10-22 21:44:33 -07:00
Shizhe Diao | 15e7a22a41 | support custom training data | 2025-10-22 21:42:49 -07:00
Shizhe Diao | 21d8b9994f | multinode slurm submit | 2025-10-22 21:20:23 -07:00
Shizhe Diao | be1e6c3592 | add exp_name as unique id | 2025-10-22 21:19:17 -07:00
Shizhe Diao | 0de778a75b | update wandb | 2025-10-22 21:19:16 -07:00
Andrej Karpathy | 5eeb2b6ef9 | experiment: looking to 'hire' a nanochat repo czar to help the repo, mentioning in readme | 2025-10-22 16:55:54 +00:00
Andrej Karpathy | 2dda5c4c8d | Merge branch 'ulanch-fix/ios-safari-input-overlap' | 2025-10-22 16:26:35 +00:00
Andrej Karpathy | 80b203ea59 | also bump run1000.sh to new uv sync | 2025-10-22 16:25:36 +00:00
Luke Stanley | 917c858136 | Updates lockfile with CPU package support without overwriting other architectures | 2025-10-22 16:25:36 +00:00
Luke Stanley | db1d5b595d | Git ignore eval_bundle | 2025-10-22 16:25:36 +00:00
Luke Stanley | dd9387b362 | Fix GPU-less CPU use on Linux with specific Torch indexes | 2025-10-22 16:25:36 +00:00
Luke Stanley | 32571664b1 | Fix Torch crash caused by pinning on CPU | 2025-10-22 16:25:36 +00:00
Andrej Karpathy | 51e70f0d3c | Merge branch 'lukestanley-fix-cpu-support-with-extras' | 2025-10-22 16:11:15 +00:00
Andrej Karpathy | 48387cd895 | also bump run1000.sh to new uv sync | 2025-10-22 16:08:31 +00:00
ulanch | 796f84527f | fix(ui): prevent iOS Safari toolbar from covering input on initial load | 2025-10-21 17:34:40 -07:00
Luke Stanley | 7a52f9bfbb | Updates lockfile with CPU package support without overwriting other architectures | 2025-10-21 23:14:34 +00:00
Luke Stanley | 760af62e11 | Git ignore eval_bundle | 2025-10-21 23:14:34 +00:00
Luke Stanley | 901b075605 | Fix GPU-less CPU use on Linux with specific Torch indexes | 2025-10-21 23:14:16 +00:00
Luke Stanley | defd1246aa | Fix Torch crash caused by pinning on CPU | 2025-10-21 20:28:10 +00:00
Andrej | 2e938530ce | delete spurious torch.empty allocation in adamw (fix: remove unnecessary tensor allocation in DistAdamW optimizer) | 2025-10-21 11:35:17 -07:00
Andrej Karpathy | a088b7a6ec | use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available | 2025-10-21 18:07:33 +00:00
Andrej Karpathy | 94ee507054 | quick fix base eval due to fewshot requirement | 2025-10-21 17:56:08 +00:00
Andrej | 33e8a27f91 | Merge karpathy/cpu-mps-dev , adding the ability to run on CPU, on MPS, or on CUDA, with autodetect. Gnarly PR, nonzero chance I broke something. (add cpu|mps support) | 2025-10-21 10:26:04 -07:00
Andrej Karpathy | 50bea28ef9 | also add readme mention of the cpu mps changes | 2025-10-21 17:24:48 +00:00
Andrej Karpathy | 5bdc99abfb | merge and resolve conflict | 2025-10-21 17:19:10 +00:00
Andrej Karpathy | dfcb1c16f1 | Merge branch 'master' into cpu-mps-dev | 2025-10-21 17:15:53 +00:00
Andrej Karpathy | bb71c64579 | fix silly issue in dataloader, this version is much faster and more portable to mps too | 2025-10-21 17:12:50 +00:00