Shizhe Diao
|
f3f069519d
|
improve tokenizer and report in midtrain and sft
|
2025-10-22 21:56:27 -07:00 |
|
Shizhe Diao
|
7690b82d4b
|
support nemotron posttraining data in mid-train and sft
|
2025-10-22 21:53:16 -07:00 |
|
Luke Stanley
|
defd1246aa
|
Fix Torch crash caused by pinning on CPU
|
2025-10-21 20:28:10 +00:00 |
|
Andrej Karpathy
|
5bdc99abfb
|
merge and resolve conflict
|
2025-10-21 17:19:10 +00:00 |
|
Andrej Karpathy
|
fe5aed940b
|
add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully its ok
|
2025-10-21 15:04:58 +00:00 |
|
karpathy
|
2e9669e03a
|
upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes ,e.g. changing max_iterations to num_iterations in sft script for consistency in naming
|
2025-10-20 10:15:17 -07:00 |
|
karpathy
|
ae02650afe
|
update the midtraining script too
|
2025-10-16 16:33:17 -07:00 |
|
Andrej Karpathy
|
b8076dd367
|
fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation
|
2025-10-15 16:35:04 +00:00 |
|
karpathy
|
3a5e0bc50b
|
initial commit
|
2025-10-13 06:49:24 -07:00 |
|