svlandeg
|
8c9b004c99
|
typo fixes in scripts
|
2025-10-28 20:17:31 +01:00 |
|
Andrej Karpathy
|
8892470f29
|
add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think
|
2025-10-24 14:02:48 +00:00 |
|
Andrej Karpathy
|
81597cd616
|
move the lr schedule args up in base_train so they are tunable in configurator
|
2025-10-24 13:27:31 +00:00 |
|
Luke Stanley
|
defd1246aa
|
Fix Torch crash caused by pinning on CPU
|
2025-10-21 20:28:10 +00:00 |
|
Andrej Karpathy
|
a088b7a6ec
|
use enable_gqa of pytorch sdpa, allows us to delete some code, didn't realize it's available
|
2025-10-21 18:07:33 +00:00 |
|
Andrej Karpathy
|
5bdc99abfb
|
merge and resolve conflict
|
2025-10-21 17:19:10 +00:00 |
|
Andrej Karpathy
|
dfcb1c16f1
|
Merge branch 'master' into cpu-mps-dev
|
2025-10-21 17:15:53 +00:00 |
|
Andrej Karpathy
|
fe5aed940b
|
add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully it's ok
|
2025-10-21 15:04:58 +00:00 |
|
karpathy
|
2e9669e03a
|
upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming
|
2025-10-20 10:15:17 -07:00 |
|
Andrej Karpathy
|
c1d2ed1c13
|
use orig_model in sampling, silly of me to miss this
|
2025-10-20 00:05:09 +00:00 |
|
Andrej Karpathy
|
2bc521a6de
|
use orig_model in sampling, silly of me to miss this
|
2025-10-20 00:04:15 +00:00 |
|
karpathy
|
ae02650afe
|
update the midtraining script too
|
2025-10-16 16:33:17 -07:00 |
|
karpathy
|
df600b6ed5
|
many small tweaks. base, eval, core work now i think
|
2025-10-16 15:46:18 -07:00 |
|
karpathy
|
786119d593
|
add autodetect of device and related stuff. getting weird warnings/errors still, so wip
|
2025-10-16 10:26:19 -07:00 |
|
karpathy
|
279b74312c
|
adjust comment/guidance on device type
|
2025-10-16 10:06:39 -07:00 |
|
karpathy
|
306bc380ab
|
add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights
|
2025-10-16 10:04:43 -07:00 |
|
Andrej Karpathy
|
722da4f543
|
trying to add basic cpu support, will try mps too
|
2025-10-16 16:14:38 +00:00 |
|
Andrej Karpathy
|
4346536ab2
|
also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate
|
2025-10-16 01:28:37 +00:00 |
|
Andrej Karpathy
|
4c3590c499
|
fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints
|
2025-10-15 20:29:54 +00:00 |
|
Andrej Karpathy
|
03fa673b7d
|
add basic logging to chat_web, which i think might be fun
|
2025-10-15 19:51:06 +00:00 |
|
Andrej Karpathy
|
52bfeea8bd
|
add very basic abuse prevention limits to chat_web so it's ok to host endpoints
|
2025-10-15 19:42:54 +00:00 |
|
Andrej Karpathy
|
01fb290f53
|
allow multiple GPUs to do inference in a data parallel way
|
2025-10-15 19:12:19 +00:00 |
|
Andrej Karpathy
|
190d9515d0
|
don't evaluate the sampling evals during SFT, they are too slow. keep the multiple choice evals. delete unused imports
|
2025-10-15 16:42:23 +00:00 |
|
Andrej Karpathy
|
b8076dd367
|
fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation
|
2025-10-15 16:35:04 +00:00 |
|
karpathy
|
3a5e0bc50b
|
initial commit
|
2025-10-13 06:49:24 -07:00 |
|
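
Note on 4c3590c499: the message describes holding back output until a full UTF-8 codepoint has arrived. The sketch below illustrates that idea only and is not the code from the commit. It assumes each token maps to a raw byte string, and it swaps in Python's standard incremental decoder in place of whatever accumulation logic the repository actually uses. The name `stream_decode` and the sample chunks are hypothetical.

```python
import codecs

def stream_decode(byte_chunks):
    """Yield text only once complete UTF-8 codepoints are available.

    byte_chunks: iterable of bytes objects (hypothetical per-token byte pieces).
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in byte_chunks:
        # decode() buffers a trailing partial codepoint and returns ""
        # until the remaining bytes arrive in a later chunk
        text = decoder.decode(chunk)
        if text:
            yield text
    # final=True raises UnicodeDecodeError if the stream ended mid-codepoint
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail

# An emoji (U+1F600, four UTF-8 bytes) split across two hypothetical tokens
chunks = [b"hi ", b"\xf0\x9f", b"\x98\x80", b"!"]
pieces = list(stream_decode(chunks))
print(pieces)
```

Nothing is emitted for the second chunk because it carries only half of the emoji; the emoji appears as a single piece once the third chunk completes it.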