12 Commits

Author SHA1 Message Date
karpathy
786119d593 add autodetect of device and related stuff. getting weird warnings/errors still, so wip 2025-10-16 10:26:19 -07:00
karpathy
279b74312c adjust comment/guidance on device type 2025-10-16 10:06:39 -07:00
karpathy
306bc380ab add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights 2025-10-16 10:04:43 -07:00
Andrej Karpathy
722da4f543 trying to add basic cpu support, will try mps too 2025-10-16 16:14:38 +00:00
Andrej Karpathy
4346536ab2 also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate 2025-10-16 01:28:37 +00:00
Andrej Karpathy
4c3590c499 fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints 2025-10-15 20:29:54 +00:00
Andrej Karpathy
03fa673b7d add basic logging to chat_web, which i think might be fun 2025-10-15 19:51:06 +00:00
Andrej Karpathy
52bfeea8bd add very basic abuse prevention limits to chat_web so it's ok to host endpoints 2025-10-15 19:42:54 +00:00
Andrej Karpathy
01fb290f53 allow multiple GPUs to do inference in a data parallel way 2025-10-15 19:12:19 +00:00
Andrej Karpathy
190d9515d0 don't evaluate the sampling evals during SFT, they are too slow. keep the multiple choice evals. delete unused imports 2025-10-15 16:42:23 +00:00
Andrej Karpathy
b8076dd367 fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation 2025-10-15 16:35:04 +00:00
karpathy
3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00