Andrej Karpathy
|
4346536ab2
|
also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate
|
2025-10-16 01:28:37 +00:00 |
|
Andrej Karpathy
|
4c3590c499
|
fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints
|
2025-10-15 20:29:54 +00:00 |
|
Andrej Karpathy
|
03fa673b7d
|
add basic logging to chat_web, which i think might be fun
|
2025-10-15 19:51:06 +00:00 |
|
Andrej Karpathy
|
52bfeea8bd
|
add very basic abuse prevention limits to chat_web so it's ok to host endpoints
|
2025-10-15 19:42:54 +00:00 |
|
Andrej Karpathy
|
01fb290f53
|
allow multiple GPUs to do inference in a data parallel way
|
2025-10-15 19:12:19 +00:00 |
|
Andrej Karpathy
|
190d9515d0
|
don't evaluate the sampling evals during SFT, they are too slow. keep the multiple choice evals. delete unused imports
|
2025-10-15 16:42:23 +00:00 |
|
Andrej Karpathy
|
b8076dd367
|
fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation
|
2025-10-15 16:35:04 +00:00 |
|
karpathy
|
3a5e0bc50b
|
initial commit
|
2025-10-13 06:49:24 -07:00 |
|
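
The token decoding fix in 4c3590c499 describes accumulating bytes and emitting text only once full codepoints are available. The commit's actual code is not shown in this log, so the following is only a minimal sketch of the general idea, using Python's standard-library incremental UTF-8 decoder rather than whatever the repository does. The function name `stream_decoded` and the byte chunks are hypothetical, chosen for illustration.

```python
import codecs


def stream_decoded(byte_chunks):
    """Yield text from an iterable of byte chunks, emitting only complete codepoints.

    Hypothetical helper: each chunk stands in for the bytes of one generated token.
    """
    # The incremental decoder buffers a trailing partial UTF-8 sequence
    # until the remaining bytes arrive in a later chunk.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in byte_chunks:
        text = decoder.decode(chunk)
        if text:  # empty string means we are still mid-codepoint
            yield text
    # Flush: a sequence left incomplete at the end becomes U+FFFD.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# An emoji (4 UTF-8 bytes) split across two chunks is emitted as one piece.
pieces = list(stream_decoded([b"hi ", b"\xf0\x9f", b"\x98\x80", b"!"]))
```

Decoding each chunk on its own would instead produce replacement characters for the two halves of the emoji, which is the kind of symptom the commit message describes.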