dssjon
|
d366f7e07d
|
Refactor constants in training scripts and engine to improve configurability. Replace hardcoded values with constants for KV cache growth, rotary cache multiplier, and learning rate parameters. This enhances maintainability and allows for easier adjustments in future iterations.
|
2025-10-15 14:58:06 -07:00 |
|
Andrej Karpathy
|
4c3590c499
|
fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. exampels are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints
|
2025-10-15 20:29:54 +00:00 |
|
Andrej Karpathy
|
03fa673b7d
|
add basic logging to chat_web, which i think might be fun
|
2025-10-15 19:51:06 +00:00 |
|
Andrej Karpathy
|
52bfeea8bd
|
add very basic abuse prevention limits to chat_web so it's ok to host endpoints
|
2025-10-15 19:42:54 +00:00 |
|
Andrej Karpathy
|
01fb290f53
|
allow multiple GPUs to do inference in a data parallel way
|
2025-10-15 19:12:19 +00:00 |
|
Andrej Karpathy
|
190d9515d0
|
dont evaluate the sampling evals during SFT they are too slow. keep the multiple choice evals. delete unused imports
|
2025-10-15 16:42:23 +00:00 |
|
Andrej Karpathy
|
b8076dd367
|
fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation
|
2025-10-15 16:35:04 +00:00 |
|
karpathy
|
3a5e0bc50b
|
initial commit
|
2025-10-13 06:49:24 -07:00 |
|