Andrej Karpathy
|
1076f97059
|
delete autocast, an unnecessary thorn in my side, manage dtypes directly
|
2026-03-04 23:55:30 +00:00 |
|
Sofie Van Landeghem
|
72b9064f9d
|
remove leftover mid references (#491)
|
2026-02-02 08:33:46 -08:00 |
|
Andrej Karpathy
|
1ddaad1c1c
|
nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtraining is not yet fully properly erased across the board, but good enough for step 1
|
2026-01-31 19:12:25 +00:00 |
|
Sofie Van Landeghem
|
d4ea28d4e2
|
Fix args in readme (#438)
* fix commands in readme, using new arg format
* fix typo
* add required -i flag to chat_eval example runs
|
2026-01-15 16:26:38 -08:00 |
|
svlandeg
|
a2fb3c83a6
|
fix typos
|
2025-11-14 11:20:25 +01:00 |
|
svlandeg
|
8c9b004c99
|
typo fixes in scripts
|
2025-10-28 20:17:31 +01:00 |
|
Andrej Karpathy
|
8892470f29
|
add the SpellingBee task so that nanochat can count r in strawberry etc. Along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. Possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think
|
2025-10-24 14:02:48 +00:00 |
|
karpathy
|
2e9669e03a
|
upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming
|
2025-10-20 10:15:17 -07:00 |
|
karpathy
|
3a5e0bc50b
|
initial commit
|
2025-10-13 06:49:24 -07:00 |
|