Tsvika Shapira
e6da3d7768
Revert "typing: add Path type hints to function signatures and returns"
This reverts commit 2dd3380dbc .
2025-12-27 13:19:10 +02:00
Tsvika Shapira
2dd3380dbc
typing: add Path type hints to function signatures and returns
2025-12-26 12:50:42 +02:00
Tsvika Shapira
6d6651e2df
refactor: refactor path operations
2025-12-26 12:49:38 +02:00
Tsvika Shapira
70eb760b1f
refactor: replace open-write-close patterns with pathlib methods
2025-12-26 12:49:38 +02:00
Tsvika Shapira
52661e5b5c
refactor: use Path convenience methods for file operations
Simplified file reading patterns by using Path.read_text() instead of
with path.open() as f: f.read(). This makes the code more concise and
Pythonic while maintaining the same functionality.
Changes:
- Replace path.open().read() with path.read_text()
- Replace yaml.safe_load(f) with yaml.safe_load(path.read_text())
- Eliminate redundant file reads in configurator.py (read file once)
- Reduce code by 10 lines overall
All changes preserve existing behavior and encoding specifications.
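The before/after pattern described in this commit can be sketched as follows (the file name is hypothetical; only the `open`-vs-`read_text` pattern comes from the message):

```python
from pathlib import Path
import tempfile

# Create a throwaway file to demonstrate the refactor on.
path = Path(tempfile.mkdtemp()) / "config.txt"
path.write_text("hello\n", encoding="utf-8")

# Before: explicit open-read-close pattern.
with path.open(encoding="utf-8") as f:
    old_style = f.read()

# After: Path convenience method, same behavior in one call.
new_style = path.read_text(encoding="utf-8")

assert old_style == new_style == "hello\n"
```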
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-26 12:49:38 +02:00
Tsvika Shapira
b1925368f9
refactor: migrate from os.path to pathlib.Path across codebase
Converted all path operations to use pathlib.Path instead of os.path module.
This modernizes the codebase and fixes all 135 ruff PTH violations.
Changes:
- Replace os.path.join() with Path / operator
- Replace os.path.exists() with Path.exists()
- Replace os.makedirs() with Path.mkdir()
- Replace open() with Path.open() where appropriate
- Replace os.remove() with Path.unlink()
- Replace os.getcwd() with Path.cwd()
- Replace os.path.expanduser("~") with Path.home()
- Add type hints for Path parameters in function signatures
All path objects are now created at first occurrence and propagated
through the codebase, eliminating unnecessary string conversions.
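The os.path-to-pathlib mappings listed above can be checked directly; a minimal sketch (illustrative paths only):

```python
# Each pair shows the old os.path call next to its pathlib equivalent.
import os
from pathlib import Path

# os.path.join("a", "b")  ->  Path("a") / "b"
joined = Path("a") / "b" / "c.txt"
assert str(joined) == os.path.join("a", "b", "c.txt")

# os.getcwd()  ->  Path.cwd()
assert str(Path.cwd()) == os.getcwd()

# os.path.expanduser("~")  ->  Path.home()
assert str(Path.home()) == os.path.expanduser("~")

# The remaining mappings follow the same shape:
#   os.path.exists(p)            ->  Path(p).exists()
#   os.makedirs(p, exist_ok=...) ->  Path(p).mkdir(parents=True, exist_ok=...)
#   os.remove(p)                 ->  Path(p).unlink()
```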
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-26 12:49:38 +02:00
Tsvika Shapira
886b409e75
chore: remove unused import
2025-12-26 12:47:18 +02:00
Andrej
39cccc527f
small bugfix make mid_train script work even with a tiny number of iterations
2025-12-08 18:27:32 -08:00
Andrej
8b1cecaa95
Apply suggestion from @svlandeg for nicer looking comparison
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2025-12-08 18:27:06 -08:00
Andrej
58f3e84e01
clean up train/val loader in sft for consistency with mid/base
2025-12-08 18:23:57 -08:00
Sanzo00
53b3a4fb81
fix: missing val_bpb on resume
2025-11-22 11:04:20 +08:00
svlandeg
4bcc3bb698
clarify comment
2025-11-21 13:19:45 +01:00
Eric Silberstein
f37d45c21f
remove unneeded iter()
2025-11-20 15:14:56 -05:00
Eric Silberstein
dddb95caac
make mid_train script work even with a tiny number of iterations
2025-11-19 15:52:20 -05:00
Andrej
4763ce612a
Small fixes to typos
2025-11-14 07:25:59 -08:00
svlandeg
a2fb3c83a6
fix typos
2025-11-14 11:20:25 +01:00
Andrej Karpathy
c6abcdfe3a
big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
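A simplified sketch of the save/resume flow this commit describes. Only the two flags (`--save_every`, `--resume_from_step`) come from the message; the checkpoint file layout and state contents here are hypothetical:

```python
import pickle
import tempfile
from pathlib import Path

ckpt_dir = Path(tempfile.mkdtemp())
save_every = 2          # corresponds to the --save_every step interval
resume_from_step = 4    # corresponds to --resume_from_step

def save_checkpoint(step: int, state: dict) -> None:
    # Hypothetical layout: one pickle per saved step.
    (ckpt_dir / f"step_{step:06d}.pkl").write_bytes(pickle.dumps(state))

def load_checkpoint(step: int) -> dict:
    return pickle.loads((ckpt_dir / f"step_{step:06d}.pkl").read_bytes())

# First run: write a checkpoint every `save_every` steps.
state = {"step": 0, "loss_sum": 0.0}
for step in range(1, 7):
    state = {"step": step, "loss_sum": state["loss_sum"] + 1.0}
    if step % save_every == 0:
        save_checkpoint(step, state)

# Second run (e.g. after a crash or loss spike): resume from a given step.
state = load_checkpoint(resume_from_step)
assert state["step"] == 4
```

The resumption is approximate in the real script (per the message), so a sketch like this only captures the control flow, not the optimizer/dataloader details.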
2025-11-13 15:34:40 +00:00
Andrej Karpathy
c6b7ab7440
grad clip logging and printing and cosmetics
2025-11-05 21:08:30 +00:00
svlandeg
2ce62ec076
ensure consistency of quotes within each statement
2025-11-03 21:52:02 +01:00
svlandeg
c72b8b2309
add explicit UTF-8 encoding
2025-11-03 21:27:12 +01:00
Dipesh Babu
226953b841
fix: open JSONL and results CSV with UTF-8 encoding for portability
2025-11-03 01:20:56 -05:00
svlandeg
52e85aaf80
Merge branch 'master' into fix/typo
2025-11-02 13:41:13 +01:00
Andrej Karpathy
cf587acb1a
move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts
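The lazy-download pattern this commit describes can be sketched as below; the function names and the stubbed `download` helper are hypothetical stand-ins:

```python
import tempfile
from pathlib import Path

calls = []

def download(url: str, target: Path) -> None:
    # Stand-in for the real fetch; records that a download happened.
    calls.append(url)
    target.mkdir(parents=True, exist_ok=True)

def get_eval_bundle(cache_dir: Path, url: str) -> Path:
    target = cache_dir / "eval_bundle"
    if not target.exists():  # lazy: fetch only on first use
        download(url, target)
    return target

cache = Path(tempfile.mkdtemp())
get_eval_bundle(cache, "https://example.com/bundle")
get_eval_bundle(cache, "https://example.com/bundle")
assert len(calls) == 1  # second call hits the cache, no re-download
```

Moving the fetch behind a call like this is what lets the bash run scripts drop their explicit download step.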
2025-11-01 16:04:38 +00:00
Andrej Karpathy
7d2c4a3d95
delete pandas dep in base_eval use csv instead
2025-11-01 15:28:30 +00:00
Andrej
dfc88334b6
fix tok/sec calculation bug when grad accum steps > 1
Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1
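The arithmetic behind this class of bug, with hypothetical numbers: one optimizer step processes `grad_accum_steps` micro-batches, so tok/sec must count tokens from every micro-batch, not just one.

```python
# Illustrative numbers only; the real run's batch sizes differ.
batch_size, seq_len, grad_accum_steps = 32, 1024, 4
step_time_s = 2.0

# Buggy version counts one micro-batch; correct version counts them all.
tokens_per_step = batch_size * seq_len * grad_accum_steps
tok_per_sec = tokens_per_step / step_time_s

assert tokens_per_step == 131072
assert tok_per_sec == 65536.0
```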
2025-10-30 08:36:32 -07:00
svlandeg
70319851fc
fix typo
2025-10-29 19:48:34 +01:00
svlandeg
8c9b004c99
typo fixes in scripts
2025-10-28 20:17:31 +01:00
water-vapor
a9de4b1038
Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1
2025-10-26 01:43:49 -05:00
Andrej Karpathy
8892470f29
add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think
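The core counting operation the extended calculator needs is just Python's `str.count`; a minimal sketch (the helper name is hypothetical):

```python
def count_letter(word: str, letter: str) -> int:
    # Python's built-in substring count, the function the calculator
    # was extended to support per the commit message.
    return word.count(letter)

assert count_letter("strawberry", "r") == 3
```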
2025-10-24 14:02:48 +00:00
Andrej Karpathy
81597cd616
move the lr schedule args up in base_train so they are tunable in configurator
2025-10-24 13:27:31 +00:00
Luke Stanley
defd1246aa
Fix Torch crash caused by pinning on CPU
2025-10-21 20:28:10 +00:00
Andrej Karpathy
a088b7a6ec
use enable_gqa of pytorch sdpa, allows us to delete some code, didn't realize it's available
2025-10-21 18:07:33 +00:00
Andrej Karpathy
5bdc99abfb
merge and resolve conflict
2025-10-21 17:19:10 +00:00
Andrej Karpathy
dfcb1c16f1
Merge branch 'master' into cpu-mps-dev
2025-10-21 17:15:53 +00:00
Andrej Karpathy
fe5aed940b
add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully it's ok
2025-10-21 15:04:58 +00:00
karpathy
2e9669e03a
upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming
2025-10-20 10:15:17 -07:00
Andrej Karpathy
c1d2ed1c13
use orig_model in sampling, silly of me to miss this
2025-10-20 00:05:09 +00:00
Andrej Karpathy
2bc521a6de
use orig_model in sampling, silly of me to miss this
2025-10-20 00:04:15 +00:00
karpathy
ae02650afe
update the midtraining script too
2025-10-16 16:33:17 -07:00
karpathy
df600b6ed5
many small tweaks. base, eval, core work now i think
2025-10-16 15:46:18 -07:00
karpathy
786119d593
add autodetect of device and related stuff. getting weird warnings/errors still, so wip
2025-10-16 10:26:19 -07:00
karpathy
279b74312c
adjust comment/guidance on device type
2025-10-16 10:06:39 -07:00
karpathy
306bc380ab
add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered what I think is a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights
2025-10-16 10:04:43 -07:00
Andrej Karpathy
722da4f543
trying to add basic cpu support, will try mps too
2025-10-16 16:14:38 +00:00
Andrej Karpathy
4346536ab2
also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate
2025-10-16 01:28:37 +00:00
Andrej Karpathy
4c3590c499
fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints
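The accumulate-then-emit fix described above can be sketched with the standard library's incremental UTF-8 decoder. The token byte values are hypothetical; an emoji's 4 bytes are split across two tokens:

```python
import codecs

# 😀 is \xf0\x9f\x98\x80 in UTF-8, split here across two tokens.
tokens = [b"\xf0\x9f", b"\x98\x80", b"ok"]

decoder = codecs.getincrementaldecoder("utf-8")()
emitted = ""
for tok in tokens:
    # Returns "" while buffered bytes are still an incomplete codepoint,
    # and the full character once the remaining bytes arrive.
    emitted += decoder.decode(tok)

assert emitted == "\U0001F600ok"  # the emoji followed by "ok"
```

Decoding each token's bytes independently would raise (or emit replacement characters) on the split emoji, which is exactly the failure mode the commit fixes.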
2025-10-15 20:29:54 +00:00
Andrej Karpathy
03fa673b7d
add basic logging to chat_web, which i think might be fun
2025-10-15 19:51:06 +00:00
Andrej Karpathy
52bfeea8bd
add very basic abuse prevention limits to chat_web so it's ok to host endpoints
2025-10-15 19:42:54 +00:00
Andrej Karpathy
01fb290f53
allow multiple GPUs to do inference in a data parallel way
2025-10-15 19:12:19 +00:00
Andrej Karpathy
190d9515d0
don't evaluate the sampling evals during SFT, they are too slow. keep the multiple choice evals. delete unused imports
2025-10-15 16:42:23 +00:00