Commit Graph

20 Commits

Author SHA1 Message Date
Tsvika Shapira
52661e5b5c refactor: use Path convenience methods for file operations
Simplified file reading patterns by using Path.read_text() instead of
`with path.open() as f: f.read()`. This makes the code more concise and
Pythonic while maintaining the same functionality.

Changes:
- Replace path.open().read() with path.read_text()
- Replace yaml.safe_load(f) with yaml.safe_load(path.read_text())
- Eliminate redundant file reads in configurator.py (read file once)
- Reduce code by 10 lines overall

All changes preserve existing behavior and encoding specifications.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-26 12:49:38 +02:00
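The before/after pattern this commit describes can be sketched as follows. It is a minimal illustration, not the project's code: it uses `json.loads` as a stand-in for `yaml.safe_load` so that it runs with the standard library only, and the file name and config contents are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_config_old(path: Path) -> dict:
    # Before: open the file explicitly and read it inside a context manager.
    with path.open(encoding="utf-8") as f:
        return json.loads(f.read())


def load_config_new(path: Path) -> dict:
    # After: Path.read_text() opens, reads, and closes the file in one call,
    # and still accepts an explicit encoding.
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "config.json"  # hypothetical config file
    cfg_path.write_text('{"depth": 20}', encoding="utf-8")
    old = load_config_old(cfg_path)
    new = load_config_new(cfg_path)

print(old == new)  # True
```

Both versions close the file before returning; `read_text()` simply hides the context manager.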
Tsvika Shapira
b1925368f9 refactor: migrate from os.path to pathlib.Path across codebase
Converted all path operations to use pathlib.Path instead of os.path module.
This modernizes the codebase and fixes all 135 ruff PTH violations.

Changes:
- Replace os.path.join() with Path / operator
- Replace os.path.exists() with Path.exists()
- Replace os.makedirs() with Path.mkdir()
- Replace open() with Path.open() where appropriate
- Replace os.remove() with Path.unlink()
- Replace os.getcwd() with Path.cwd()
- Replace os.path.expanduser("~") with Path.home()
- Add type hints for Path parameters in function signatures

All path objects are now created at first occurrence and propagated
through the codebase, eliminating unnecessary string conversions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-26 12:49:38 +02:00
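A minimal sketch of the os.path to pathlib mapping listed above, run against a temporary directory. The `checkpoint_file` helper and the file names are hypothetical and only illustrate creating the Path once and passing it through typed signatures.

```python
import os
import tempfile
from pathlib import Path


def checkpoint_file(base_dir: Path, step: int) -> Path:
    # Hypothetical helper: takes and returns Path objects, so callers never
    # convert back and forth between str and Path.
    return base_dir / "checkpoints" / f"step_{step:06d}.txt"


with tempfile.TemporaryDirectory() as tmp:
    base_dir = Path(tmp)  # create the Path once, at first occurrence

    # os.path.join(tmp, "checkpoints")  ->  base_dir / "checkpoints"
    old_join = os.path.join(tmp, "checkpoints")
    new_join = base_dir / "checkpoints"

    # os.makedirs(p, exist_ok=True)  ->  p.mkdir(parents=True, exist_ok=True)
    new_join.mkdir(parents=True, exist_ok=True)

    # os.path.exists(p)  ->  p.exists()
    dir_exists = os.path.exists(old_join) and new_join.exists()

    # open(p, "w")  ->  p.open("w")
    ckpt = checkpoint_file(base_dir, 100)
    with ckpt.open("w", encoding="utf-8") as f:
        f.write("weights")

    # os.remove(p)  ->  p.unlink()
    ckpt.unlink()
    removed = not ckpt.exists()

    same_dir = Path(old_join) == new_join

# os.getcwd()  ->  Path.cwd()   (likewise os.path.expanduser("~")  ->  Path.home())
same_cwd = Path.cwd() == Path(os.getcwd())
```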
Sanzo00
53b3a4fb81 fix: missing val_bpb on resume 2025-11-22 11:04:20 +08:00
Andrej Karpathy
c6abcdfe3a big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster. 2025-11-13 15:34:40 +00:00
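A toy sketch of what `--save_every` and `--resume_from_step` do conceptually, assuming a training state small enough to serialize as JSON. The `train` function, the checkpoint file names, and the `-1` "disabled" defaults are hypothetical. The real script saves model and optimizer state, and, as the message says, its resumption is only approximate; this toy loop is deterministic, so here the resumed run matches the uninterrupted one exactly.

```python
import json
import tempfile
from pathlib import Path


def train(num_steps: int, ckpt_dir: Path, save_every: int = -1, resume_from_step: int = -1) -> float:
    """Toy loop: `weight` stands in for the model and optimizer state."""
    step, weight = 0, 0.0
    if resume_from_step >= 0:
        # Restore the state saved at resume_from_step and continue from there.
        ckpt = ckpt_dir / f"step_{resume_from_step:06d}.json"
        state = json.loads(ckpt.read_text())
        step, weight = state["step"], state["weight"]
    while step < num_steps:
        weight = 0.9 * weight + 0.1  # stand-in for one optimizer update
        step += 1
        if save_every > 0 and step % save_every == 0:
            # Periodic checkpoint with everything needed to continue later.
            ckpt = ckpt_dir / f"step_{step:06d}.json"
            ckpt.write_text(json.dumps({"step": step, "weight": weight}))
    return weight


with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp)
    full = train(10, ckpt_dir, save_every=4)  # checkpoints at steps 4 and 8
    saved = sorted(p.name for p in ckpt_dir.iterdir())
    resumed = train(10, ckpt_dir, resume_from_step=8)  # redoes only steps 9 and 10

print(full == resumed)  # True
```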
Andrej Karpathy
c6b7ab7440 grad clip logging and printing and cosmetics 2025-11-05 21:08:30 +00:00
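To illustrate what a logged gradient norm means, here is a pure-Python sketch of clipping by global L2 norm. It follows the convention of PyTorch's `torch.nn.utils.clip_grad_norm_`, which returns the total norm measured before clipping and rescales by `max_norm / (total_norm + 1e-6)` only when that ratio is below 1. The nested-list gradients and `max_norm=1.0` are assumptions for the example, not values from this script.

```python
import math


def clip_grad_norm(grads: list[list[float]], max_norm: float) -> float:
    """Scale all gradients in place so their global L2 norm is at most max_norm.

    Returns the norm measured before clipping, which is the value worth logging.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= clip_coef
    return total_norm


grads = [[3.0, 0.0], [0.0, 4.0]]  # global norm is 5.0
norm = clip_grad_norm(grads, max_norm=1.0)
print(f"grad norm: {norm:.4f}")  # grad norm: 5.0000
```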
Andrej
dfc88334b6
fix tok/sec calculation bug when grad accum steps > 1
Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1
2025-10-30 08:36:32 -07:00
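The arithmetic behind this fix can be sketched as follows, assuming each optimizer step processes `grad_accum_steps` micro-batches of `device_batch_size * seq_len` tokens on each of `world_size` ranks. The function and parameter names are illustrative, not necessarily the ones used in base_train or mid_train. Leaving out the accumulation factor under-reports throughput by exactly that factor, which is why the bug only appears when gradient accumulation is not 1.

```python
def tokens_per_step(device_batch_size: int, seq_len: int, world_size: int, grad_accum_steps: int) -> int:
    # One optimizer step = grad_accum_steps micro-batches on every rank.
    return device_batch_size * seq_len * world_size * grad_accum_steps


def tok_per_sec(tokens: int, step_time_s: float) -> float:
    return tokens / step_time_s


# Hypothetical run: 32 sequences of 2048 tokens per micro-batch, 1 GPU, 8 accumulation steps.
correct = tok_per_sec(tokens_per_step(32, 2048, 1, 8), step_time_s=2.0)
buggy = tok_per_sec(32 * 2048 * 1, step_time_s=2.0)  # forgets grad_accum_steps
print(correct, buggy)  # 262144.0 32768.0
```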
svlandeg
8c9b004c99 typo fixes in scripts 2025-10-28 20:17:31 +01:00
water-vapor
a9de4b1038 Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1 2025-10-26 01:43:49 -05:00
Andrej Karpathy
81597cd616 move the lr schedule args up in base_train so they are tunable in configurator 2025-10-24 13:27:31 +00:00
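Why the position of these arguments matters can be sketched under one assumption: that the configurator, like nanoGPT's, only lets `--key=value` override names that already exist when it runs, so a setting declared after that point cannot be tuned from the command line. The dict-based `apply_overrides` helper and the LR schedule keys below are hypothetical stand-ins; a nanoGPT-style configurator works on module globals instead.

```python
from ast import literal_eval


def apply_overrides(config: dict, argv: list[str]) -> dict:
    """Return a copy of config with --key=value overrides applied (hypothetical helper)."""
    updated = dict(config)
    for arg in argv:
        if not (arg.startswith("--") and "=" in arg):
            raise ValueError(f"expected --key=value, got {arg!r}")
        key, val = arg[2:].split("=", 1)
        if key not in updated:
            # Settings not declared before this point cannot be overridden.
            raise ValueError(f"Unknown config key: {key}")
        try:
            updated[key] = literal_eval(val)  # numbers, booleans, tuples, ...
        except (ValueError, SyntaxError):
            updated[key] = val  # fall back to the raw string
    return updated


# Hypothetical LR schedule settings, declared before overrides are applied.
defaults = {"warmup_ratio": 0.0, "warmdown_ratio": 0.2, "final_lr_frac": 0.0}
cfg = apply_overrides(defaults, ["--warmdown_ratio=0.4", "--final_lr_frac=0.05"])
print(cfg)  # {'warmup_ratio': 0.0, 'warmdown_ratio': 0.4, 'final_lr_frac': 0.05}
```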
Andrej Karpathy
a088b7a6ec use enable_gqa of pytorch sdpa, allows us to delete some code, didn't realize it's available 2025-10-21 18:07:33 +00:00
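In PyTorch 2.5 and later, `torch.nn.functional.scaled_dot_product_attention(q, k, v, enable_gqa=True)` accepts keys and values with fewer heads than the queries, so code that manually repeated the KV heads before calling SDPA becomes unnecessary. The head mapping it performs is equivalent to `repeat_interleave` of the KV heads by `n_head // n_kv_head` along the head dimension. The pure-Python sketch below (function name hypothetical) only shows which KV head each query head ends up using.

```python
def kv_head_for_query_head(n_head: int, n_kv_head: int) -> list[int]:
    """Index of the key/value head that each query head attends with under GQA."""
    if n_head % n_kv_head != 0:
        raise ValueError("n_head must be divisible by n_kv_head")
    group_size = n_head // n_kv_head
    # Consecutive query heads share one KV head, matching repeat_interleave on the head dim.
    return [h // group_size for h in range(n_head)]


print(kv_head_for_query_head(8, 2))  # [0, 0, 0, 0, 1, 1, 1, 1]
```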
Andrej Karpathy
5bdc99abfb merge and resolve conflict 2025-10-21 17:19:10 +00:00
Andrej Karpathy
dfcb1c16f1 Merge branch 'master' into cpu-mps-dev 2025-10-21 17:15:53 +00:00
Andrej Karpathy
c1d2ed1c13 use orig_model in sampling, silly of me to miss this 2025-10-20 00:05:09 +00:00
Andrej Karpathy
2bc521a6de use orig_model in sampling, silly of me to miss this 2025-10-20 00:04:15 +00:00
karpathy
df600b6ed5 many small tweaks. base, eval, core work now i think 2025-10-16 15:46:18 -07:00
karpathy
786119d593 add autodetect of device and related stuff. getting weird warnings/errors still, so wip 2025-10-16 10:26:19 -07:00
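A minimal sketch of the device autodetection idea, assuming a priority order of CUDA, then Apple MPS, then CPU. The function is hypothetical and takes the availability flags as arguments so it runs without PyTorch; in a real script they would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def autodetect_device_type(cuda_available: bool, mps_available: bool) -> str:
    # Assumed priority: CUDA first, then Apple MPS, then plain CPU.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


print(autodetect_device_type(cuda_available=False, mps_available=True))  # mps
```

Keeping the detection in one small function means the rest of the script only branches on the returned string.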
karpathy
279b74312c adjust comment/guidance on device type 2025-10-16 10:06:39 -07:00
karpathy
306bc380ab add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered what I think is a bit of a bug: I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights 2025-10-16 10:04:43 -07:00
Andrej Karpathy
722da4f543 trying to add basic cpu support, will try mps too 2025-10-16 16:14:38 +00:00
karpathy
3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00