Tsvika Shapira
e6da3d7768
Revert "typing: add Path type hints to function signatures and returns"
This reverts commit 2dd3380dbc .
2025-12-27 13:19:10 +02:00
Tsvika Shapira
2dd3380dbc
typing: add Path type hints to function signatures and returns
2025-12-26 12:50:42 +02:00
Tsvika Shapira
6d6651e2df
refactor: refactor path operations
2025-12-26 12:49:38 +02:00
Tsvika Shapira
70eb760b1f
refactor: replace open-write-close patterns with pathlib methods
2025-12-26 12:49:38 +02:00
Tsvika Shapira
52661e5b5c
refactor: use Path convenience methods for file operations
Simplified file reading patterns by using Path.read_text() instead of
with path.open() as f: f.read(). This makes the code more concise and
Pythonic while maintaining the same functionality.
Changes:
- Replace path.open().read() with path.read_text()
- Replace yaml.safe_load(f) with yaml.safe_load(path.read_text())
- Eliminate redundant file reads in configurator.py (read file once)
- Reduce code by 10 lines overall
All changes preserve existing behavior and encoding specifications.
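The before/after pattern described in this commit can be sketched as follows (the file name is hypothetical; only the `open`-vs-`read_text` pattern comes from the message):

```python
from pathlib import Path
import tempfile

# Create a throwaway file to demonstrate the refactor on.
path = Path(tempfile.mkdtemp()) / "config.txt"
path.write_text("hello\n", encoding="utf-8")

# Before: explicit open-read-close pattern.
with path.open(encoding="utf-8") as f:
    old_style = f.read()

# After: Path convenience method, same behavior in one call.
new_style = path.read_text(encoding="utf-8")

assert old_style == new_style == "hello\n"
```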
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-26 12:49:38 +02:00
Tsvika Shapira
b1925368f9
refactor: migrate from os.path to pathlib.Path across codebase
Converted all path operations to use pathlib.Path instead of os.path module.
This modernizes the codebase and fixes all 135 ruff PTH violations.
Changes:
- Replace os.path.join() with Path / operator
- Replace os.path.exists() with Path.exists()
- Replace os.makedirs() with Path.mkdir()
- Replace open() with Path.open() where appropriate
- Replace os.remove() with Path.unlink()
- Replace os.getcwd() with Path.cwd()
- Replace os.path.expanduser("~") with Path.home()
- Add type hints for Path parameters in function signatures
All path objects are now created at first occurrence and propagated
through the codebase, eliminating unnecessary string conversions.
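The os.path-to-pathlib mappings listed above can be checked directly; a minimal sketch (illustrative paths only):

```python
# Each pair shows the old os.path call next to its pathlib equivalent.
import os
from pathlib import Path

# os.path.join("a", "b")  ->  Path("a") / "b"
joined = Path("a") / "b" / "c.txt"
assert str(joined) == os.path.join("a", "b", "c.txt")

# os.getcwd()  ->  Path.cwd()
assert str(Path.cwd()) == os.getcwd()

# os.path.expanduser("~")  ->  Path.home()
assert str(Path.home()) == os.path.expanduser("~")

# The remaining mappings follow the same shape:
#   os.path.exists(p)            ->  Path(p).exists()
#   os.makedirs(p, exist_ok=...) ->  Path(p).mkdir(parents=True, exist_ok=...)
#   os.remove(p)                 ->  Path(p).unlink()
```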
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-26 12:49:38 +02:00
Tsvika Shapira
886b409e75
chore: remove unused import
2025-12-26 12:47:18 +02:00
Andrej
39cccc527f
small bugfix make mid_train script work even with a tiny number of iterations
2025-12-08 18:27:32 -08:00
Andrej
8b1cecaa95
Apply suggestion from @svlandeg for nicer looking comparison
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2025-12-08 18:27:06 -08:00
Andrej
58f3e84e01
clean up train/val loader in sft for consistency with mid/base
2025-12-08 18:23:57 -08:00
Sanzo00
53b3a4fb81
fix: missing val_bpb on resume
2025-11-22 11:04:20 +08:00
svlandeg
4bcc3bb698
clarify comment
2025-11-21 13:19:45 +01:00
Eric Silberstein
f37d45c21f
remove unneeded iter()
2025-11-20 15:14:56 -05:00
Eric Silberstein
dddb95caac
make mid_train script work even with a tiny number of iterations
2025-11-19 15:52:20 -05:00
Andrej
4763ce612a
Small fixes to typos
2025-11-14 07:25:59 -08:00
svlandeg
a2fb3c83a6
fix typos
2025-11-14 11:20:25 +01:00
Andrej Karpathy
c6abcdfe3a
big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
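A simplified sketch of the save/resume flow this commit describes. Only the two flags (`--save_every`, `--resume_from_step`) come from the message; the checkpoint file layout and state contents here are hypothetical:

```python
import pickle
import tempfile
from pathlib import Path

ckpt_dir = Path(tempfile.mkdtemp())
save_every = 2          # corresponds to the --save_every step interval
resume_from_step = 4    # corresponds to --resume_from_step

def save_checkpoint(step: int, state: dict) -> None:
    # Hypothetical layout: one pickle per saved step.
    (ckpt_dir / f"step_{step:06d}.pkl").write_bytes(pickle.dumps(state))

def load_checkpoint(step: int) -> dict:
    return pickle.loads((ckpt_dir / f"step_{step:06d}.pkl").read_bytes())

# First run: write a checkpoint every `save_every` steps.
state = {"step": 0, "loss_sum": 0.0}
for step in range(1, 7):
    state = {"step": step, "loss_sum": state["loss_sum"] + 1.0}
    if step % save_every == 0:
        save_checkpoint(step, state)

# Second run (e.g. after a crash or loss spike): resume from a given step.
state = load_checkpoint(resume_from_step)
assert state["step"] == 4
```

The resumption is approximate in the real script (per the message), so a sketch like this only captures the control flow, not the optimizer/dataloader details.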
2025-11-13 15:34:40 +00:00
Andrej Karpathy
c6b7ab7440
grad clip logging and printing and cosmetics
2025-11-05 21:08:30 +00:00
svlandeg
2ce62ec076
ensure consistency of quotes within each statement
2025-11-03 21:52:02 +01:00
svlandeg
c72b8b2309
add explicit UTF-8 encoding
2025-11-03 21:27:12 +01:00
Dipesh Babu
226953b841
fix: open JSONL and results CSV with UTF-8 encoding for portability
2025-11-03 01:20:56 -05:00
svlandeg
52e85aaf80
Merge branch 'master' into fix/typo
2025-11-02 13:41:13 +01:00
Andrej Karpathy
cf587acb1a
move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts
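The lazy-download pattern this commit describes can be sketched as below; the function names and the stubbed `download` helper are hypothetical stand-ins:

```python
import tempfile
from pathlib import Path

calls = []

def download(url: str, target: Path) -> None:
    # Stand-in for the real fetch; records that a download happened.
    calls.append(url)
    target.mkdir(parents=True, exist_ok=True)

def get_eval_bundle(cache_dir: Path, url: str) -> Path:
    target = cache_dir / "eval_bundle"
    if not target.exists():  # lazy: fetch only on first use
        download(url, target)
    return target

cache = Path(tempfile.mkdtemp())
get_eval_bundle(cache, "https://example.com/bundle")
get_eval_bundle(cache, "https://example.com/bundle")
assert len(calls) == 1  # second call hits the cache, no re-download
```

Moving the fetch behind a call like this is what lets the bash run scripts drop their explicit download step.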
2025-11-01 16:04:38 +00:00
Andrej Karpathy
7d2c4a3d95
delete pandas dep in base_eval use csv instead
2025-11-01 15:28:30 +00:00
Andrej
dfc88334b6
fix tok/sec calculation bug when grad accum steps > 1
Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1
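The arithmetic behind this class of bug, with hypothetical numbers: one optimizer step processes `grad_accum_steps` micro-batches, so tok/sec must count tokens from every micro-batch, not just one.

```python
# Illustrative numbers only; the real run's batch sizes differ.
batch_size, seq_len, grad_accum_steps = 32, 1024, 4
step_time_s = 2.0

# Buggy version counts one micro-batch; correct version counts them all.
tokens_per_step = batch_size * seq_len * grad_accum_steps
tok_per_sec = tokens_per_step / step_time_s

assert tokens_per_step == 131072
assert tok_per_sec == 65536.0
```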
2025-10-30 08:36:32 -07:00
svlandeg
70319851fc
fix typo
2025-10-29 19:48:34 +01:00
svlandeg
8c9b004c99
typo fixes in scripts
2025-10-28 20:17:31 +01:00
water-vapor
a9de4b1038
Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1
2025-10-26 01:43:49 -05:00
Andrej Karpathy
8892470f29
add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think
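The core counting operation the extended calculator needs is just Python's `str.count`; a minimal sketch (the helper name is hypothetical):

```python
def count_letter(word: str, letter: str) -> int:
    # Python's built-in substring count, the function the calculator
    # was extended to support per the commit message.
    return word.count(letter)

assert count_letter("strawberry", "r") == 3
```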
2025-10-24 14:02:48 +00:00
Andrej Karpathy
81597cd616
move the lr schedule args up in base_train so they are tunable in configurator
2025-10-24 13:27:31 +00:00
Luke Stanley
defd1246aa
Fix Torch crash caused by pinning on CPU
2025-10-21 20:28:10 +00:00
Andrej Karpathy
a088b7a6ec
use enable_gqa of pytorch sdpa, allows us to delete some code, didn't realize it's available
2025-10-21 18:07:33 +00:00
Andrej Karpathy
5bdc99abfb
merge and resolve conflict
2025-10-21 17:19:10 +00:00
Andrej Karpathy
dfcb1c16f1
Merge branch 'master' into cpu-mps-dev
2025-10-21 17:15:53 +00:00
Andrej Karpathy
fe5aed940b
add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully it's ok
2025-10-21 15:04:58 +00:00
karpathy
2e9669e03a
upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming
2025-10-20 10:15:17 -07:00
Andrej Karpathy
c1d2ed1c13
use orig_model in sampling, silly of me to miss this
2025-10-20 00:05:09 +00:00
Andrej Karpathy
2bc521a6de
use orig_model in sampling, silly of me to miss this
2025-10-20 00:04:15 +00:00
karpathy
ae02650afe
update the midtraining script too
2025-10-16 16:33:17 -07:00
karpathy
df600b6ed5
many small tweaks. base, eval, core work now i think
2025-10-16 15:46:18 -07:00
karpathy
786119d593
add autodetect of device and related stuff. getting weird warnings/errors still, so wip
2025-10-16 10:26:19 -07:00
karpathy
279b74312c
adjust comment/guidance on device type
2025-10-16 10:06:39 -07:00
karpathy
306bc380ab
add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered what I think is a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights
2025-10-16 10:04:43 -07:00
Andrej Karpathy
722da4f543
trying to add basic cpu support, will try mps too
2025-10-16 16:14:38 +00:00
Andrej Karpathy
4346536ab2
also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate
2025-10-16 01:28:37 +00:00
Andrej Karpathy
4c3590c499
fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints
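The accumulate-then-emit fix described above can be sketched with the standard library's incremental UTF-8 decoder. The token byte values are hypothetical; an emoji's 4 bytes are split across two tokens:

```python
import codecs

# 😀 is \xf0\x9f\x98\x80 in UTF-8, split here across two tokens.
tokens = [b"\xf0\x9f", b"\x98\x80", b"ok"]

decoder = codecs.getincrementaldecoder("utf-8")()
emitted = ""
for tok in tokens:
    # Returns "" while buffered bytes are still an incomplete codepoint,
    # and the full character once the remaining bytes arrive.
    emitted += decoder.decode(tok)

assert emitted == "\U0001F600ok"  # the emoji followed by "ok"
```

Decoding each token's bytes independently would raise (or emit replacement characters) on the split emoji, which is exactly the failure mode the commit fixes.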
2025-10-15 20:29:54 +00:00
Andrej Karpathy
03fa673b7d
add basic logging to chat_web, which i think might be fun
2025-10-15 19:51:06 +00:00
Andrej Karpathy
52bfeea8bd
add very basic abuse prevention limits to chat_web so it's ok to host endpoints
2025-10-15 19:42:54 +00:00
Andrej Karpathy
01fb290f53
allow multiple GPUs to do inference in a data parallel way
2025-10-15 19:12:19 +00:00
Andrej Karpathy
190d9515d0
don't evaluate the sampling evals during SFT, they are too slow. keep the multiple choice evals. delete unused imports
2025-10-15 16:42:23 +00:00