javasoup
2adcc95c4e
Merge branch 'master' into refactor-vertex-ai-pipelines
2025-12-01 20:07:43 -05:00
Nuno Pereira
13001597c2
Success on Vertex Pipelines
2025-12-01 19:59:58 -05:00
Andrej
4a87a0d19f
Merge pull request #299 from samjabrahams/rotary_embedding_head_dim_comment_cleanup
...
Fix comment: rotary embeddings final dimension size
2025-11-17 13:29:21 -08:00
Sam Abrahams
11e68bf442
Fix comment: rotary embeddings final dimension size
2025-11-17 11:32:56 -05:00
Andrej Karpathy
bc1fca39f3
mqa -> gqa to reduce confusion
2025-11-15 15:43:37 +00:00
Andrej
f66a780f68
Fix torch.dtype mismatching when running engine inline test.
2025-11-14 07:28:29 -08:00
Andrej
4763ce612a
Small fixes to typos
2025-11-14 07:25:59 -08:00
Sofie Van Landeghem
c6f5bd67db
revert change of base to sft for quick inline test
2025-11-14 12:20:03 +01:00
svlandeg
a2fb3c83a6
fix typos
2025-11-14 11:20:25 +01:00
svlandeg
e5efb4b471
add test_engine.py to file structure
2025-11-14 11:13:42 +01:00
Andrej Karpathy
9a71d13688
typo oops
2025-11-13 16:08:30 +00:00
Andrej Karpathy
7b7fd0fe71
thank you Sofie for your help with nanochat
2025-11-13 16:07:54 +00:00
Andrej Karpathy
c6abcdfe3a
big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs, when you don't want the anxiety of your run crashing for some reason; alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0, but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we'll want to change that in the future. to use it, set --save_every to a step interval at which to write checkpoints, then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but that's ok because midtraining is comparably quite a bit faster.
2025-11-13 15:34:40 +00:00
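A minimal sketch of how such approximate resumption can be wired up. The flag names --save_every and --resume_from_step come from the commit message above; the helper names and the checkpoint-file naming scheme here are hypothetical, not taken from the repo.

```python
import os

def should_save(step: int, save_every: int) -> bool:
    # Write a checkpoint every save_every steps (disabled when save_every <= 0).
    return save_every > 0 and step > 0 and step % save_every == 0

def checkpoint_path(out_dir: str, step: int) -> str:
    # Hypothetical naming scheme: one checkpoint file per saved step.
    return os.path.join(out_dir, f"ckpt_{step:06d}.pt")

def resume_step(resume_from_step: int) -> int:
    # Resume optimization from the requested step, or start fresh at step 0.
    return resume_from_step if resume_from_step > 0 else 0
```

Keeping the resumption "approximate" (e.g. not restoring every last piece of dataloader state) is what keeps this logic small.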
Andrej Karpathy
91f09ccd0d
minor fix comment in engine
2025-11-13 15:28:18 +00:00
Andrej Karpathy
adb5d4a16c
uv lock has to change since we removed numpy in the other commit
2025-11-13 15:16:27 +00:00
howardgao@outlook.com
b399e43168
fix engine test bug
2025-11-06 08:56:45 +08:00
Andrej Karpathy
c6b7ab7440
grad clip logging and printing and cosmetics
2025-11-05 21:08:30 +00:00
Andrej
885a4f25e7
Replace fcntl with filelock for Windows compatibility
2025-11-04 16:35:39 -08:00
Andrej
3a2ae631c4
Merge branch 'master' into master
2025-11-04 16:35:02 -08:00
Andrej
12d995f58c
Add NPROC_PER_NODE var to speedrun.sh and run1000.sh
2025-11-04 16:26:33 -08:00
svlandeg
f1683c5b16
set nproc_per_node as var in speedrun and run1000 scripts
2025-11-04 21:36:10 +01:00
Andrej
d1558c7873
handle bf16 on MPS by casting to fp32 during load checkpoint
2025-11-04 09:42:50 -08:00
Andrej
df25293087
Add explicit UTF-8 encoding on open
2025-11-04 09:38:18 -08:00
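The UTF-8 commits above all amount to the same fix: passing an explicit encoding to open() so reads and writes don't depend on the platform's locale default, which is not UTF-8 on many Windows setups. A minimal illustration (file names here are made up):

```python
import os
import tempfile

def write_text(path: str, text: str) -> None:
    # Without encoding=, open() falls back to locale.getpreferredencoding(),
    # which can silently mangle non-ASCII text on Windows.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

def read_text(path: str) -> str:
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
write_text(path, "café ✓")
print(read_text(path))  # -> café ✓
```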
Yasser Makram
1e89af9862
Replace fcntl with filelock for Windows compatibility
2025-11-04 07:22:34 +00:00
google-labs-jules[bot]
a88e7ec21f
fix: Correct Docker build for rustbpe tokenizer
...
This commit fixes a build failure in the Docker image by implementing a more robust build process for the `rustbpe` tokenizer.
The `Dockerfile` now explicitly creates a `uv` virtual environment, adds its `bin` directory to the `PATH`, installs `maturin` into the environment, and then runs the `maturin develop` command. This ensures that the build command executes within a fully configured environment with all necessary tools available on the `PATH`, resolving the "No such file or directory" error.
2025-11-04 02:24:08 +00:00
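The build steps described above might look roughly like the following Dockerfile fragment; the venv location, base image context, and manifest path are assumptions for illustration, not the repository's actual Dockerfile.

```dockerfile
# Create a uv virtual environment and put its bin/ on the PATH, so that
# maturin (and the tools it shells out to) are always found.
RUN uv venv /opt/venv
ENV VIRTUAL_ENV="/opt/venv"
ENV PATH="/opt/venv/bin:$PATH"

# Install maturin into that environment, then build the rustbpe tokenizer
# in develop mode inside the fully configured environment.
RUN uv pip install maturin
RUN maturin develop --manifest-path rustbpe/Cargo.toml
```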
google-labs-jules[bot]
fa04262889
fix: Correct Docker build for rustbpe tokenizer
...
This commit fixes a build failure in the Docker image by adding the `--uv` flag to the `maturin develop` command.
The `maturin` build process was failing because it could not find `pip` within the `uv` environment. The `--uv` flag ensures that `maturin` correctly uses the `uv` environment to build the `rustbpe` tokenizer.
2025-11-04 02:05:34 +00:00
google-labs-jules[bot]
a2189d20d0
feat: Use Cloud Build for Vertex AI pipeline image creation
...
This commit streamlines the process of running the nanochat pipeline on Vertex AI by using Cloud Build to automate the Docker image creation process.
A `cloudbuild.yaml` file has been added to define the build steps, and a `run_pipeline.sh` script has been created to orchestrate the build and pipeline submission.
The `README.md` has been updated to reflect the new, simplified workflow.
2025-11-04 01:47:20 +00:00
google-labs-jules[bot]
2781d216c6
feat: Refactor nanochat to run on Vertex AI Pipelines
...
This refactoring enables the nanochat project to be executed as a scalable and robust pipeline on Vertex AI.
The monolithic `speedrun.sh` script has been decomposed into a series of containerized components orchestrated by a Kubeflow pipeline.
The codebase has been updated to use Google Cloud Storage for artifact management, allowing for seamless data sharing between pipeline steps.
A `Dockerfile` and Python wrappers for each pipeline step have been added to the `vertex_pipelines` directory.
The `README.md` has been updated with instructions on how to build the Docker image and run the Vertex AI pipeline.
2025-11-04 01:26:51 +00:00
Dipesh Babu
7a40ee77b4
fix: cast bf16 to fp32 on MPS (like CPU) to avoid dtype issues
2025-11-03 16:00:56 -05:00
svlandeg
2ce62ec076
ensure consistency of quotes within each statement
2025-11-03 21:52:02 +01:00
svlandeg
e22fc6f2fa
few more explicit UTF-8 encodings
2025-11-03 21:46:39 +01:00
svlandeg
c72b8b2309
add explicit UTF-8 encoding
2025-11-03 21:27:12 +01:00
Andrej
a83646e098
fix(eval): use UTF-8 when reading CORE JSONL and writing CSV
2025-11-03 06:38:33 -08:00
Andrej
8681922328
fix lstrip bug, make it removeprefix, TIL.
2025-11-03 06:37:48 -08:00
Dipesh Babu
226953b841
fix: open JSONL and results CSV with UTF-8 encoding for portability
2025-11-03 01:20:56 -05:00
Josh Odom
f1e15f5f4d
Fixing subtle bug: lstrip removes all matching characters, including potentially required ones. Use removeprefix instead.
2025-11-02 23:40:37 -06:00
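The subtlety being fixed: str.lstrip treats its argument as a *set of characters* to strip, not a prefix string, so it can keep eating characters past the intended prefix; str.removeprefix (Python 3.9+) removes exactly one literal prefix. A contrived example string (not from the repo) makes the difference visible:

```python
s = "Assistant: Ass kicked"

# lstrip strips ALL leading characters found in the set
# {'A', 's', 'i', 't', 'a', 'n', ':', ' '} -- so "Ass " is eaten too.
print(s.lstrip("Assistant: "))      # -> kicked

# removeprefix removes exactly the literal prefix, once.
print(s.removeprefix("Assistant: "))  # -> Ass kicked
```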
Andrej
b6da6982f6
fix nanochat logo: the t was placed too far to the right
2025-11-02 08:17:00 -08:00
Andrej
c2c4f77e22
oops small bugfix to run1000.sh missing kwarg
2025-11-02 08:14:41 -08:00
Andrej
d1ac0b2d07
when loading models on CPU, convert tensors from bfloat16 to float
2025-11-02 07:58:56 -08:00
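The idea behind this and the related MPS fix, sketched framework-free with dtype names as plain strings (in the real code this would upcast torch tensors while loading the checkpoint):

```python
def load_dtype(saved_dtype: str, device: str) -> str:
    # bfloat16 kernels are missing or poorly supported on CPU and MPS,
    # so checkpoints saved in bf16 are upcast to float32 on those devices;
    # on CUDA the saved dtype is kept as-is.
    if saved_dtype == "bfloat16" and device in ("cpu", "mps"):
        return "float32"
    return saved_dtype
```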
svlandeg
5bfcd31b73
revert more formatting changes
2025-11-02 14:17:10 +01:00
svlandeg
036a3c5881
revert formatting changes to facilitate review
2025-11-02 14:16:43 +01:00
svlandeg
52e85aaf80
Merge branch 'master' into fix/typo
2025-11-02 13:41:13 +01:00
Jing Zhang
ba4f40bf58
Update run1000.sh to add missing --run=$WANDB_RUN
2025-11-01 21:27:00 -07:00
Manuel Saelices
d54c9cbf8c
CPU support, as bfloat16 params break inference
2025-11-01 23:38:50 +01:00
Andrej Karpathy
cf587acb1a
move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts
2025-11-01 16:04:38 +00:00
Andrej Karpathy
7d2c4a3d95
delete pandas dep in base_eval use csv instead
2025-11-01 15:28:30 +00:00
Andrej
ad39db5a23
tiny fix to comment
...
Update engine.py with correct error message on assert
2025-11-01 07:43:57 -07:00
Andrej
630f54ae5a
use empty locals and globals in call to eval() in engine tool use
...
harden eval: prevent the calc tool from accessing globals and locals
2025-11-01 07:22:59 -07:00
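The hardening pattern referenced here: give eval() an explicit globals dict with __builtins__ emptied out (CPython otherwise injects the real builtins into a bare {} globals) plus an empty locals dict, so the evaluated expression cannot reach the caller's namespace or builtin functions. A sketch, not a full sandbox; the function name is hypothetical:

```python
def safe_calc(expr: str):
    # Empty locals, and globals with builtins explicitly stripped out.
    # This is hardening against accidental access, not a security boundary.
    return eval(expr, {"__builtins__": {}}, {})

print(safe_calc("2 * (3 + 4)"))  # -> 14

try:
    safe_calc("open('/etc/passwd')")  # builtins like open() are unreachable
except NameError:
    print("blocked")
```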
Andrej Karpathy
f15732524a
make deepwiki link better
2025-11-01 14:13:29 +00:00
Andrej
dfc88334b6
fix tok/sec calculation bug when grad accum steps > 1
...
Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1
2025-10-30 08:36:32 -07:00
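The bug class being fixed: with gradient accumulation, each optimizer step processes grad_accum_steps micro-batches, so throughput must count the tokens from all of them. A sketch of the corrected arithmetic (variable names assumed, not taken from the repo):

```python
def tokens_per_second(device_batch_size: int, seq_len: int,
                      grad_accum_steps: int, step_time_s: float,
                      world_size: int = 1) -> float:
    # One optimizer step consumes grad_accum_steps micro-batches on each of
    # world_size ranks; forgetting the grad_accum_steps factor undercounts
    # tokens whenever gradient accumulation is > 1.
    tokens_per_step = device_batch_size * seq_len * grad_accum_steps * world_size
    return tokens_per_step / step_time_s
```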