nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-03-07 01:40:30 +00:00

Author	SHA1	Message	Date
google-labs-jules[bot]	a88e7ec21f	fix: Correct Docker build for rustbpe tokenizer. This commit fixes a build failure in the Docker image by implementing a more robust build process for the `rustbpe` tokenizer. The `Dockerfile` now explicitly creates a `uv` virtual environment, adds its `bin` directory to the `PATH`, installs `maturin` into the environment, and then runs the `maturin develop` command. This ensures that the build command executes within a fully configured environment with all necessary tools available on the `PATH`, resolving the "No such file or directory" error.	2025-11-04 02:24:08 +00:00
google-labs-jules[bot]	fa04262889	fix: Correct Docker build for rustbpe tokenizer. This commit fixes a build failure in the Docker image by adding the `--uv` flag to the `maturin develop` command. The `maturin` build process was failing because it could not find `pip` within the `uv` environment. The `--uv` flag ensures that `maturin` correctly uses the `uv` environment to build the `rustbpe` tokenizer.	2025-11-04 02:05:34 +00:00
google-labs-jules[bot]	a2189d20d0	feat: Use Cloud Build for Vertex AI pipeline image creation. This commit streamlines the process of running the nanochat pipeline on Vertex AI by using Cloud Build to automate the Docker image creation process. A `cloudbuild.yaml` file has been added to define the build steps, and a `run_pipeline.sh` script has been created to orchestrate the build and pipeline submission. The `README.md` has been updated to reflect the new, simplified workflow.	2025-11-04 01:47:20 +00:00
google-labs-jules[bot]	2781d216c6	feat: Refactor nanochat to run on Vertex AI Pipelines. This refactoring enables the nanochat project to be executed as a scalable and robust pipeline on Vertex AI. The monolithic `speedrun.sh` script has been decomposed into a series of containerized components orchestrated by a Kubeflow pipeline. The codebase has been updated to use Google Cloud Storage for artifact management, allowing for seamless data sharing between pipeline steps. A `Dockerfile` and Python wrappers for each pipeline step have been added to the `vertex_pipelines` directory. The `README.md` has been updated with instructions on how to build the Docker image and run the Vertex AI pipeline.	2025-11-04 01:26:51 +00:00
Andrej	a83646e098	fix(eval): use UTF-8 when reading CORE JSONL and writing CSV	2025-11-03 06:38:33 -08:00
Andrej	8681922328	fix lstrip bug, make it removeprefix, TIL.	2025-11-03 06:37:48 -08:00
Dipesh Babu	226953b841	fix: open JSONL and results CSV with UTF-8 encoding for portability	2025-11-03 01:20:56 -05:00
Josh Odom	f1e15f5f4d	Fixing subtle bug: lstrip removes all matching characters, including potentially required ones. Use removeprefix instead.	2025-11-02 23:40:37 -06:00
Andrej	b6da6982f6	fix nanochat logo: the t was placed too far to the right	2025-11-02 08:17:00 -08:00
Andrej	c2c4f77e22	oops small bugfix to run1000.sh missing kwarg	2025-11-02 08:14:41 -08:00
Andrej	d1ac0b2d07	when loading models on CPU, convert tensors from bfloat16 to float	2025-11-02 07:58:56 -08:00
svlandeg	5bfcd31b73	revert more formatting changes	2025-11-02 14:17:10 +01:00
svlandeg	036a3c5881	revert formatting changes to facilitate review	2025-11-02 14:16:43 +01:00
Jing Zhang	ba4f40bf58	Update run1000.sh to add missing --run=$WANDB_RUN	2025-11-01 21:27:00 -07:00
Manuel Saelices	d54c9cbf8c	CPU Support, as bfloat16 params breaks inference	2025-11-01 23:38:50 +01:00
Andrej Karpathy	cf587acb1a	move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts	2025-11-01 16:04:38 +00:00
Andrej Karpathy	7d2c4a3d95	delete pandas dep in base_eval use csv instead	2025-11-01 15:28:30 +00:00
Andrej	ad39db5a23	tiny fix to comment. Update engine.py with correct error message on assert	2025-11-01 07:43:57 -07:00
Andrej	630f54ae5a	use empty locals and globals in call to eval() in engine tool use. harden eval: prevent the calc tool from accessing globals and locals	2025-11-01 07:22:59 -07:00
Andrej Karpathy	f15732524a	make deepwiki link better	2025-11-01 14:13:29 +00:00
Andrej	dfc88334b6	fix tok/sec calculation bug when grad accum steps > 1. Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1	2025-10-30 08:36:32 -07:00
Andrej	eb11bb0e2e	remove numpy as dep. Remove explicit numpy dependency	2025-10-30 08:28:14 -07:00
Andrej	1ccbaf4416	nit delete redundant catch/raise in execute. Remove redundant exception handling in chdir	2025-10-29 08:10:03 -07:00
Andrej	29ff38d94b	Merge pull request #35 from bhaskar0210s/master. fix: return inf instead of crashing when evaluate_bpb has zero total_bytes	2025-10-29 08:06:24 -07:00
svlandeg	b996131570	Merge branch 'master' into logo/kerning-update	2025-10-29 11:45:40 +01:00
svlandeg	3fa974f93c	few more reverts	2025-10-29 11:45:02 +01:00
svlandeg	cbd560a83d	revert formatting changes to minimize diff and merge conflicts	2025-10-29 11:42:56 +01:00
Andrej	a1de1f46ad	Merge pull request #156 from tlepoint/fix/export-base-dir. Export the base dir variable in runcpu.sh	2025-10-28 15:19:08 -07:00
Andrej	ee00f523d0	fixing all the typos to make the pull requests stop. Batch of typo fixes	2025-10-28 13:36:07 -07:00
Ajeesh Sunil	5e0987a431	numpy isn't acting as a dependency for nanochat, so isn't it better to remove numpy from the dependencies list	2025-10-28 20:05:38 +00:00
svlandeg	8c9b004c99	typo fixes in scripts	2025-10-28 20:17:31 +01:00
svlandeg	0a3ce7b0ff	typo fixes in readme	2025-10-28 20:11:00 +01:00
Andrej Karpathy	fdda5826e3	Merge branch 'haowei01-fix_kv_cache_due_to_resize'	2025-10-28 16:54:30 +00:00
Andrej Karpathy	baf0b3fdda	also add a test that failed before the fix and passes now with the fix for kv cache resize	2025-10-28 16:54:17 +00:00
Andrej Karpathy	f1db6b4712	delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm	2025-10-28 16:51:41 +00:00
Andrej Karpathy	9415931f85	delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm	2025-10-28 15:17:43 +00:00
Haowei Zhang	2b9c085559	update the kv_shape	2025-10-27 02:47:13 -07:00
Haowei Zhang	b062b422ac	Fix kv cache, given that resize destroys the logical structure	2025-10-27 02:23:08 -07:00
water-vapor	a9de4b1038	Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1	2025-10-26 01:43:49 -05:00
Andrej Karpathy	c75fe54aa7	readme tweak, link to new discussion and add file structure	2025-10-25 19:39:16 +00:00
Marius Wachtler	fca2b8cd07	harden eval: prevent the calc tool from accessing globals and locals. By passing empty globals and locals to eval() we can prevent simple malicious cases where the user gets the model to output something like ```<global variable/func> or "a".count("a")``` e.g. ```signal.raise_signal(9) or "a".count("a")``` which would kill the process, or one could maybe get it to output secrets etc. I think to make it 100% secure one would need to parse the AST and only execute secure nodes, but this should make it much more robust.	2025-10-24 14:41:12 -05:00
Andrej Karpathy	05a051dbe9	fix tokenization bug, there should be no space before first letter. sigh	2025-10-24 15:06:06 +00:00
Andrej Karpathy	8892470f29	add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think	2025-10-24 14:02:48 +00:00
Andrej Karpathy	81597cd616	move the lr schedule args up in base_train so they are tunable in configurator	2025-10-24 13:27:31 +00:00
Andrej Karpathy	cc3636b01c	allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough	2025-10-24 13:27:05 +00:00
Tancrède Lepoint	d5cda11ab8	Export the base dir variable	2025-10-22 18:15:02 -04:00
Andrej Karpathy	5eeb2b6ef9	experiment: looking to 'hire' a nanochat repo czar to help the repo, mentioning in readme	2025-10-22 16:55:54 +00:00
Andrej Karpathy	2dda5c4c8d	Merge branch 'ulanch-fix/ios-safari-input-overlap'	2025-10-22 16:26:35 +00:00
Andrej Karpathy	80b203ea59	also bump run1000.sh to new uv sync	2025-10-22 16:25:36 +00:00
Luke Stanley	917c858136	Updates lockfile with CPU package support without overwriting other architectures	2025-10-22 16:25:36 +00:00
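
Commits f1e15f5f4d and 8681922328 above replace `lstrip` with `removeprefix`. The difference can be sketched as follows; the input string and prefix are hypothetical and are not taken from the repository. `str.lstrip` treats its argument as a set of characters to strip, while `str.removeprefix` (Python 3.9+) removes one exact leading substring.

```python
# Hypothetical input for illustration; not taken from the nanochat code.
line = "data: abc"
prefix = "data: "

# lstrip strips any leading characters found in the set {'d', 'a', 't', ':', ' '},
# so it also eats the leading 'a' of the payload.
stripped = line.lstrip(prefix)

# removeprefix removes the exact substring once, leaving the payload intact.
removed = line.removeprefix(prefix)

print(repr(stripped))  # 'bc'
print(repr(removed))   # 'abc'
```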


120 Commits
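
Commits fca2b8cd07 and 630f54ae5a in the log describe passing empty globals and locals to `eval()` in the calculator tool. A minimal sketch of that idea, assuming a hypothetical `calc` helper and a hypothetical `SECRET` variable (neither is the repository's actual code):

```python
SECRET = "hunter2"  # hypothetical name standing in for any surrounding state

def calc(expr: str):
    # Hypothetical helper: evaluate with empty globals and locals so the
    # expression cannot resolve names from the surrounding program.
    return eval(expr, {}, {})

# A plain eval() resolves names from the calling scope.
unrestricted = eval("SECRET")

# The restricted call cannot see SECRET and raises NameError.
try:
    calc("SECRET")
    leaked = True
except NameError:
    leaked = False

# Ordinary expressions such as str.count still work.
count_result = calc('"strawberry".count("r")')

print(unrestricted, leaked, count_result)  # hunter2 False 3
```

As the commit message itself notes, this is hardening rather than a sandbox: Python still inserts `__builtins__` into an empty globals dict, so built-in functions remain reachable from the expression.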