nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2025-12-06 04:12:13 +00:00

Author	SHA1	Message	Date
svlandeg	52e85aaf80	Merge branch 'master' into fix/typo	2025-11-02 13:41:13 +01:00
Andrej Karpathy	cf587acb1a	move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts	2025-11-01 16:04:38 +00:00
Andrej Karpathy	7d2c4a3d95	delete pandas dep in base_eval use csv instead	2025-11-01 15:28:30 +00:00
Andrej	ad39db5a23	tiny fix to comment Update engine.py with correct error message on assert	2025-11-01 07:43:57 -07:00
Andrej	630f54ae5a	use empty locals and globals in call to eval() in engine tool use harden eval: prevent the calc tool from accessing globals and locals	2025-11-01 07:22:59 -07:00
Andrej Karpathy	f15732524a	make deepwiki link better	2025-11-01 14:13:29 +00:00
Andrej	dfc88334b6	fix tok/sec calculation bug when grad accum steps > 1 Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1	2025-10-30 08:36:32 -07:00
Andrej	eb11bb0e2e	remove numpy as dep Remove explicit numpy dependency	2025-10-30 08:28:14 -07:00
svlandeg	70319851fc	fix typo	2025-10-29 19:48:34 +01:00
Andrej	1ccbaf4416	nit delete redundant catch/raise in execute Remove redundant exception handling in chdir	2025-10-29 08:10:03 -07:00
Andrej	29ff38d94b	Merge pull request #35 from bhaskar0210s/master fix: return inf instead of crashing when evaluate_bpb has zero total_bytes	2025-10-29 08:06:24 -07:00
Andrej	a1de1f46ad	Merge pull request #156 from tlepoint/fix/export-base-dir Export the base dir variable in runcpu.sh	2025-10-28 15:19:08 -07:00
Andrej	ee00f523d0	fixing all the typos to make the pull requests stop Batch of typo fixes	2025-10-28 13:36:07 -07:00
Ajeesh Sunil	5e0987a431	numpy isnt acting as a dependency for nanochat, so isnt it better to remove numpy from dependencies list	2025-10-28 20:05:38 +00:00
svlandeg	8c9b004c99	typo fixes in scripts	2025-10-28 20:17:31 +01:00
svlandeg	0a3ce7b0ff	typo fixes in readme	2025-10-28 20:11:00 +01:00
Andrej Karpathy	fdda5826e3	Merge branch 'haowei01-fix_kv_cache_due_to_resize'	2025-10-28 16:54:30 +00:00
Andrej Karpathy	baf0b3fdda	also add a test that failed before the fix and passes now with the fix for kv cache resize	2025-10-28 16:54:17 +00:00
Andrej Karpathy	f1db6b4712	delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm	2025-10-28 16:51:41 +00:00
Andrej Karpathy	9415931f85	delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm	2025-10-28 15:17:43 +00:00
Haowei Zhang	2b9c085559	update the kv_shape	2025-10-27 02:47:13 -07:00
Haowei Zhang	b062b422ac	Fix kv cache, given resize will destroys the logical structure	2025-10-27 02:23:08 -07:00
water-vapor	a9de4b1038	Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1	2025-10-26 01:43:49 -05:00
Andrej Karpathy	c75fe54aa7	readme tweak, link to new discussion and add file structure	2025-10-25 19:39:16 +00:00
Marius Wachtler	fca2b8cd07	harden eval: prevent the calc tool from accessing globals and locals By passing empty globals() and locals() to eval() we can prevent simple malicious cases where the user gets the model to output something like ```<global variable/func> or "a".count("a")``` e.g. ```signal.raise_signal(9) or "a".count("a")``` which would kill the process. or one could maybe get it to output secrets etc. I think to make it 100% secure one would need to parse the AST and only execute secure nodes but this should make it much more robust.	2025-10-24 14:41:12 -05:00
Andrej Karpathy	05a051dbe9	fix tokenization bug, there should be no space before first letter. sigh	2025-10-24 15:06:06 +00:00
Andrej Karpathy	8892470f29	add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think	2025-10-24 14:02:48 +00:00
Andrej Karpathy	81597cd616	move the lr schedule args up in base_train so they are tunable in configurator	2025-10-24 13:27:31 +00:00
Andrej Karpathy	cc3636b01c	allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough	2025-10-24 13:27:05 +00:00
Tancrède Lepoint	d5cda11ab8	Export the base dir variable	2025-10-22 18:15:02 -04:00
Andrej Karpathy	5eeb2b6ef9	experiment: looking to 'hire' a nanochat repo czar to help the repo, mentioning in readme	2025-10-22 16:55:54 +00:00
Andrej Karpathy	2dda5c4c8d	Merge branch 'ulanch-fix/ios-safari-input-overlap'	2025-10-22 16:26:35 +00:00
Andrej Karpathy	80b203ea59	also bump run1000.sh to new uv sync	2025-10-22 16:25:36 +00:00
Luke Stanley	917c858136	Updates lockfile with CPU package support without overwriting other architectures	2025-10-22 16:25:36 +00:00
Luke Stanley	db1d5b595d	Git ignore eval_bundle	2025-10-22 16:25:36 +00:00
Luke Stanley	dd9387b362	Fix GPU-less CPU use on Linux with specific Torch indexes	2025-10-22 16:25:36 +00:00
Luke Stanley	32571664b1	Fix Torch crash caused by pinning on CPU	2025-10-22 16:25:36 +00:00
Andrej Karpathy	51e70f0d3c	Merge branch 'lukestanley-fix-cpu-support-with-extras'	2025-10-22 16:11:15 +00:00
Andrej Karpathy	48387cd895	also bump run1000.sh to new uv sync	2025-10-22 16:08:31 +00:00
ulanch	796f84527f	fix(ui): prevent iOS Safari toolbar from covering input on initial load	2025-10-21 17:34:40 -07:00
Luke Stanley	7a52f9bfbb	Updates lockfile with CPU package support without overwriting other architectures	2025-10-21 23:14:34 +00:00
Luke Stanley	760af62e11	Git ignore eval_bundle	2025-10-21 23:14:34 +00:00
Luke Stanley	901b075605	Fix GPU-less CPU use on Linux with specific Torch indexes	2025-10-21 23:14:16 +00:00
Luke Stanley	defd1246aa	Fix Torch crash caused by pinning on CPU	2025-10-21 20:28:10 +00:00
Andrej	2e938530ce	delete spurious torch.empty allocation in adamw fix: remove unnecessary tensor allocation in DistAdamW optimizer	2025-10-21 11:35:17 -07:00
Andrej Karpathy	a088b7a6ec	use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available	2025-10-21 18:07:33 +00:00
Andrej Karpathy	94ee507054	quick fix base eval due to fewshot requirement	2025-10-21 17:56:08 +00:00
Andrej	33e8a27f91	Merge karpathy/cpu-mps-dev , adding the ability to run on CPU, on MPS, or on CUDA, with autodetect. Gnarly PR, nonzero chance I broke something. add cpu|mps support	2025-10-21 10:26:04 -07:00
Andrej Karpathy	50bea28ef9	also add readme mention of the cpu mps changes	2025-10-21 17:24:48 +00:00
Andrej Karpathy	5bdc99abfb	merge and resolve conflict	2025-10-21 17:19:10 +00:00

102 Commits
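
Commits fca2b8cd07 and 630f54ae5a describe hardening the calc tool by passing empty globals and locals to eval(). The snippet below is a minimal sketch of that idea only, not the code in engine.py: the names `calc` and `is_blocked` are hypothetical and used here for illustration. As the commit message itself notes, this is not a full sandbox; Python still injects `__builtins__` into an empty globals dict, so builtins remain reachable.

```python
import signal  # imported at module level on purpose, to show it stays hidden


def calc(expr: str):
    """Evaluate an expression with empty globals and locals (hypothetical helper)."""
    # Empty dicts mean the expression cannot see this module's names,
    # such as the `signal` module imported above.
    # Caveat: Python adds __builtins__ to an empty globals dict automatically,
    # so this is hardening, not a complete sandbox.
    return eval(expr, {}, {})


def is_blocked(expr: str) -> bool:
    """Return True if the expression fails because a name is not visible."""
    try:
        calc(expr)
    except NameError:
        return True
    return False


# Plain literals and string methods still work.
r_count = calc('"strawberry".count("r")')

# The example from the commit message now fails on name lookup
# before any signal is raised.
blocked = is_blocked('signal.raise_signal(9) or "a".count("a")')
```

With a bare `eval(expr)`, the second expression would resolve `signal` from the surrounding module and kill the process; with empty dicts the lookup raises NameError instead.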