nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-06-15 10:39:08 +00:00

Yan Meng 741e54f360 perf: architecture + optimizer optimizations — 94.6 min to GPT-2 (4.3% speedup). Two rounds of WeCo-guided D12 optimization, validated on D24. Key changes: smaller sliding windows (seq/8), VE every 3rd layer, RoPE 200K, smear removed, exponential residual decay, optimizer buffer pre-allocation. Mean CORE=0.2591 across 3 D24 runs.	2026-03-19 23:15:26 +00:00
base_eval.py	delete autocast, an unnecessary thorn in my side, manage dtypes directly	2026-03-04 23:55:30 +00:00
base_train.py	perf: architecture + optimizer optimizations — 94.6 min to GPT-2 (4.3% speedup)	2026-03-19 23:15:26 +00:00
chat_cli.py	delete autocast, an unnecessary thorn in my side, manage dtypes directly	2026-03-04 23:55:30 +00:00
chat_eval.py	delete autocast, an unnecessary thorn in my side, manage dtypes directly	2026-03-04 23:55:30 +00:00
chat_rl.py	delete autocast, an unnecessary thorn in my side, manage dtypes directly	2026-03-04 23:55:30 +00:00
chat_sft.py	delete autocast, an unnecessary thorn in my side, manage dtypes directly	2026-03-04 23:55:30 +00:00
chat_web.py	delete autocast, an unnecessary thorn in my side, manage dtypes directly	2026-03-04 23:55:30 +00:00
tok_eval.py	initial commit	2025-10-13 06:49:24 -07:00
tok_train.py	quick fix to not OOM main speedrun script	2026-01-26 22:31:42 +00:00