Mirror of https://github.com/karpathy/nanochat.git, synced 2026-04-01 21:25:21 +00:00
Two rounds of WeCo-guided D12 optimization, validated on D24. Key changes: smaller sliding windows (seq/8), VE every 3rd layer, RoPE 200K, smear removed, exponential residual decay, optimizer buffer pre-allocation. Mean CORE=0.2591 across 3 D24 runs.

| File |
|---|
| base_eval.py |
| base_train.py |
| chat_cli.py |
| chat_eval.py |
| chat_rl.py |
| chat_sft.py |
| chat_web.py |
| tok_eval.py |
| tok_train.py |
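
The commit summary above names several hyperparameter changes without showing what they compute. The following is a minimal sketch of three of them, not the repository's code. It assumes "seq/8" means a window of one eighth of the sequence length, that "VE every 3rd layer" selects layer indices divisible by 3, and that "RoPE 200K" refers to the standard rotary-embedding base. All function names and the example sizes are hypothetical.

```python
# Illustrative only: names, offsets, and sizes are assumptions,
# not taken from the nanochat source.

def sliding_window(seq_len: int, divisor: int = 8) -> int:
    """Attention window size as a fraction of the sequence length (seq/8)."""
    return max(1, seq_len // divisor)


def value_embedding_layers(n_layers: int, every: int = 3) -> list[int]:
    """Layer indices that would receive value embeddings (VE).

    Assumes the pattern starts at layer 0; the real offset may differ.
    """
    return [i for i in range(n_layers) if i % every == 0]


def rope_inv_freq(head_dim: int, base: float = 200_000.0) -> list[float]:
    """Standard RoPE inverse frequencies: base ** (-2i / head_dim)."""
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


if __name__ == "__main__":
    print(sliding_window(2048))           # window for a hypothetical 2048-token context
    print(value_embedding_layers(12))     # a hypothetical 12-layer model
    print(rope_inv_freq(8))               # frequencies for a toy head_dim of 8
```

A larger RoPE base lowers the slowest rotation frequencies, so positions remain distinguishable over longer spans. How that interacts with the shorter attention windows is something the sketch does not model.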