Mirror of https://github.com/karpathy/nanochat.git, synced 2026-03-20 03:43:20 +00:00
Implement weight tying between token embeddings and lm_head to reduce parameter count.

When enabled, logits are scaled by 1/√d_model, lm_head zeroing is skipped, and optimizer groups are deduplicated. Param counting uses unique parameters while Chinchilla ratio calculation adds back the would-be lm_head size for comparability.

Also adds boolean flag parsing (--flag without =value) to the configurator, an auto-derived log_every interval, and minor shell script fixes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

| Name | | |
|---|---|---|
| gen_synthetic_data.py | ||
| generate_logo.html | ||
| nanochat.png | ||
| repackage_data_reference.py | ||
| runcpu.sh | ||
| runmps_evals.sh | ||
| runmps.sh | ||
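
The weight tying described in the commit message can be illustrated with a small, dependency-free sketch. This is not the repository's code: the function names (`tied_logits`, `count_unique_params`, `params_for_ratio`) and the toy sizes are assumptions made for illustration. It shows the three ideas the message names: logits computed from the shared embedding matrix and scaled by 1/√d_model, parameter counting that counts the shared matrix once, and adding back the would-be lm_head size when computing a Chinchilla-style ratio.

```python
import math

def tied_logits(hidden, embedding):
    """Score one hidden vector against every row of the shared embedding
    matrix, scaling by 1/sqrt(d_model) as the commit message describes."""
    d_model = len(hidden)
    scale = 1.0 / math.sqrt(d_model)
    return [scale * sum(h * w for h, w in zip(hidden, row)) for row in embedding]

def count_unique_params(vocab_size, d_model, other_params, tie_weights):
    """Count parameters, counting the embedding matrix once when it is shared."""
    embedding = vocab_size * d_model
    lm_head = 0 if tie_weights else vocab_size * d_model
    return embedding + lm_head + other_params

def params_for_ratio(unique_params, vocab_size, d_model, tie_weights):
    """Add back the would-be lm_head size so tied and untied models
    yield comparable tokens-per-parameter ratios."""
    would_be_lm_head = vocab_size * d_model if tie_weights else 0
    return unique_params + would_be_lm_head

# Toy sizes, chosen only for illustration.
vocab_size, d_model, other_params = 8, 4, 100
tied = count_unique_params(vocab_size, d_model, other_params, tie_weights=True)
untied = count_unique_params(vocab_size, d_model, other_params, tie_weights=False)
print(tied, untied)
print(params_for_ratio(tied, vocab_size, d_model, tie_weights=True) == untied)
```

With the add-back step, the tied model's ratio denominator matches the untied model's parameter count, which is what makes the two configurations comparable.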
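
The boolean flag parsing mentioned in the commit message (`--flag` without `=value`) might look roughly like the sketch below. It is a hypothetical stand-alone function, not the repository's configurator: the name `parse_overrides`, the flag names in the usage line, and the choice to ignore non-`--` arguments are all assumptions.

```python
from ast import literal_eval

def parse_overrides(argv):
    """Turn ['--key=value', '--flag'] into a dict of config overrides.
    A bare --flag is treated as the boolean True."""
    overrides = {}
    for arg in argv:
        if not arg.startswith("--"):
            continue  # assumption: positional arguments are handled elsewhere
        key, sep, raw = arg[2:].partition("=")
        if not sep:
            overrides[key] = True  # bare flag, no '=value' part
            continue
        try:
            # Interpret numbers, booleans, tuples, etc. as Python literals.
            overrides[key] = literal_eval(raw)
        except (ValueError, SyntaxError):
            overrides[key] = raw  # fall back to the raw string
    return overrides

# Hypothetical flag names, used only to exercise the parser.
print(parse_overrides(["--tie_weights", "--depth=12", "--run=demo"]))
```

Falling back to the raw string keeps values such as run names usable without quoting, at the cost of silently accepting typos in literals.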