nanochat/scripts
William Thurston · 0c942a8c00 · Add tie_embeddings support and configurable logging interval
Implement weight tying between the token embeddings and lm_head to reduce
parameter count. When enabled, logits are scaled by 1/√d_model, lm_head
zeroing is skipped, and optimizer groups are deduplicated. Param counting
uses unique parameters, while the Chinchilla ratio calculation adds back the
would-be lm_head size for comparability.
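The actual model code isn't shown in this listing; the snippet below is a minimal pure-Python sketch of the scheme the message describes — one shared matrix serving as both embedding table and output head, logits rescaled by 1/√d_model, and identity-based deduplication for parameter counting. All names (`wte`, `lm_head`, `logits`) are illustrative, not the repo's own.

```python
import math
import random

d_model, vocab = 8, 16
random.seed(0)

# Token embedding table: vocab rows of d_model floats.
wte = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(vocab)]
lm_head = wte  # weight tying: the output head *is* the embedding table

def logits(h):
    # Project the hidden state onto each (shared) embedding row, then scale
    # by 1/sqrt(d_model) -- per the commit message, tied logits are rescaled
    # to keep their magnitude comparable to an untied head.
    scale = 1.0 / math.sqrt(d_model)
    return [scale * sum(hi * wi for hi, wi in zip(h, row)) for row in wte]

# Param counting must deduplicate shared tensors: key by object identity so
# the tied matrix is counted once, as the commit describes.
params = [wte, lm_head]
unique = {id(p): p for p in params}.values()
n_params = sum(len(p) * len(p[0]) for p in unique)

print(lm_head is wte)  # True: one shared parameter matrix
print(n_params)        # 128 = vocab * d_model, counted once, not twice
```

The same identity trick covers the optimizer-group deduplication the message mentions: building groups from the `id()`-keyed dict ensures the shared tensor appears in exactly one group.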

Also adds boolean flag parsing (--flag without =value) to the configurator,
an auto-derived log_every interval, and minor shell script fixes.
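The configurator itself isn't included in this listing; assuming a nanoGPT-style override parser that `literal_eval`s `--key=value` pairs, the bare-flag behavior the message describes could look like the sketch below (the function name and example flags are hypothetical).

```python
import ast

def parse_overrides(argv):
    """Parse --key=value overrides; a bare --flag means True."""
    overrides = {}
    for arg in argv:
        if not arg.startswith("--"):
            continue
        body = arg[2:]
        if "=" in body:
            key, val = body.split("=", 1)
            try:
                # Interpret numbers, booleans, tuples, etc. literally.
                overrides[key] = ast.literal_eval(val)
            except (ValueError, SyntaxError):
                overrides[key] = val  # fall back to the raw string
        else:
            # New behavior per the commit: --flag without =value sets True.
            overrides[body] = True
    return overrides

print(parse_overrides(["--tie_embeddings", "--log_every=100"]))
# {'tie_embeddings': True, 'log_every': 100}
```

This keeps `--tie_embeddings` equivalent to `--tie_embeddings=True` while leaving existing `--key=value` overrides untouched.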

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 14:42:58 -08:00
base_eval.py · Enhance model tagging support in training and evaluation scripts · 2025-11-10 19:45:02 -08:00
base_loss.py · Add scripts for running evaluations and training with W&B integration · 2025-11-05 11:49:50 -08:00
base_train.py · Add tie_embeddings support and configurable logging interval · 2026-02-22 14:42:58 -08:00
chat_cli.py · upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming · 2025-10-20 10:15:17 -07:00
chat_eval.py · Add scripts for running evaluations and training with W&B integration · 2025-11-05 11:49:50 -08:00
chat_rl.py · initial commit · 2025-10-13 06:49:24 -07:00
chat_sft.py · Enhance model tagging support in training and evaluation scripts · 2025-11-10 19:45:02 -08:00
chat_web.py · upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming · 2025-10-20 10:15:17 -07:00
mid_train.py · Enhance model tagging support in training and evaluation scripts · 2025-11-10 19:45:02 -08:00
tok_eval.py · initial commit · 2025-10-13 06:49:24 -07:00
tok_train.py · initial commit · 2025-10-13 06:49:24 -07:00