Mirror of https://github.com/karpathy/nanochat.git, synced 2026-03-20 20:03:19 +00:00
Latest commit: Accept upstream's architectural changes wholesale:

- argparse replaces configurator.py across all scripts
- Unified MuonAdamW optimizer replaces separate AdamW + Muon
- Sliding window attention (SSSL pattern) + Flash Attention 3
- Value embeddings (ResFormer-style) with per-layer gating
- Per-layer learnable scalars (resid_lambdas, x0_lambdas)
- FP8 training support with Float8Linear
- Scaling laws (Power Lines batch sizing, T_epoch weight decay)
- Checkpoint resumption with dataloader state
- BOS-aligned bestfit-pad packing for SFT
- ChatCORE evaluation metric
- Consolidated base_loss.py into base_eval.py
- Removed mid_train.py (pipeline simplified)

Drops our MoE and tie_embeddings implementations in favor of upstream's cleaner architecture. These can be re-added later on top of the new codebase if needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
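The "SSSL pattern" in the commit message refers to interleaving sliding-window and full attention across layers. A minimal sketch of how such a pattern could map to per-layer window sizes is shown below; the function name, default pattern, and window size of 128 are illustrative assumptions, not upstream's actual code.

```python
# Hypothetical sketch of an SSSL-style layer schedule: the pattern string is
# tiled over the layer stack, where 'S' marks a sliding-window attention layer
# (limited to `window` tokens) and 'L' a full-attention layer (whole sequence).
# All names and sizes here are assumptions for illustration only.

def layer_window_sizes(n_layers, seq_len, pattern="SSSL", window=128):
    """Return one attention window size per layer by cycling the pattern."""
    sizes = []
    for i in range(n_layers):
        kind = pattern[i % len(pattern)]
        sizes.append(window if kind == "S" else seq_len)
    return sizes

print(layer_window_sizes(8, 2048))
# → [128, 128, 128, 2048, 128, 128, 128, 2048]
```

With this kind of schedule, most layers only attend within a local window while every fourth layer retains global context, which keeps long-sequence attention cost low without giving up long-range mixing entirely.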
| Name |
|---|
| estimate_gpt3_core.ipynb |
| gen_synthetic_data.py |
| generate_logo.html |
| LEADERBOARD.md |
| LOG.md |
| nanochat.png |
| repackage_data_reference.py |
| runmps_evals.sh |
| runmps.sh |
| scaling_analysis.ipynb |
| scaling_laws_jan26.png |