Commit Graph

10 Commits

Author SHA1 Message Date
William Thurston
194c98a5b3 Merge upstream/master (266 commits) into fork
Accept upstream's architectural changes wholesale:
- argparse replaces configurator.py across all scripts
- Unified MuonAdamW optimizer replaces separate AdamW + Muon
- Sliding window attention (SSSL pattern) + Flash Attention 3
- Value embeddings (ResFormer-style) with per-layer gating
- Per-layer learnable scalars (resid_lambdas, x0_lambdas)
- FP8 training support with Float8Linear
- Scaling laws (Power Lines batch sizing, T_epoch weight decay)
- Checkpoint resumption with dataloader state
- BOS-aligned bestfit-pad packing for SFT
- ChatCORE evaluation metric
- Consolidated base_loss.py into base_eval.py
- Removed mid_train.py (pipeline simplified)

Drops our MoE and tie_embeddings implementations in favor of
upstream's cleaner architecture. These can be re-added later
on top of the new codebase if needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 14:50:28 -08:00
Sofie Van Landeghem
72b9064f9d remove leftover mid references (#491) 2026-02-02 08:33:46 -08:00
Andrej Karpathy
1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtraining is not yet fully properly erased across the board, but good enough for step 1 2026-01-31 19:12:25 +00:00
Sofie Van Landeghem
d4ea28d4e2 Fix args in readme (#438)
* fix commands in readme, using new arg format

* fix typo

* add required -i flag to chat_eval example runs
2026-01-15 16:26:38 -08:00
svlandeg
a2fb3c83a6 fix typos 2025-11-14 11:20:25 +01:00
William Thurston
b1d49aade5 Add scripts for running evaluations and training with W&B integration
- Added `dev/runmps_evals.sh` for evaluating checkpoints and logging results to W&B.
- Introduced `dev/runmps.sh` for orchestrating training stages with W&B support.
- Updated `.gitignore` to include `wandb/` and `.runmps_wandb_ids`.
- Changed permissions for `dev/runcpu.sh` and added executable flag.
- Enhanced existing scripts to log metrics to W&B during training and evaluation processes.
2025-11-05 11:49:50 -08:00
svlandeg
8c9b004c99 typo fixes in scripts 2025-10-28 20:17:31 +01:00
Andrej Karpathy
8892470f29 add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of Python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think 2025-10-24 14:02:48 +00:00
karpathy
2e9669e03a upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming 2025-10-20 10:15:17 -07:00
karpathy
3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00