Mirror of https://github.com/karpathy/nanochat.git, synced 2026-05-08 16:59:59 +00:00
Covers the MPS Metal Graph compiler crash that motivated the fix: adamw_step_fused crashed when p was bf16 (the standard nanochat config for wte/value_embeds) but the optimizer's shared scalar hyperparameters were fp32.

Two tests:

- test_adamw_step_fused_bf16_param_with_fp32_scalars: smoke test on bf16 path, verifies no crash and a finite weight update.
- test_adamw_step_fused_fp32_param_unchanged: confirms fp32 path still produces a sensible update (the dtype-cast patch is a no-op when source dtype matches target).

Both tests run on CPU (default) or MPS (when available).

Muon's mixed-dtype path is gated on the COMPUTE_DTYPE module constant (set from NANOCHAT_DTYPE env var at import time), which is awkward to exercise in a unit test without subprocess; the muon fix is covered by manual end-to-end testing on M2 + bf16 instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

| | | |
|---|---|---|
| test_attention_fallback.py | ||
| test_engine.py | ||
| test_optim_bf16.py | ||
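
The dtype-cast idea described in the commit message above can be illustrated with a minimal sketch. This is not nanochat's `adamw_step_fused`: the function name `adamw_step_sketch`, its signature, the helper `run_one_step`, and the hyperparameter values are all assumptions made for illustration. It only shows the general pattern of casting shared fp32 scalar tensors to the parameter's dtype before the update, and a smoke check in the spirit of the two tests (no crash, finite result, weights actually change).

```python
import torch


def adamw_step_sketch(p, grad, exp_avg, exp_avg_sq, step, lr, beta1, beta2, eps, wd):
    """Hypothetical single-tensor AdamW step (illustrative, not nanochat's code)."""
    # Cast the shared 0-D fp32 hyperparameter tensors to the parameter dtype.
    # When p is already fp32, .to() returns the same tensor, so this is a no-op.
    lr, beta1, beta2, eps, wd = (s.to(p.dtype) for s in (lr, beta1, beta2, eps, wd))

    p.mul_(1 - lr * wd)                                     # decoupled weight decay
    exp_avg.mul_(beta1).add_(grad * (1 - beta1))            # first-moment EMA
    exp_avg_sq.mul_(beta2).add_(grad * grad * (1 - beta2))  # second-moment EMA
    bias1 = 1 - beta1 ** step                               # bias corrections
    bias2 = 1 - beta2 ** step
    denom = (exp_avg_sq / bias2).sqrt() + eps
    p.sub_((lr / bias1) * exp_avg / denom)


def run_one_step(dtype):
    """Run one step on CPU and return (weights before, weights after)."""
    torch.manual_seed(0)
    p = torch.randn(4, 8).to(dtype)
    grad = torch.randn(4, 8).to(dtype)
    before = p.clone()
    exp_avg, exp_avg_sq = torch.zeros_like(p), torch.zeros_like(p)
    # Assumed values; the optimizer's shared scalars are fp32 regardless of p.dtype.
    scalars = [torch.tensor(v, dtype=torch.float32) for v in (0.02, 0.8, 0.95, 1e-10, 0.0)]
    adamw_step_sketch(p, grad, exp_avg, exp_avg_sq, 1, *scalars)
    return before, p


if __name__ == "__main__":
    for dtype in (torch.bfloat16, torch.float32):
        before, after = run_one_step(dtype)
        print(dtype, "finite:", bool(torch.isfinite(after).all()))
```

The sketch runs on CPU only; exercising the MPS path would additionally require moving the tensors to an MPS device where one is available.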