nanochat/tests
Matt Parrett 45aa6e2de2 tests: regression test for adamw_step_fused with bf16 params + fp32 scalars
Covers the MPS Metal Graph compiler crash that motivated the fix:
adamw_step_fused crashed when p was bf16 (the standard nanochat
config for wte/value_embeds) but the optimizer's shared scalar
hyperparameters were fp32. Two tests:

- test_adamw_step_fused_bf16_param_with_fp32_scalars: smoke test
  on the bf16 path; verifies no crash and a finite weight update.
- test_adamw_step_fused_fp32_param_unchanged: confirms the fp32
  path still produces a sensible update (the dtype-cast patch is
  a no-op when the source dtype matches the target).
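
A minimal sketch of what a smoke test along these lines might look like, assuming PyTorch is installed. `adamw_step_sketch` and `run_step` are hypothetical stand-ins, not nanochat's `adamw_step_fused` or its tests. Upcasting to fp32 for the arithmetic is only one way to avoid mixing dtypes and is not necessarily what the patch does. The sketch runs on CPU only.

```python
import torch


def adamw_step_sketch(p, grad, exp_avg, exp_avg_sq,
                      step_t, lr_t, beta1_t, beta2_t, eps_t, wd_t):
    """Hypothetical AdamW step (assumption: not nanochat's implementation).

    Params and state may be bf16 while the scalar hyperparameters are
    fp32 0-D tensors. All arithmetic is done in fp32 and the results are
    copied back into the original param/state dtype.
    """
    g = grad.float()
    # Exponential moving averages of the gradient and its square.
    m = exp_avg.float() * beta1_t + g * (1 - beta1_t)
    v = exp_avg_sq.float() * beta2_t + g * g * (1 - beta2_t)
    # Bias correction for the current step.
    bias1 = 1 - beta1_t ** step_t
    bias2 = 1 - beta2_t ** step_t
    denom = (v / bias2).sqrt() + eps_t
    # Decoupled weight decay followed by the Adam update.
    new_p = p.float() * (1 - lr_t * wd_t) - (lr_t / bias1) * m / denom
    # copy_ converts back to the destination dtype (a no-op cast for fp32).
    exp_avg.copy_(m)
    exp_avg_sq.copy_(v)
    p.copy_(new_p)


def run_step(dtype):
    """Run one step with params of `dtype` and fp32 scalar hyperparameters."""
    torch.manual_seed(0)
    p = torch.randn(8, 4).to(dtype)
    grad = torch.randn(8, 4).to(dtype)
    exp_avg = torch.zeros_like(p)
    exp_avg_sq = torch.zeros_like(p)
    # Order: step, lr, beta1, beta2, eps, weight decay (illustrative values).
    scalars = [torch.tensor(x, dtype=torch.float32)
               for x in (1.0, 0.1, 0.9, 0.95, 1e-8, 0.0)]
    before = p.clone()
    adamw_step_sketch(p, grad, exp_avg, exp_avg_sq, *scalars)
    return before, p


for dtype in (torch.bfloat16, torch.float32):
    before, after = run_step(dtype)
    assert after.dtype == dtype
    assert torch.isfinite(after.float()).all()
    assert not torch.equal(before, after)
```

The same loop covers both cases described above: the bf16 path checks for a finite, non-trivial update, and the fp32 path checks that the casts leave a same-dtype update working.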

Both tests run on CPU (default) or MPS (when available). Muon's
mixed-dtype path is gated on the COMPUTE_DTYPE module constant
(set from the NANOCHAT_DTYPE env var at import time), which is
awkward to exercise in a unit test without a subprocess; the Muon
fix is covered by manual end-to-end testing on M2 + bf16 instead.
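
To illustrate why an import-time constant pushes toward a subprocess, here is a sketch under stated assumptions. The child program is a hypothetical stand-in for a module that reads NANOCHAT_DTYPE once at import. The `float32` default and the `bfloat16` value are assumed for illustration and are not taken from nanochat. The helper name `compute_dtype_in_subprocess` is also hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical child program: mimics a module that freezes a constant
# from the environment when it is first imported.
CHILD = (
    "import os\n"
    "COMPUTE_DTYPE = os.environ.get('NANOCHAT_DTYPE', 'float32')\n"
    "print(COMPUTE_DTYPE)\n"
)


def compute_dtype_in_subprocess(value):
    """Start a fresh interpreter with NANOCHAT_DTYPE set (or unset if None)
    and return the constant the child computed at startup."""
    env = dict(os.environ)
    if value is None:
        env.pop("NANOCHAT_DTYPE", None)
    else:
        env["NANOCHAT_DTYPE"] = value
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


assert compute_dtype_in_subprocess("bfloat16") == "bfloat16"
assert compute_dtype_in_subprocess(None) == "float32"
```

Each call starts a new interpreter, so the constant is recomputed from the environment every time. Inside a single test process, changing `os.environ` after the module has been imported would not change a constant that was already set.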

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:19:36 -07:00
test_attention_fallback.py delete autocast, an unnecessary thorn in my side, manage dtypes directly 2026-03-04 23:55:30 +00:00
test_engine.py Fix MockModel's device definition (#535) 2026-02-17 16:03:46 -08:00
test_optim_bf16.py tests: regression test for adamw_step_fused with bf16 params + fp32 scalars 2026-04-30 20:19:36 -07:00