nanochat/scripts
Latest commit: 47e983eea7 by RoomWithOutRoof
fix: use meta device in disable_fp8 to avoid VRAM spike (#616)
When swapping Float8Linear for Linear in the disable_fp8 context manager,
using device=fp8_module.weight.device directly allocates new tensors
on the GPU, causing an unnecessary VRAM spike (~1GB for large models).

This fix uses device='meta' to avoid physical memory allocation,
then swaps in the weight tensor reference, eliminating the
unnecessary VRAM spike during the evaluation phase.
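
A minimal sketch of the swap described above, assuming torchao's Float8Linear exposes in_features, out_features, weight, and bias like nn.Linear; the helper name to_plain_linear is hypothetical, not the actual code from this commit:

```python
import torch
import torch.nn as nn

def to_plain_linear(fp8_module: nn.Module) -> nn.Linear:
    # Construct the replacement on the meta device: no physical memory
    # is allocated for its parameters, so there is no VRAM spike.
    linear = nn.Linear(
        fp8_module.in_features,
        fp8_module.out_features,
        bias=fp8_module.bias is not None,
        device="meta",
    )
    # Swap in references to the existing tensors instead of copying,
    # so the plain Linear shares storage with the fp8 module's weights.
    linear.weight = fp8_module.weight
    if fp8_module.bias is not None:
        linear.bias = fp8_module.bias
    return linear
```

Because the replacement module is created on the meta device, its parameters occupy no GPU memory until the existing weight tensors are attached by reference.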

Fixes issue #592

Co-authored-by: RoomWithOutRoof <roomwithoutroof@sparklab.ai>
2026-03-25 14:24:57 -07:00
File | Last commit | Date
base_eval.py | delete autocast, an unnecessary thorn in my side, manage dtypes directly | 2026-03-04 23:55:30 +00:00
base_train.py | fix: use meta device in disable_fp8 to avoid VRAM spike (#616) | 2026-03-25 14:24:57 -07:00
chat_cli.py | delete autocast, an unnecessary thorn in my side, manage dtypes directly | 2026-03-04 23:55:30 +00:00
chat_eval.py | delete autocast, an unnecessary thorn in my side, manage dtypes directly | 2026-03-04 23:55:30 +00:00
chat_rl.py | delete autocast, an unnecessary thorn in my side, manage dtypes directly | 2026-03-04 23:55:30 +00:00
chat_sft.py | delete autocast, an unnecessary thorn in my side, manage dtypes directly | 2026-03-04 23:55:30 +00:00
chat_web.py | delete autocast, an unnecessary thorn in my side, manage dtypes directly | 2026-03-04 23:55:30 +00:00
tok_eval.py | initial commit | 2025-10-13 06:49:24 -07:00
tok_train.py | quick fix to not OOM main speedrun script | 2026-01-26 22:31:42 +00:00