The `speedrun.sh` script hardcoded `NPROC_PER_NODE=8` whenever a GPU was detected, which crashed on systems with fewer than 8 GPUs. Additionally, `nanochat/common.py` autodetected "cuda" even when `torch.cuda.device_count()` returned 0 (seen on some ROCm builds), leading to "invalid device ordinal" errors.
Changes:
- `speedrun.sh`: Dynamically set `NPROC_PER_NODE` using `torch.cuda.device_count()`.
- `nanochat/common.py`: Ensure `autodetect_device_type` only returns "cuda" if devices are actually present.
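
A minimal sketch of the two checks described above, not the actual patch. The function names `pick_device_type` and `nproc_per_node`, the "mps"/"cpu" fallbacks, and the fallback to one process when no GPU is visible are assumptions for illustration; the probes are passed in as plain values so the logic can be read without PyTorch installed.

```python
def pick_device_type(cuda_available: bool, cuda_device_count: int,
                     mps_available: bool = False) -> str:
    """Return "cuda" only when at least one CUDA/ROCm device is visible.

    Assumption: the fallback order after "cuda" is "mps", then "cpu".
    """
    # Some ROCm builds report CUDA support while exposing zero devices,
    # so both conditions have to hold before choosing "cuda".
    if cuda_available and cuda_device_count > 0:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def nproc_per_node(cuda_device_count: int) -> int:
    """Number of worker processes to launch on this node.

    Assumption: fall back to a single process when no GPU is visible.
    """
    return max(1, cuda_device_count)
```

With PyTorch installed, the inputs would come from `torch.cuda.is_available()`, `torch.cuda.device_count()`, and `torch.backends.mps.is_available()`.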
Uninstalling the conflicting upstream `triton` package on AMD systems often removes the `triton` directory that it shares with `pytorch-triton-rocm`, breaking the latter. This caused `ImportError: cannot import name 'Config' from 'triton'`.
This change adds a step to force reinstall `pytorch-triton-rocm` immediately after uninstalling `triton`, ensuring the correct package is present and intact for the runtime.
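
To check whether an environment is in the state described above, a diagnostic along these lines may help. It is a sketch, not part of the change: the helper names `installed_triton_dists` and `triton_exports_config` are hypothetical, and it only inspects the environment without modifying it.

```python
from importlib import metadata


def installed_triton_dists(names=None):
    """Return which Triton-providing distributions are installed.

    `names` can be passed explicitly (useful for testing); by default the
    current environment is scanned.
    """
    candidates = {"triton", "pytorch-triton-rocm"}
    if names is None:
        names = [d.metadata["Name"] for d in metadata.distributions()]
    # Compare lowercase names with "_" treated as "-".
    normalized = {n.lower().replace("_", "-") for n in names if n}
    return sorted(candidates & normalized)


def triton_exports_config() -> bool:
    """True if `triton` imports and exposes `Config`, else False."""
    try:
        import triton
    except ImportError:
        return False
    return hasattr(triton, "Config")
```

On a healthy ROCm setup one would expect only `pytorch-triton-rocm` to be listed and `triton_exports_config()` to return `True`; seeing both distributions, or `False`, points to the conflict described here.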
On AMD ROCm environments, `uv run` detected that the manually uninstalled `triton` package was missing (it is a transitive dependency of `torch`) and reinstalled it during the tokenizer build step. This caused `ImportError: cannot import name 'Config' from 'triton'` because the reinstalled package conflicts with `pytorch-triton-rocm`.
This change adds `--no-sync` to the `uv run` command for building the tokenizer, preventing `uv` from undoing the manual uninstallation of `triton`.
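
As an illustration of how the resulting command line is assembled, here is a small sketch. The helper `tokenizer_build_cmd` is hypothetical, the `maturin develop --release` arguments are an assumption about the build step, and whether `--extra` and `--no-sync` are combined in the final script is not stated here; only the `uv run` flags `--no-sync` and `--extra` are taken from the description.

```python
import shlex


def tokenizer_build_cmd(build_args, extras=None, no_sync=True):
    """Assemble the argv for running the tokenizer build through `uv run`."""
    cmd = ["uv", "run"]
    if no_sync:
        # Stop uv from re-syncing the environment, which would reinstall
        # the manually removed `triton` package.
        cmd.append("--no-sync")
    if extras:
        # Keep the selected dependency set instead of the default one.
        cmd += ["--extra", extras]
    cmd += list(build_args)
    return cmd


# Assumed build arguments, shown only to make the output concrete.
print(shlex.join(tokenizer_build_cmd(["maturin", "develop", "--release"])))
# prints: uv run --no-sync maturin develop --release
```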
Explicitly uninstall `triton` when AMD GPU is detected.
The standard `triton` package (often pulled in by NVIDIA dependencies or by accident)
conflicts with `pytorch-triton-rocm` on AMD systems, causing
`ImportError: cannot import name 'Config' from 'triton'`.
This change ensures a clean ROCm environment by removing the conflicting package.
Also retains the `uv run --extra $EXTRAS` fix from the previous step.
Explicitly pass `--extra $EXTRAS` to `uv run` when building the tokenizer.
This prevents `uv` from reverting to the default (NVIDIA) dependency set
during the `maturin` build step, ensuring the correct PyTorch version
(ROCm) is preserved on AMD hardware.