Commit Graph

9 Commits

Author SHA1 Message Date
Lawrence R Kincheloe III
1bbba0f0d3
Merge branch 'master' into rocm-support 2025-10-16 18:47:37 -05:00
Zach Mueller
f0855cbcc7
Update speedrun.sh 2025-10-14 14:12:01 -04:00
google-labs-jules[bot]
f5349ffaea fix: Re-add PYTORCH_CUDA_ALLOC_CONF to training scripts
This commit re-adds the `PYTORCH_CUDA_ALLOC_CONF` environment variable to the training scripts. The setting helps prevent memory fragmentation and is beneficial for both CUDA and ROCm environments. It was inadvertently removed during a previous refactoring.
2025-10-14 15:20:54 +00:00
Lawrence R Kincheloe III
b3f662a924
Merge branch 'rocm-support' into fix/pytorch-memory-fragmentation 2025-10-14 02:11:50 -05:00
google-labs-jules[bot]
b09f7fc29b Set PYTORCH_CUDA_ALLOC_CONF to prevent memory fragmentation
This change adds the `PYTORCH_CUDA_ALLOC_CONF` environment variable to the main `speedrun.sh` execution script.

PyTorch recommends setting `expandable_segments:True` to manage memory more efficiently and reduce fragmentation. Doing so addresses a `UserWarning` observed during execution.
2025-10-14 06:57:18 +00:00
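The commit above sets the variable in the `speedrun.sh` shell script. For readers launching training from Python instead, a rough equivalent might look like the sketch below. The helper name `configure_allocator` is hypothetical and is not part of the repository; only the variable name and the `expandable_segments:True` value come from the commit message.

```python
import os


def configure_allocator(env=None):
    """Set PYTORCH_CUDA_ALLOC_CONF unless a value is already present.

    `env` defaults to the real process environment; passing a plain dict
    makes the helper easy to try out in isolation.
    """
    env = os.environ if env is None else env
    # setdefault keeps any value the user already exported in their shell.
    env.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
    return env["PYTORCH_CUDA_ALLOC_CONF"]
```

The allocator reads this variable when it initializes, so a call to `configure_allocator()` should come before the first GPU allocation; placing it before `import torch` is the safest ordering.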
google-labs-jules[bot]
5a785854d1 feat: Add HSA_OVERRIDE_GFX_VERSION for newer AMD GPUs
This commit adds the `HSA_OVERRIDE_GFX_VERSION` environment variable to the `speedrun.sh` script. This is a workaround to enable support for newer AMD GPU architectures (e.g., gfx1151) that are not yet officially supported in the pre-compiled PyTorch ROCm builds.

This change also includes an update to the `README.md` to explain this workaround to users.
2025-10-14 06:48:34 +00:00
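The override described above is an ordinary environment variable, so it can also be applied per process rather than globally. The sketch below assumes the usual `major.minor.stepping` format for the value. The helper `env_with_gfx_override` is hypothetical, and the version to pass is deliberately left to the caller because the commit message does not state which value the script uses.

```python
import os
import re

# Assumed format of the override value: three dot-separated integers.
_GFX_VERSION = re.compile(r"\d+\.\d+\.\d+")


def env_with_gfx_override(version, base_env=None):
    """Return a copy of the environment with HSA_OVERRIDE_GFX_VERSION set.

    The original mapping is left untouched, so the override only affects
    child processes that are started with the returned dict.
    """
    if not _GFX_VERSION.fullmatch(version):
        raise ValueError(f"expected 'major.minor.stepping', got {version!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_OVERRIDE_GFX_VERSION"] = version
    return env
```

The returned dict can be handed to `subprocess.run([...], env=...)` when launching `speedrun.sh`, using whichever version the project's `README.md` recommends for the GPU in question.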
google-labs-jules[bot]
054d903cae fix: Address runtime errors and improve configuration
This commit addresses several runtime errors encountered during the execution of the `speedrun.sh` script and improves the overall configuration of the project.

The key changes are:
- Patched `nanochat/configurator.py` to be more robust by handling flag-like arguments and ignoring unknown arguments. This resolves the `AssertionError`.
- Fixed the argument handling for `chat_eval.py` in `speedrun.sh` to prevent argument parsing errors.
- Updated `pyproject.toml` to correctly define optional dependencies for development.
2025-10-14 05:47:26 +00:00
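To illustrate what "handling flag-like arguments and ignoring unknown arguments" can mean in practice, here is a minimal sketch of a tolerant override parser. It is not the code in `nanochat/configurator.py`: the function `apply_overrides`, the dict-based config, and the choice to treat a bare `--flag` as `True` are all assumptions made for the example.

```python
from ast import literal_eval


def apply_overrides(config, argv):
    """Apply `--key=value` and bare `--flag` arguments to a config dict.

    Unknown keys and malformed arguments are collected and returned
    instead of triggering an assertion.
    """
    ignored = []
    for arg in argv:
        if not arg.startswith("--"):
            ignored.append(arg)  # not an override at all
            continue
        key, sep, raw = arg[2:].partition("=")
        if key not in config:
            ignored.append(arg)  # unknown key: skip rather than fail
            continue
        if not sep:
            config[key] = True  # flag-like argument with no value
            continue
        try:
            # Parse numbers, booleans, etc.; fall back to the raw string.
            config[key] = literal_eval(raw)
        except (ValueError, SyntaxError):
            config[key] = raw
    return ignored
```

Returning the ignored arguments, rather than silently dropping them, lets the caller log them so that a misspelled option does not go unnoticed.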
google-labs-jules[bot]
08c628cb83 feat: Add ROCm and device-agnostic support
This change adds support for ROCm and makes the codebase device-agnostic, allowing it to run on different hardware backends including ROCm, CUDA, and CPU.

The key changes are:
- Modified `pyproject.toml` to use ROCm-compatible PyTorch wheels and added the `pytorch-triton-rocm` dependency.
- Refactored `nanochat/common.py` to dynamically detect the available hardware and set the device and distributed backend accordingly.
- Updated all training, evaluation, and inference scripts to be device-agnostic, removing hardcoded CUDA references.
- Adapted `speedrun.sh` for single-device execution by replacing `torchrun` with `python`.
- Updated `nanochat/report.py` to provide more generic GPU information.
2025-10-14 05:07:30 +00:00
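The dynamic detection described for `nanochat/common.py` could be sketched roughly as follows. The function names and return shape are hypothetical; the sketch relies only on the facts that ROCm builds of PyTorch expose GPUs through the `torch.cuda` API and report a non-empty `torch.version.hip`, and that the `nccl` backend name is also used on ROCm.

```python
def select_device(cuda_available, hip_version=None):
    """Map detection results to (device_type, platform, dist_backend).

    ROCm GPUs still use the "cuda" device type in PyTorch; the platform
    string is only there for reporting.
    """
    if cuda_available:
        platform = "rocm" if hip_version else "cuda"
        return "cuda", platform, "nccl"
    # No GPU visible: fall back to CPU with the gloo backend.
    return "cpu", "cpu", "gloo"


def detect():
    """Query PyTorch and delegate to select_device."""
    # Imported lazily so select_device stays usable without PyTorch installed.
    import torch

    hip = getattr(torch.version, "hip", None)
    return select_device(torch.cuda.is_available(), hip)
```

Keeping the decision logic in a pure function separate from the PyTorch query makes it possible to exercise all three cases on a machine without any GPU.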
karpathy
3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00