11 Commits

Author SHA1 Message Date
google-labs-jules[bot]
8881ea84bf Fix AMD Triton conflict in speedrun.sh
Explicitly uninstall `triton` when AMD GPU is detected.
The standard `triton` package (often pulled in by NVIDIA dependencies or by accident)
conflicts with `pytorch-triton-rocm` on AMD systems, causing
`ImportError: cannot import name 'Config' from 'triton'`.
This change ensures a clean ROCm environment by removing the conflicting package.
Also retains the `uv run --extra $EXTRAS` fix from the previous step.
2025-11-23 03:38:56 +00:00
google-labs-jules[bot]
83bb650b49 Fix AMD ROCm install regression in speedrun.sh
Explicitly pass `--extra $EXTRAS` to `uv run` when building the tokenizer.
This prevents `uv` from reverting to the default (NVIDIA) dependency set
during the `maturin` build step, ensuring the correct PyTorch version
(ROCm) is preserved on AMD hardware.
2025-11-23 02:33:07 +00:00
google-labs-jules[bot]
083de95913 Fix hardware detection for AMD ROCm and single-process CPU crashes 2025-11-22 23:50:50 +00:00
google-labs-jules[bot]
48e632245e Fix ROCm/APU detection and CPU DDP OOM crash 2025-11-22 09:18:40 +00:00
google-labs-jules[bot]
a35621e726 Fix CPU DDP crashes: Init Gloo backend, prevent OOM by reducing NPROC, add script safety 2025-11-22 05:31:47 +00:00
svlandeg
f1683c5b16 set nproc_per_node as var in speedrun and run1000 scripts 2025-11-04 21:36:10 +01:00
Andrej Karpathy
cf587acb1a move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts 2025-11-01 16:04:38 +00:00
Luke Stanley
901b075605 Fix GPU-less CPU use on Linux with specific Torch indexes 2025-10-21 23:14:16 +00:00
Andrej Karpathy
fe5aed940b add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully it's ok 2025-10-21 15:04:58 +00:00
Zach Mueller
f0855cbcc7 Update speedrun.sh 2025-10-14 14:12:01 -04:00
karpathy
3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00
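
The two most recent commits (8881ea84bf and 83bb650b49) describe a pattern that is easier to follow as a script: pick a dependency extra based on the hardware, remove the stock `triton` package on AMD, and pass the same extra to `uv run` during the tokenizer build. The following is a minimal sketch of that idea, not the contents of speedrun.sh. The extra names `amd` and `gpu`, the detection heuristic (`rocm-smi` on PATH or `/dev/kfd` present), and the function names are all assumptions made for illustration. The functions only print the commands so the plan can be inspected before anything is run.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual speedrun.sh.
# Assumed for illustration: the extra names "amd" and "gpu", the detection
# heuristic, and the exact tokenizer build command.

# Return success if an AMD ROCm device appears to be present.
has_amd_gpu() {
    command -v rocm-smi >/dev/null 2>&1 || [ -e /dev/kfd ]
}

# Map a detection result ("1" = AMD found) to a uv extra name.
extras_for() {
    if [ "$1" = "1" ]; then echo "amd"; else echo "gpu"; fi
}

# Print, without executing, the setup commands for the given extra.
setup_commands() {
    local extras="$1"
    echo "uv sync --extra $extras"
    if [ "$extras" = "amd" ]; then
        # Remove the stock triton wheel so it cannot shadow pytorch-triton-rocm.
        echo "uv pip uninstall triton"
    fi
    # Pass the same extra to uv run so the build keeps the chosen dependency set.
    echo "uv run --extra $extras maturin develop --release"
}
```

One possible use is `if has_amd_gpu; then setup_commands "$(extras_for 1)"; else setup_commands "$(extras_for 0)"; fi`, reviewing the printed commands before running them by hand.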