Use requests instead of urllib for downloads, fixes SSL on macOS

The single download path in nanochat/common.py uses urllib.request.
On macOS python.org installs (and some other Python distributions),
urllib verifies certificates against the CA store of Python's bundled
OpenSSL. That store is empty until the user runs the post-install
"Install Certificates.command" script, so HTTPS downloads fail with
`SSL: CERTIFICATE_VERIFY_FAILED` against
karpathy-public.s3.us-west-2.amazonaws.com (and any other HTTPS
endpoint).
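
To check whether a given interpreter is affected, one option is to ask
the standard ssl module where it looks for CA certificates. The sketch
below is illustrative only; describe_default_ca_store is a hypothetical
helper name, not part of nanochat.

```python
import os
import ssl


def describe_default_ca_store():
    """Report where the ssl module looks for CA certificates.

    Hypothetical diagnostic helper for illustration; not part of nanochat.
    """
    paths = ssl.get_default_verify_paths()
    # cafile/capath are None when the configured location does not exist.
    cafile_ok = paths.cafile is not None and os.path.isfile(paths.cafile)
    capath_ok = paths.capath is not None and os.path.isdir(paths.capath)
    return {
        "cafile": paths.cafile,
        "cafile_exists": cafile_ok,
        "capath": paths.capath,
        "capath_exists": capath_ok,
        # Compiled-in OpenSSL default, reported even if the file is missing.
        "openssl_cafile": paths.openssl_cafile,
    }
```

Printing the returned dict on a fresh python.org macOS install would
likely show no usable cafile until "Install Certificates.command" has
been run; the exact paths vary by Python version and distribution.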

This breaks runs/runcpu.sh for fresh macOS Python installs:
  - scripts.base_eval can't fetch eval_bundle.zip
  - scripts.chat_sft can't fetch identity_conversations.jsonl

requests uses certifi by default, which ships its own CA bundle, so
this works on every supported platform without any per-machine cert
configuration. requests is already a transitive dep used in
nanochat/dataset.py, so this introduces no new dependency.
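
The CA bundle that requests falls back to can be inspected directly.
This is a minimal sketch assuming certifi and requests are installed;
ca_bundle_report is a hypothetical helper name. Some OS-packaged builds
of certifi point at the system bundle instead, so the two paths are not
guaranteed to be identical.

```python
import os

import certifi
import requests.utils


def ca_bundle_report():
    """Return the CA bundle paths known to certifi and requests.

    Hypothetical helper for illustration; not part of nanochat.
    """
    certifi_path = certifi.where()
    # Path requests uses for verification unless overridden (for example
    # by the REQUESTS_CA_BUNDLE environment variable or the verify= argument).
    requests_path = requests.utils.DEFAULT_CA_BUNDLE_PATH
    return {
        "certifi_bundle": certifi_path,
        "certifi_bundle_exists": os.path.isfile(certifi_path),
        "requests_default_bundle": requests_path,
    }
```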

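raise_for_status() is added because requests.get(), unlike urlopen(),
does not raise on 4xx/5xx responses; without it the body of an error
page would be silently written to disk. A 60-second timeout is added
(it applies to connecting and to each read, not to the whole download);
the previous urlopen call had no timeout.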
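
The error-handling and timeout behaviour described above can be shown
in isolation. The sketch below is not the nanochat function itself;
fetch_bytes is a hypothetical helper that mirrors only the three
request lines from the change.

```python
import requests


def fetch_bytes(url, timeout=60):
    """Download url and return the body as bytes, raising on HTTP errors.

    Hypothetical helper for illustration; not part of nanochat.
    """
    # timeout bounds the connect phase and each wait for data from the
    # server, not the total duration of the download.
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of
    # handing back the body of the error page.
    response.raise_for_status()
    return response.content
```

A caller that wants to tolerate failures would catch
requests.RequestException, which covers HTTP errors, timeouts, and
connection failures alike.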
Modestas Valauskas 2026-04-30 18:59:11 +02:00
parent 0aaca56805
commit f55588946c


@@ -5,7 +5,7 @@ Common utilities for nanochat.
 import os
 import re
 import logging
-import urllib.request
+import requests
 import torch
 import torch.distributed as dist
 from filelock import FileLock
@@ -100,8 +100,9 @@ def download_file_with_lock(url, filename, postprocess_fn=None):
         # Download the content as bytes
         print(f"Downloading {url}...")
-        with urllib.request.urlopen(url) as response:
-            content = response.read() # bytes
+        response = requests.get(url, timeout=60)
+        response.raise_for_status()
+        content = response.content # bytes
         # Write to local file
         with open(file_path, 'wb') as f: