Running nanochat on the Cloud with SkyPilot
This directory contains SkyPilot configurations for easily launching nanochat on major cloud providers (AWS, GCP, Azure), GPU clouds (Lambda, Nebius, RunPod, etc.), and Kubernetes clusters.
Prerequisites
- Install SkyPilot and configure it with your cloud provider(s) or Kubernetes cluster:
  - Follow the SkyPilot installation guide
  - Configure your cloud credentials (AWS, GCP, Azure, Lambda, Nebius, etc.), OR
  - Configure Kubernetes access via SkyPilot's Kubernetes support
Training: Running the Speedrun Pipeline
Launch the speedrun training pipeline on any cloud provider with a single command:
sky launch -c nanochat-speedrun cloud/speedrun.sky.yaml --infra <k8s|aws|gcp|nebius|lambda|etc>
This will:
- Provision an 8xH100 GPU node
- Set up the environment
- Run the complete training pipeline via speedrun.sh
- Save trained model checkpoints to s3://nanochat-data (change this to your own bucket)
- Complete in approximately 4 hours (~$100 on most providers)
Monitoring Training Progress
After launching, you can SSH into the cluster and monitor progress:
# SSH into the cluster
ssh nanochat-speedrun
# View the speedrun logs
sky logs nanochat-speedrun
Serving: Deploy Your Trained Model
Once training is complete, serve your trained model with the web UI:
sky launch -c nanochat-serve cloud/serve.sky.yaml --infra <k8s|aws|gcp|nebius|lambda|etc>
This will:
- Provision a 1xH100 GPU node (much cheaper than the 8xH100 node used for training)
- Load model weights from the same s3://nanochat-data bucket used during training
- Serve the web chat interface on port 8000
- Cost ~$2-3/hour on most providers
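A serving task like this can be sketched as the YAML below. This is illustrative rather than the repo's actual cloud/serve.sky.yaml; the mount path, bucket name, and serving entry point are assumptions.

```yaml
# Sketch of a serving-style SkyPilot task (values are assumptions).
resources:
  accelerators: H100:1        # a single H100 suffices for inference
  ports: 8000                 # expose the web UI port

file_mounts:
  # Same bucket the training run wrote its checkpoints to.
  /checkpoints:
    source: s3://nanochat-data
    mode: MOUNT

run: |
  # Launch the chat web server on port 8000 (entry point is an assumption)
  python -m scripts.chat_web
```

Declaring the port under resources is what lets sky status --endpoint return a reachable URL later.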
Accessing the Web UI
Get the endpoint URL to access the chat interface:
sky status --endpoint 8000 nanochat-serve
Open the displayed URL in your browser to chat with your trained model!
Shared Storage
Both training and serving tasks use SkyPilot's bucket mounting functionality to preserve and share model weights. This allows you to:
- Train once, serve multiple times without re-downloading weights
- Share trained models across different serving instances
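The sharing described above comes from both tasks declaring the same bucket mount; a minimal sketch (mount path and bucket name assumed, as before):

```yaml
# Shared storage sketch: training and serving mount the same bucket
# at a common path, so weights written by one are visible to the other.
file_mounts:
  /checkpoints:
    source: s3://nanochat-data  # replace with your own bucket
    mode: MOUNT                 # mounted as a filesystem backed by the bucket
```

Because the mount is backed by object storage rather than node-local disk, the weights survive cluster teardown and any later serving cluster can reuse them.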