From a2189d20d06da2bc3b43bdd2c3887895aac8a21b Mon Sep 17 00:00:00 2001
From: "google-labs-jules[bot]" <161369871+google-labs-jules[bot]@users.noreply.github.com>
Date: Tue, 4 Nov 2025 01:47:20 +0000
Subject: [PATCH] feat: Use Cloud Build for Vertex AI pipeline image creation

This commit streamlines the process of running the nanochat pipeline on
Vertex AI by using Cloud Build to automate the Docker image creation
process. A `cloudbuild.yaml` file has been added to define the build
steps, and a `run_pipeline.sh` script has been created to orchestrate
the build and pipeline submission. The `README.md` has been updated to
reflect the new, simplified workflow.
---
 README.md                        | 61 +++++++++++---------------------
 vertex_pipelines/cloudbuild.yaml | 11 ++++++
 vertex_pipelines/run_pipeline.sh | 39 ++++++++++++++++++++
 3 files changed, 70 insertions(+), 41 deletions(-)
 create mode 100644 vertex_pipelines/cloudbuild.yaml
 create mode 100644 vertex_pipelines/run_pipeline.sh

diff --git a/README.md b/README.md
index e9aeb95..9ca431c 100644
--- a/README.md
+++ b/README.md
@@ -220,9 +220,9 @@ If you find nano.sh helpful in your research cite simply as:
 
 MIT
 
-## Running on Vertex AI Pipelines
+## Running on Vertex AI with Cloud Build
 
-This project can also be run on Vertex AI Pipelines, which allows for a more robust and scalable execution environment. The following steps will guide you through the process of setting up and running the nanochat pipeline on Vertex AI.
+This project can be run on Vertex AI Pipelines using Cloud Build to automate the Docker image creation process.
 
 ### Prerequisites
 
@@ -231,48 +231,27 @@ Before you begin, you will need the following:
 * A Google Cloud Platform (GCP) project.
 * A Google Cloud Storage (GCS) bucket.
 * The `gcloud` command-line tool installed and configured.
-* Docker installed and configured to push to Google Container Registry (GCR).
-
-### Building and Pushing the Docker Image
-
-The first step is to build the Docker image that will be used to run the pipeline components. This image contains all of the necessary dependencies, including Python, `uv`, Rust, and the `nanochat` source code.
-
-1. **Set your GCP Project ID:**
-
-   ```bash
-   export GCP_PROJECT=$(gcloud config get-value project)
-   ```
-
-2. **Authenticate with GCR:**
-
-   ```bash
-   gcloud auth configure-docker
-   ```
-
-3. **Build the Docker image:**
-
-   ```bash
-   docker build -t gcr.io/${GCP_PROJECT}/nanochat:latest -f vertex_pipelines/Dockerfile .
-   ```
-
-4. **Push the Docker image to GCR:**
-
-   ```bash
-   docker push gcr.io/${GCP_PROJECT}/nanochat:latest
-   ```
+* The Cloud Build API enabled in your GCP project.
 
 ### Running the Pipeline
 
-Once the Docker image has been pushed to GCR, you can run the pipeline using the `vertex_pipelines/pipeline.py` script. This script will compile the pipeline and submit it to Vertex AI for execution.
+The `run_pipeline.sh` script automates the process of building the Docker image with Cloud Build and submitting the pipeline to Vertex AI.
 
-```bash
-python vertex_pipelines/pipeline.py \
-  --gcp-project ${GCP_PROJECT} \
-  --gcs-bucket gs://YOUR_GCS_BUCKET \
-  --pipeline-root gs://YOUR_GCS_BUCKET/pipeline-root \
-  --docker-image-uri gcr.io/${GCP_PROJECT}/nanochat:latest
-```
+1. **Make the script executable:**
 
-Replace `gs://YOUR_GCS_BUCKET` with your GCS bucket. The `pipeline-root` is a path in your GCS bucket where the pipeline artifacts will be stored.
+   ```bash
+   chmod +x vertex_pipelines/run_pipeline.sh
+   ```
 
-The pipeline will then start running on Vertex AI, and you can monitor its progress in the GCP console.
+2. **Run the script:**
+
+   ```bash
+   ./vertex_pipelines/run_pipeline.sh gs://YOUR_GCS_BUCKET
+   ```
+
+   Replace `gs://YOUR_GCS_BUCKET` with the path to your GCS bucket. The script will:
+   * Submit a build to Google Cloud Build using the `vertex_pipelines/cloudbuild.yaml` configuration.
+   * Retrieve the URI of the newly created Docker image once the build completes.
+   * Submit the pipeline to Vertex AI, passing the image URI and GCS bucket path as parameters.
+
+You can monitor the progress of both the Cloud Build job and the Vertex AI Pipeline run in the GCP console.
diff --git a/vertex_pipelines/cloudbuild.yaml b/vertex_pipelines/cloudbuild.yaml
new file mode 100644
index 0000000..2605c5c
--- /dev/null
+++ b/vertex_pipelines/cloudbuild.yaml
@@ -0,0 +1,11 @@
+steps:
+- name: 'gcr.io/cloud-builders/docker'
+  args:
+  - 'build'
+  - '-t'
+  - 'gcr.io/$PROJECT_ID/nanochat:latest'
+  - '.'
+  - '-f'
+  - 'vertex_pipelines/Dockerfile'
+images:
+- 'gcr.io/$PROJECT_ID/nanochat:latest'
diff --git a/vertex_pipelines/run_pipeline.sh b/vertex_pipelines/run_pipeline.sh
new file mode 100644
index 0000000..5c9cd0f
--- /dev/null
+++ b/vertex_pipelines/run_pipeline.sh
@@ -0,0 +1,39 @@
+#!/bin/bash
+set -euo pipefail
+
+# Check for required arguments
+if [ "$#" -ne 1 ]; then
+  echo "Usage: $0 gs://YOUR_GCS_BUCKET"
+  exit 1
+fi
+
+if [[ ! "$1" =~ ^gs:// ]]; then
+  echo "Error: GCS bucket must be a valid gs:// path."
+  echo "Usage: $0 gs://YOUR_GCS_BUCKET"
+  exit 1
+fi
+
+GCS_BUCKET=$1
+PIPELINE_ROOT="$GCS_BUCKET/pipeline-root"
+GCP_PROJECT=$(gcloud config get-value project)
+REGION="us-central1"
+
+echo "Using GCP Project: $GCP_PROJECT"
+echo "Using GCS Bucket: $GCS_BUCKET"
+echo "Using Region: $REGION"
+
+# Submit the build to Cloud Build and capture the pushed image name (a tag reference, not a digest)
+echo "Submitting build to Cloud Build..."
+IMAGE_URI=$(gcloud builds submit --config vertex_pipelines/cloudbuild.yaml --format="value(results.images[0].name)" . --project="$GCP_PROJECT")
+echo "Cloud Build completed. Using image URI: $IMAGE_URI"
+
+# Run the Vertex AI pipeline
+echo "Running Vertex AI pipeline..."
+python vertex_pipelines/pipeline.py \
+  --gcp-project "$GCP_PROJECT" \
+  --gcs-bucket "$GCS_BUCKET" \
+  --pipeline-root "$PIPELINE_ROOT" \
+  --docker-image-uri "$IMAGE_URI" \
+  --region "$REGION"
+
+echo "Pipeline submitted."
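
A possible follow-up, sketched under assumptions rather than taken from the patch: `run_pipeline.sh` reads `results.images[0].name`, which in the Cloud Build API is the name the image was pushed under (here a `:latest` tag reference). If a digest-pinned reference is wanted so that a pipeline run always resolves to the exact image that was just built, the `results.images[0].digest` field could be combined with the name. The helper `pin_image_to_digest` and the sample project and digest values below are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of the patch): turn a tag reference such as
# gcr.io/PROJECT/nanochat:latest plus a digest into NAME@DIGEST form.
pin_image_to_digest() {
  local name="$1" digest="$2"
  # Final path component, e.g. "nanochat:latest". Isolating it first keeps a
  # registry port such as "localhost:5000" from being mistaken for a tag.
  local last="${name##*/}"
  # Everything up to and including the last slash (empty if there is none).
  local prefix="${name%"$last"}"
  # Drop the tag, if any, from the final component and append the digest.
  printf '%s%s@%s\n' "$prefix" "${last%%:*}" "$digest"
}

# Placeholder values for illustration only.
pin_image_to_digest "gcr.io/example-project/nanochat:latest" "sha256:0123abcd"
```

In the script, both fields could be requested in one call, for example with `--format="value(results.images[0].name,results.images[0].digest)"`. This assumes gcloud prints the two values on a single line separated by a tab, which should be verified against the installed gcloud version before relying on it.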