This commit fixes a build failure in the Docker image by implementing a more robust build process for the `rustbpe` tokenizer.
The `Dockerfile` now explicitly creates a `uv` virtual environment, adds its `bin` directory to the `PATH`, installs `maturin` into the environment, and then runs the `maturin develop` command. This ensures that the build command executes within a fully configured environment with all necessary tools available on the `PATH`, resolving the "No such file or directory" error.
This commit fixes a build failure in the Docker image by adding the `--uv` flag to the `maturin develop` command.
The `maturin` build process was failing because it could not find `pip` within the `uv` environment (`uv` virtual environments do not include `pip` by default). The `--uv` flag makes `maturin` install the built `rustbpe` tokenizer with `uv` instead of `pip`.
This commit streamlines running the nanochat pipeline on Vertex AI by using Cloud Build to automate building the Docker image.
A `cloudbuild.yaml` file has been added to define the build steps, and a `run_pipeline.sh` script has been created to orchestrate the build and pipeline submission.
The `README.md` has been updated to reflect the new, simplified workflow.
This refactoring enables the nanochat project to be executed as a scalable and robust pipeline on Vertex AI.
The monolithic `speedrun.sh` script has been decomposed into a series of containerized components orchestrated by a Kubeflow pipeline.
The codebase has been updated to use Google Cloud Storage for artifact management, so that artifacts written by one pipeline step can be read by later steps.
A `Dockerfile` and Python wrappers for each pipeline step have been added to the `vertex_pipelines` directory.
The `README.md` has been updated with instructions on how to build the Docker image and run the Vertex AI pipeline.
By passing empty dictionaries as the `globals` and `locals` arguments to
`eval()`, we can prevent simple malicious cases where the user gets the model
to output something like `<global variable/func> or "a".count("a")`,
e.g. `signal.raise_signal(9) or "a".count("a")`, which would kill the process.
One could also get the model to leak secrets that are reachable through the
module's globals.
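A minimal sketch of the idea, assuming the expression is evaluated in a module that has imported `signal`; the wrapper name `evaluate_expression` is hypothetical. It shows that a module-level name is no longer resolvable once `eval()` receives empty dictionaries. It also shows why this is not a full sandbox: when the `globals` dict has no `__builtins__` key, Python inserts a reference to the builtins module, so builtins such as `len()` (and `__import__()`) remain reachable.

```python
import signal  # module-level import: visible to eval() only if this module's globals are passed


def evaluate_expression(expr: str):
    """Hypothetical wrapper: evaluate a model-produced expression without this module's globals."""
    # Empty dicts for globals and locals: names defined in this module, such as
    # `signal`, cannot be looked up, so they raise NameError instead of running.
    # Python still adds `__builtins__` to the empty globals dict automatically,
    # so built-in functions remain available.
    return eval(expr, {}, {})


print(evaluate_expression('"a".count("a")'))  # 1
print(evaluate_expression("len('abc')"))       # 3: builtins are still reachable
try:
    evaluate_expression('signal.raise_signal(9) or "a".count("a")')
except NameError as exc:
    print("blocked:", exc)
```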
I think making it 100% secure would require parsing the AST and only
executing safe nodes, but this change should make it much more robust.
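One way to approach the AST route is sketched below. It is an illustration, not what this change implements: `safe_eval` and its whitelist (literals, arithmetic operators, and `.count()` method calls) are hypothetical choices. The expression is parsed with `ast.parse(..., mode="eval")`, every node is checked against the whitelist, and only then is the compiled tree evaluated with builtins removed.

```python
import ast

# Hypothetical whitelist: literals, arithmetic, and method calls named in _ALLOWED_METHODS.
_ALLOWED_NODES = (
    ast.Expression, ast.Constant, ast.BinOp, ast.UnaryOp,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.FloorDiv, ast.Mod, ast.Pow,
    ast.UAdd, ast.USub, ast.Call, ast.Attribute, ast.Load,
)
_ALLOWED_METHODS = {"count"}


def safe_eval(expr: str):
    """Evaluate expr only if every AST node is whitelisted; raise ValueError otherwise."""
    tree = ast.parse(expr, mode="eval")  # a malformed expr raises SyntaxError here
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            # Rejects Name (e.g. `signal`, `__import__`), BoolOp, Subscript, Lambda, ...
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Attribute) and node.attr not in _ALLOWED_METHODS:
            # Blocks attribute access such as `"a".__class__`.
            raise ValueError(f"disallowed attribute: {node.attr}")
        if isinstance(node, ast.Call) and not isinstance(node.func, ast.Attribute):
            # Only method calls like "a".count("a") are permitted.
            raise ValueError("only whitelisted method calls are allowed")
    # Every node passed the check; evaluate with no globals and no builtins.
    return eval(compile(tree, "<expr>", "eval"), {"__builtins__": {}}, {})


print(safe_eval('"a".count("a")'))  # 1
print(safe_eval("2 ** 10 + 3"))     # 1027
```

Because `Name` nodes are never allowed, neither `signal` nor `__import__` can be referenced. The whitelist does not bound resource use, so an input such as `9**9**9**9` can still run for a very long time; a timeout would still be needed.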