fix: Correct Docker build for rustbpe tokenizer

This commit fixes a build failure in the Docker image by implementing a more robust build process for the `rustbpe` tokenizer.

The `Dockerfile` now explicitly creates a `uv` virtual environment in `/app/.venv`, adds its `bin` directory to `PATH` alongside the `uv` (`/root/.local/bin`) and `cargo` (`/root/.cargo/bin`) install directories, installs `maturin` into that environment, and then invokes `maturin develop` directly. As a result, the build step runs in a fully configured environment where every tool it needs resolves on `PATH`, which fixes the "No such file or directory" error.
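The fix relies on ordinary `PATH` lookup: a bare command name such as `maturin` runs only if a directory on `PATH` contains an executable with that name. The shell sketch below is illustrative only and is not part of this commit. It creates a throwaway directory with a stub `maturin` in a hypothetical `.venv/bin` (standing in for `/app/.venv/bin`). It then shows that the name fails to resolve when that directory is absent from `PATH` and resolves once the directory is added.

```shell
#!/bin/sh
# Illustrative sketch only, not part of this commit. The temporary ".venv/bin"
# directory and the stub "maturin" script are hypothetical stand-ins for
# /app/.venv/bin and the real maturin executable.

# Resolve a command name against an explicit PATH value. The lookup runs in a
# subshell, so the caller's PATH is left untouched. Prints the resolved path
# on success; returns non-zero if the name is not found.
resolve_with_path() {
  (PATH=$1; command -v "$2")
}

# Create a throwaway project directory with a venv-style bin directory,
# removed automatically when the script exits.
demo_root=$(mktemp -d)
trap 'rm -rf "$demo_root"' EXIT
venv_bin="$demo_root/.venv/bin"
mkdir -p "$venv_bin"

# A stub executable named "maturin" that only prints a marker line.
printf '#!/bin/sh\necho "stub maturin"\n' > "$venv_bin/maturin"
chmod +x "$venv_bin/maturin"

# 1) PATH without the venv bin directory: the lookup fails.
if resolve_with_path "$demo_root/missing" maturin; then
  echo "unexpected: maturin resolved without the venv on PATH"
else
  echo "maturin: not found on PATH"
fi

# 2) PATH with the venv bin directory first: the stub is found and runs.
resolved=$(resolve_with_path "$venv_bin:/usr/bin:/bin" maturin)
echo "maturin resolves to $resolved"
(PATH="$venv_bin:/usr/bin:/bin"; maturin)
```

In the image, `ENV PATH=...` places `/app/.venv/bin` ahead of the existing `PATH` entries for every later `RUN` step. That is why the new `RUN maturin develop ...` line can call `maturin` by its bare name.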
google-labs-jules[bot] 2025-11-04 02:24:08 +00:00
parent fa04262889
commit a88e7ec21f

```diff
@@ -8,16 +8,24 @@ WORKDIR /app
 RUN apt-get update && apt-get install -y curl build-essential
 RUN curl -LsSf https://astral.sh/uv/install.sh | sh
 RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
-ENV PATH="/root/.cargo/bin:${PATH}"
+# Add uv, cargo, and the future venv bin to the PATH
+ENV PATH="/root/.local/bin:/root/.cargo/bin:/app/.venv/bin:${PATH}"
 # Copy the entire project into the Docker image.
 COPY . .
+# Create a virtual environment.
+RUN uv venv
 # Install Python dependencies using uv.
-RUN /root/.local/bin/uv sync --extra gpu
+RUN uv sync --extra gpu
+# Install maturin, which is a build dependency.
+RUN uv pip install maturin
 # Build the rustbpe tokenizer.
-RUN /root/.local/bin/uv run maturin develop --release --manifest-path rustbpe/Cargo.toml --uv
+# The maturin executable from the venv should be on the PATH now.
+RUN maturin develop --release --manifest-path rustbpe/Cargo.toml
 # Set the entrypoint.
 ENTRYPOINT ["python"]
```