
I recently teamed up with a mate who’s studying Machine Learning to build a Movie Recommendation System. They handled the algorithms (building a clustering model using scikit-learn and pandas), and my job was to wrap it in a robust FastAPI backend and get it deployed to production.
The API worked great locally, but when I finally containerized it with Docker, I hit a massive wall: My Docker image was a bloated 1.56 GB!
For a simple REST API, a gigabyte-and-a-half image is a nightmare for deployment times, server storage costs, and CI/CD pipelines.
I realized the culprit wasn't my code—it was how Python ecosystem tools handle heavy ML dependencies. Here is how I completely overhauled my containerization strategy, migrated to Astral's uv, and shrank my production image down to 656 MB—a nearly 60% reduction.
## The pip Approach is Heavy

My initial Dockerfile was standard fare. I used python:3.12-slim, copied the requirements.txt, and ran pip install.
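For reference, here is a minimal sketch of what that original pip-based Dockerfile looked like (file names assumed, since the original isn't shown):

```dockerfile
FROM python:3.12-slim
WORKDIR /app

# Install everything in one layer -- the wheels pip downloads
# get cached inside this layer unless explicitly cleaned up.
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```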
While this works, it has some massive flaws when dealing with ML libraries like scipy, numpy, and scikit-learn:

- **pip's cache:** pip caches the massive .whl (wheel) files it downloads. If you don't aggressively clear that cache in the same RUN layer, it gets permanently baked into your image.
- **Leftover tooling:** pip itself and its internal logic end up floating around in your final production container, where they serve no purpose.

## uv and Multi-Stage Builds

uv is the blazing-fast Python package installer and resolver written in Rust. It's designed as a drop-in replacement for pip, but it brings some incredible modern features.
I ditched requirements.txt for a modern pyproject.toml and a uv.lock file. But the real magic happened in the Dockerfile.
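A minimal pyproject.toml for a project like this might look as follows (the exact project name and dependency list here are assumptions, not the author's actual file):

```toml
[project]
name = "movie-recommender-api"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
    "fastapi",
    "uvicorn",
    "pandas",
    "scikit-learn",
]
```

Running `uv lock` against this file produces the uv.lock file, and `uv sync --frozen` later refuses to proceed if the lockfile no longer matches pyproject.toml, which keeps builds reproducible.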
Instead of just replacing pip install with uv pip install, I went all the way and built a Multi-Stage Dockerfile optimized for uv.
Here is the final Dockerfile I landed on:
```dockerfile
# --- Stage 1: The Builder ---
FROM python:3.12-slim AS builder

# 1. Grab uv directly from the official image (bypasses pip entirely!)
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

# 2. Optimize execution and pre-compile Python bytecode
ENV PYTHONUNBUFFERED=1
ENV UV_COMPILE_BYTECODE=1

# 3. Prevent uv from downloading a secondary Python interpreter
ENV UV_PYTHON_DOWNLOADS=0

WORKDIR /app

# 4. Copy dependency files first
COPY pyproject.toml uv.lock ./

# 5. Install dependencies into a local .venv using cache mounts
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev --no-install-project

# --- Stage 2: The Final Production Image ---
FROM python:3.12-slim

ENV PYTHONUNBUFFERED=1
ENV PATH="/app/.venv/bin:$PATH"

WORKDIR /app

# 6. Surgically extract ONLY the compiled virtual environment
COPY --from=builder /app/.venv /app/.venv

# 7. Copy the application code and ML models
COPY . /app

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

This setup reduced my image from 1.56 GB to 656 MB while making my container startup instantaneous. Here's exactly why it works:
### Bypassing pip Entirely

Notice I didn't run pip install uv. Doing that leaves behind pip's internal caches and layers. Instead, I physically copied the uv executable straight from Astral's official container (COPY --from=ghcr.io/astral-sh/uv:latest). Zero cache generated, and uv is ready to go instantly.
### The Multi-Stage Split

I split my build into two stages: a builder and a final image.
In the builder stage, uv downloads and installs all the heavy machine learning libraries into a .venv folder.
In the final stage, I started with a clean python:3.12-slim image and used COPY --from=builder /app/.venv /app/.venv. I left uv, the build tools, and all the temporary download files completely behind! My production image contains only the perfectly built .venv.
### Docker Cache Mounts (--mount=type=cache)

When uv downloads packages like scipy, it stores the .whl files in a cache. I used a Docker cache mount (--mount=type=cache,target=/root/.cache/uv) so that these massive files are temporarily stored on the host machine during the build. When the build finishes, Docker detaches the folder. The production image stays tiny, but if I rebuild tomorrow, uv doesn't have to re-download the internet!
### Bytecode Pre-Compilation (UV_COMPILE_BYTECODE=1)

Usually, when you start a Python app, the interpreter reads your .py files and compiles them into .pyc (bytecode) on the fly, which slows down the API's startup time. By setting this environment variable, I forced uv to pre-compile every single dependency (FastAPI, Pandas, Scikit-learn) into bytecode during the Docker build. When my container starts in production, it skips that translation step and boots instantly.
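You can see the same mechanism with the standard library's compileall module, which is the kind of ahead-of-time compilation UV_COMPILE_BYTECODE=1 triggers at build time (this toy module is purely illustrative):

```python
import compileall
import pathlib
import tempfile

# Simulate build-time bytecode compilation: write a tiny module,
# then compile it to .pyc before it is ever imported.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "heavy_module.py").write_text("ANSWER = 42\n")

compileall.compile_dir(str(tmp), quiet=1)

# The pre-built bytecode now lives in __pycache__, so the
# interpreter can skip compilation at startup.
pycs = sorted(p.name for p in (tmp / "__pycache__").glob("*.pyc"))
print(len(pycs))  # → 1
```

In the container, the .pyc files land inside the .venv that gets copied into the final stage, so the startup win survives the multi-stage copy.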
### No Duplicate Interpreter (UV_PYTHON_DOWNLOADS=0)

uv is so smart that if it can't find the right Python version, it will download one for you. But since my base image is already python:3.12-slim, I absolutely don't want uv stuffing a second copy of Python 3.12 into the container. This flag forces uv to use the system Python runtime.
By taking the time to understand how Python dependencies bloat a container, and by leveraging modern tooling like uv paired with Docker Multi-Stage builds, I completely transformed my deployment.
If you are building Machine Learning APIs with FastAPI, do your DevOps pipeline a favor: ditch pip and adopt uv with multi-stage builds. You won't look back.