
I recently teamed up with a mate who’s studying Machine Learning to build a Movie Recommendation System. They handled the algorithms (building a clustering model using scikit-learn and pandas), and my job was to wrap it in a robust FastAPI backend and get it deployed to production.
The API worked great locally, but when I finally containerized it with Docker, I hit a massive wall: My Docker image was a bloated 1.56 GB!
For a simple REST API, a gigabyte-and-a-half image is a nightmare for deployment times, server storage costs, and CI/CD pipelines.
I realized the culprit wasn't my code—it was how Python ecosystem tools handle heavy ML dependencies. Here is how I completely overhauled my containerization strategy, migrated to Astral's uv, and shrank my production image down to 656 MB—a nearly 60% reduction.
## The pip Approach is Heavy

My initial Dockerfile was standard fare. I used python:3.12-slim, copied the requirements.txt, and ran pip install.
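For reference, here is a minimal sketch of what that original pip-based Dockerfile looked like (file names assumed, since the original isn't shown):

```dockerfile
FROM python:3.12-slim
WORKDIR /app

# Install everything in one layer -- the wheels pip downloads
# get cached inside this layer unless explicitly cleaned up.
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```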
While this works, it has some massive flaws when dealing with ML libraries like scipy, numpy, and scikit-learn:

- **pip's cache:** pip caches the massive .whl (wheel) files it downloads. If you don't aggressively clear that cache in the same RUN layer, it gets permanently baked into your image.
- **Leftover tooling:** pip itself and its internal logic end up floating around in your final production container, where they serve no purpose.

## uv and Multi-Stage Builds

uv is the blazing-fast Python package installer and resolver written in Rust. It's designed as a drop-in replacement for pip, but it brings some incredible modern features.
I ditched requirements.txt for a modern pyproject.toml and a uv.lock file. But the real magic happened in the Dockerfile.
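A minimal pyproject.toml for a project like this might look as follows (the exact project name and dependency list here are assumptions, not the author's actual file):

```toml
[project]
name = "movie-recommender-api"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
    "fastapi",
    "uvicorn",
    "pandas",
    "scikit-learn",
]
```

Running `uv lock` against this file produces the uv.lock file, and `uv sync --frozen` later refuses to proceed if the lockfile no longer matches pyproject.toml, which keeps builds reproducible.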
Instead of just replacing pip install with uv pip install, I went all the way and built a Multi-Stage Dockerfile optimized for uv.
Here is the final Dockerfile I landed on:
```dockerfile
# --- Stage 1: The Builder ---
FROM python:3.12-slim AS builder

# 1. Grab uv directly from the official image (bypasses pip entirely!)
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

# 2. Optimize execution and pre-compile Python bytecode
ENV PYTHONUNBUFFERED=1
ENV UV_COMPILE_BYTECODE=1

# 3. Prevent uv from downloading a secondary Python interpreter
ENV UV_PYTHON_DOWNLOADS=0

WORKDIR /app

# 4. Copy dependency files first
COPY pyproject.toml uv.lock ./

# 5. Install dependencies into a local .venv using cache mounts
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev --no-install-project

# --- Stage 2: The Final Production Image ---
FROM python:3.12-slim

ENV PYTHONUNBUFFERED=1
ENV PATH="/app/.venv/bin:$PATH"

WORKDIR /app

# 6. Surgically extract ONLY the compiled virtual environment
COPY --from=builder /app/.venv /app/.venv

# 7. Copy the application code and ML models
COPY . /app

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

This setup reduced my image from 1.56 GB to 656 MB while making my container startup instantaneous. Here's exactly why it works:
### Bypassing pip Entirely

Notice I didn't run pip install uv. Doing that leaves behind pip's internal caches and layers. Instead, I physically copied the uv executable straight from Astral's official container (COPY --from=ghcr.io/astral-sh/uv:latest). Zero cache generated, and uv is ready to go instantly.
### The Multi-Stage Split

I split my build into two stages: a builder and a final image.
In the builder stage, uv downloads and installs all the heavy machine learning libraries into a .venv folder.
In the final stage, I started with a clean python:3.12-slim image and used COPY --from=builder /app/.venv /app/.venv. I left uv, the build tools, and all the temporary download files completely behind! My production image contains only the perfectly built .venv.
### Docker Cache Mounts (--mount=type=cache)

When uv downloads packages like scipy, it stores the .whl files in a cache. I used a Docker cache mount (--mount=type=cache,target=/root/.cache/uv) so that these massive files are temporarily stored on the host machine during the build. When the build finishes, Docker detaches the folder. The production image stays tiny, but if I rebuild tomorrow, uv doesn't have to re-download the internet!
### Bytecode Pre-Compilation (UV_COMPILE_BYTECODE=1)

Usually, when you start a Python app, the interpreter reads your .py files and compiles them into .pyc (bytecode) on the fly, which slows down the API's startup time. By setting this environment variable, I forced uv to pre-compile every single dependency (FastAPI, Pandas, Scikit-learn) into bytecode during the Docker build. When my container starts in production, it skips that translation step and boots instantly.
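You can see the same mechanism with the standard library's compileall module, which is the kind of ahead-of-time compilation UV_COMPILE_BYTECODE=1 triggers at build time (this toy module is purely illustrative):

```python
import compileall
import pathlib
import tempfile

# Simulate build-time bytecode compilation: write a tiny module,
# then compile it to .pyc before it is ever imported.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "heavy_module.py").write_text("ANSWER = 42\n")

compileall.compile_dir(str(tmp), quiet=1)

# The pre-built bytecode now lives in __pycache__, so the
# interpreter can skip compilation at startup.
pycs = sorted(p.name for p in (tmp / "__pycache__").glob("*.pyc"))
print(len(pycs))  # → 1
```

In the container, the .pyc files land inside the .venv that gets copied into the final stage, so the startup win survives the multi-stage copy.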
### No Duplicate Interpreter (UV_PYTHON_DOWNLOADS=0)

uv is so smart that if it can't find the right Python version, it will download one for you. But since my base image is already python:3.12-slim, I absolutely don't want uv stuffing a second copy of Python 3.12 into the container. This flag forces uv to use the system Python runtime.
By taking the time to understand how Python dependencies bloat a container, and by leveraging modern tooling like uv paired with Docker Multi-Stage builds, I completely transformed my deployment.
If you are building Machine Learning APIs with FastAPI, do your DevOps pipeline a favor: ditch pip and adopt uv with multi-stage builds. You won't look back.