The GPU Isolation Problem
You just deployed a powerful Dedicated Server with an NVIDIA RTX or A100 GPU. You install Docker, pull an AI container, and type docker run. But instantly, you are hit with a wall of errors: "No devices found" or "CUDA not available". Why?
By design, Docker containers are isolated from the host machine's hardware. A standard Docker installation has absolutely no idea that a GPU exists on your motherboard. To enable GPU Passthrough, you need a specialized bridge.
Enter the NVIDIA Container Toolkit. This critical piece of software acts as a translator between the Docker engine and the NVIDIA driver stack on your host, hooked in through a runtime entry in daemon.json. In this guide, tailored specifically for Ubuntu 24.04 (Noble Numbat), we will show you exactly how to break the isolation barrier and give your containers full, bare-metal GPU access.
Step 0: The Bare Metal Prerequisite
Before touching Docker, your host machine must actually recognize the GPU. The NVIDIA Container Toolkit does not install GPU drivers; it only passes them through. Let's verify your foundation.
If running nvidia-smi returns a table showing your GPU model, driver and CUDA version, and VRAM usage, you are ready to proceed. If it says "command not found", you must install the proprietary NVIDIA drivers first.
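A minimal sketch of this check, assuming Ubuntu's standard ubuntu-drivers utility is available for the install step:

```shell
# Check that the host driver stack works (should print the NVIDIA-SMI table)
nvidia-smi

# If the command is missing, let Ubuntu pick and install the recommended
# proprietary driver, then reboot so the kernel module loads
sudo ubuntu-drivers install
sudo reboot
```

If you prefer pinning a specific driver series, you can install a versioned package instead, but the auto-selected driver is the safest default on a fresh server.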
Step 1: Escape the Docker "Snap" Trap
Here is the #1 reason developers fail to enable GPU passthrough on Ubuntu 24.04: the Snap Store. If you installed Docker via the Ubuntu App Center or with sudo snap install docker, the NVIDIA Container Toolkit will not work.
Snap packages are heavily sandboxed (confined). They restrict Docker from accessing the host's /dev/nvidia* device files, resulting in permission denied errors. We must purge the snap version and install the official Docker APT package.
Remove Snap Docker
sudo snap remove --purge docker
sudo apt-get remove docker docker-engine docker.io containerd runc
Now, install the official Docker Engine directly from Docker's repository:
# Add Docker's official GPG key
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install the latest Docker Engine
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y
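A quick sanity check, to confirm Docker now comes from Docker's APT repository rather than Snap and that the daemon works:

```shell
# Version and package origin (the policy output should list download.docker.com)
docker --version
apt-cache policy docker-ce

# A throwaway container proves the daemon itself is healthy
sudo docker run --rm hello-world
```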
Step 2: Install the NVIDIA Container Toolkit
With a clean, unconfined Docker engine running, we can now install the toolkit. This process involves fetching NVIDIA's GPG key, adding their stable repository, and installing the core packages.
Run these commands carefully to configure the production repository on Ubuntu 24.04:
# 1. Download the GPG key and configure the repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# 2. Update your package list
sudo apt-get update
# 3. Install the toolkit
sudo apt-get install -y nvidia-container-toolkit
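Before moving on, it is worth confirming the toolkit binaries actually landed on your PATH:

```shell
# Print the toolkit CLI version
nvidia-ctk --version

# The runtime binary Docker will invoke for GPU containers
which nvidia-container-runtime
```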
Step 3: Configure daemon.json (The Bridge)
The software is installed, but Docker still doesn't know about it. We need to inject the NVIDIA runtime into Docker's brain: the /etc/docker/daemon.json file.
NVIDIA provides a built-in CLI tool (nvidia-ctk) to do this automatically. Run the following command:
sudo nvidia-ctk runtime configure --runtime=docker
Behind the scenes, this command registers nvidia as a valid container runtime. For the changes to take effect, you must restart the Docker daemon:
sudo systemctl restart docker
Pro-Tip: Verify the JSON File
If you want to be 100% sure the bridge was created, run cat /etc/docker/daemon.json. You should see a JSON block defining the runtimes key with the path to the nvidia-container-runtime executable.
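For reference, the file that nvidia-ctk generates typically looks like the sketch below; the path may also appear as an absolute path such as /usr/bin/nvidia-container-runtime depending on your install:

```json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
```

If your daemon.json already contained other settings (a registry mirror, log options), they are preserved alongside the new runtimes block.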
Step 4: The Victory Verification
It is time for the moment of truth. We will pull an official NVIDIA CUDA base image and attempt to run the nvidia-smi command inside the isolated Docker container. Note that the container's base OS does not need to match the host: a CUDA image built on Ubuntu 22.04 runs fine on an Ubuntu 24.04 host, because the container only needs a CUDA userland that your host driver supports.
docker run --rm --gpus all nvidia/cuda:12.2.2-base-ubuntu22.04 nvidia-smi
If your configuration is correct, Docker will download the image and output the familiar NVIDIA-SMI table. Notice that this table is being generated from inside the container, proving that the GPU passthrough is 100% functional!
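On multi-GPU servers, the --gpus flag also accepts a device selection instead of all. A sketch, reusing the same CUDA base image:

```shell
# Pass only the first GPU into the container (note the nested quoting)
docker run --rm --gpus '"device=0"' nvidia/cuda:12.2.2-base-ubuntu22.04 nvidia-smi

# Or request a number of GPUs and let Docker pick which ones
docker run --rm --gpus 2 nvidia/cuda:12.2.2-base-ubuntu22.04 nvidia-smi
```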
🎉 Congratulations! You are now a GPU Hero.
Your Ubuntu 24.04 server is fully equipped to deploy large language models (LLMs), Stable Diffusion, or any high-performance AI container.
Bonus: Enabling GPU in Docker Compose
While docker run --gpus all is great for testing, real-world enterprise deployments use Docker Compose. You cannot simply write --gpus all in a YAML file. Instead, you must use the deploy specification.
Here is a production-ready template showing how to assign an NVIDIA GPU to an AI container (like Ollama) using docker-compose.yml:
services:
  ai-agent:
    image: ollama/ollama:latest
    container_name: private-ai-brain
    restart: always
    ports:
      - "11434:11434"
    volumes:
      - ./ollama_data:/root/.ollama
    # THIS IS THE MAGIC BLOCK FOR GPU PASSTHROUGH
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1 # Number of GPUs to pass (use 'count: all' for every GPU)
              capabilities: [gpu]
Simply run docker compose up -d, and your container will boot with direct hardware access.
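To confirm the composed container actually sees the GPU, you can run nvidia-smi inside the running service (ai-agent is the service name from the template above; substitute your own):

```shell
# Bring the stack up, then execute nvidia-smi inside the service
docker compose up -d
docker compose exec ai-agent nvidia-smi
```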
Troubleshooting: Fixing Common GPU Docker Errors
Did something go wrong? Don't panic. Here are fixes for the two most common errors on Ubuntu 24.04:
- Error: "docker: Error response from daemon: could not select device driver with capabilities: [[gpu]]"
Fix: Docker cannot find the nvidia-container-runtime. You likely skipped configuring the daemon.json or forgot to restart Docker. Run:
sudo nvidia-ctk runtime configure --runtime=docker followed by sudo systemctl restart docker.
- Error: "Failed to initialize NVML: Unknown Error"
Fix: This usually means your host's NVIDIA drivers updated in the background, but the kernel module didn't reload. A simple server reboot (sudo reboot) fixes this 99% of the time.
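When diagnosing either error, these two read-only checks quickly show which side is broken, the Docker registration or the host driver:

```shell
# The nvidia runtime should appear in the daemon's runtime list
docker info | grep -i runtimes

# The loaded kernel module version; a mismatch with the installed
# driver (after an unattended upgrade) triggers the NVML error
cat /proc/driver/nvidia/version
```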
Hardware Meets Software
Now that your Docker environment is fully weaponized for AI, it's time to build something incredible. Why rent expensive cloud AI APIs when you can run sovereign models on Bare Metal?