The Silent Killers of AI Servers
Let's be brutally honest: standard server monitoring tools (like Node Exporter or htop) are completely blind to your GPUs. If you are serving LLMs (with vLLM, Ollama, and the like) or training AI models, you are pushing your hardware to the absolute edge. Without deep GPU visibility, you will inevitably face three silent killers:
- OOM (Out of Memory) Errors: Your AI agent receives a massive context window, VRAM spikes to 100%, and the server process crashes instantly without warning.
- Thermal Throttling: Your GPU hits 90°C. To protect itself, it drops clock speeds drastically, turning your expensive H100 into a slow heater.
- Power Limit Throttling: When the GPU slams into its power cap, clocks drop and you see random latency spikes during inference.
Crucial Prerequisite: The NVIDIA Container Toolkit
Before running Docker Compose, having NVIDIA drivers is not enough. Your Docker daemon must know how to talk to the GPU. You must install the NVIDIA Container Toolkit and configure the runtime. If you skip this, your dcgm-exporter container will instantly crash.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
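For reference, a successful run of `nvidia-ctk runtime configure` registers an nvidia runtime in /etc/docker/daemon.json. It should look roughly like this (your file may contain other keys, and the exact binary path can vary by distro):

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

If that entry is missing after the restart, the dcgm-exporter container will not be able to see your GPUs.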
Step 1: The Architecture
To build a professional monitoring stack, we need three components working in harmony:
- NVIDIA DCGM Exporter: This is the official agent from NVIDIA. It talks directly to the GPU hardware and exposes metrics (like VRAM usage, PCIe bandwidth, and temperature). (Note: Do not use the deprecated 'nvidia_gpu_exporter').
- Prometheus: The time-series database. It "scrapes" (downloads) the metrics from the DCGM exporter every few seconds and stores them as time series on disk.
- Grafana: The visualizer. It connects to Prometheus and turns raw numbers into beautiful, easy-to-read speedometers and graphs.
Prerequisites: You must have NVIDIA Drivers & the NVIDIA Container Toolkit installed on your host machine before proceeding.
Step 2: Clean Directory Structure
Many tutorials tell you to mount messy, random folders. Let's do this the clean way so your data persists even if the server reboots.
# Create the main directory
mkdir -p ~/gpu-monitoring
cd ~/gpu-monitoring
# Create sub-directories for persistent data
mkdir -p prometheus_data grafana_data prometheus_config
# Set ownership for Grafana (its container runs as UID 472)
sudo chown -R 472:472 grafana_data
Step 3: Prometheus Configuration
We need to tell Prometheus exactly where to find the GPU metrics.
nano prometheus_config/prometheus.yml
Paste the following configuration into the file:
global:
  scrape_interval: 15s # How often to fetch data
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # This is where Prometheus finds our GPU data
  - job_name: 'dcgm-exporter'
    static_configs:
      - targets: ['dcgm-exporter:9400']
Save and exit (Ctrl+X, Y, Enter).
Step 4: The Docker Compose Magic
Now we deploy the entire stack using a single file. Pay close attention to the warning below regarding the DCGM interval!
CRITICAL WARNING: The Disk Bloat Trap
Many online guides mistakenly tell you to set DCGM_EXPORTER_INTERVAL=30. Do not do this! The interval is measured in milliseconds, not seconds. Setting it to 30 means DCGM will collect metrics every 30 milliseconds, flooding your time-series database and filling up your server's hard drive with near-useless samples in a matter of days.
The correct setting for production is 30000 (30 seconds) or 15000 (15 seconds).
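If the millisecond unit feels abstract, here is a throwaway back-of-the-envelope sketch (plain Python; the function name is mine, not part of any tool) showing how many times dcgm-exporter would sample the GPU per day at each setting:

```python
# DCGM_EXPORTER_INTERVAL is measured in milliseconds.
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms in a day

def collections_per_day(interval_ms: int) -> int:
    """Number of metric collections dcgm-exporter performs per day."""
    return MS_PER_DAY // interval_ms

print(collections_per_day(30))     # 2880000 -- the disk-bloat trap
print(collections_per_day(30000))  # 2880   -- a sane production value
```

Three orders of magnitude of difference, from one missing zero.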
Paste the following bulletproof configuration.
(Note: You can also find this code in our Official ServerMO GitHub Repository).
networks:
  monitor-net:
    driver: bridge

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      - ./prometheus_config/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=15d' # Keep data for 15 days
    restart: unless-stopped
    ports:
      - "9090:9090"
    networks:
      - monitor-net

  dcgm-exporter:
    image: nvcr.io/nvidia/k8s/dcgm-exporter:3.3.5-3.4.0-ubuntu22.04
    container_name: dcgm-exporter
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - DCGM_EXPORTER_INTERVAL=15000 # 15 seconds (SAFE)
    cap_add:
      - SYS_ADMIN
    restart: unless-stopped
    ports:
      - "9400:9400"
    networks:
      - monitor-net

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    volumes:
      - ./grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
    restart: unless-stopped
    ports:
      - "3000:3000"
    networks:
      - monitor-net
    depends_on:
      - prometheus
Save the file, then fire up the stack:
docker compose up -d
A quick docker compose ps should show all three containers running.
Step 5: Visualizing in Grafana
Your metrics are flowing. Now let's make them look good.
- Open your web browser and go to http://YOUR_SERVER_IP:3000.
- Log in with username admin and password admin (you will be prompted to change this).
- Add Data Source: Go to Connections > Data Sources > Add data source. Select Prometheus.
- In the Connection URL field, type exactly http://prometheus:9090. Scroll down and click Save & Test.
- Import Dashboard: Go to Dashboards > Import. Don't blindly use old dashboard IDs from 2021 (like 12239), as they often show "No Data" with the latest DCGM v3.3+ metrics. Instead, download the updated JSON file directly from our Official ServerMO GitHub Repository.
- Select "Upload JSON file", choose the file you downloaded, select your "Prometheus" data source from the dropdown, and click Import.
Boom! You now have real-time visibility into your GPU's VRAM usage, Power Draw, Temperature, and PCIe bandwidth.
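If you'd rather build panels by hand, here are a few starter PromQL queries. The metric names come from dcgm-exporter's default counter set; double-check them against your own http://YOUR_SERVER_IP:9400/metrics output, since the enabled counters vary by exporter version and configuration:

```promql
# GPU core temperature in °C (one series per GPU)
DCGM_FI_DEV_GPU_TEMP

# VRAM (framebuffer) usage as a percentage
100 * DCGM_FI_DEV_FB_USED / (DCGM_FI_DEV_FB_USED + DCGM_FI_DEV_FB_FREE)

# Power draw in watts
DCGM_FI_DEV_POWER_USAGE
```

Paste any of these into Grafana's query editor (or the Prometheus UI at http://YOUR_SERVER_IP:9090) to graph them directly.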
Conclusion: Is Your GPU Bottlenecking You?
Now that you can see your metrics, you might discover an uncomfortable truth: Your VRAM is constantly hitting 99%, and your AI inference is crawling.
The Struggle: Consumer GPUs (24GB, Low VRAM)
- OOM Crashes on Large Prompts
- Cannot fit 70B+ Parameter Models
- Thermal Throttling under load

The ServerMO Solution: Data Center GPUs (80GB+)
- Massive VRAM (H100 / A100)
- Bare Metal Stability (No Throttling)
- Instant Inference Speeds