The "Serverless" Trap & The Bare Metal Reality
If you are building an AI Image Generation Agency (like a Midjourney competitor), you need an API that responds immediately. Many developers initially turn to "Serverless" GPU platforms. They seem cheap, but they hide a massive flaw: Cold Starts.
In a serverless environment, if your API hasn't received a request in a few minutes, the container goes to sleep. When a paying customer clicks "Generate Image," they are forced to wait 60 to 90 seconds just for the machine to wake up and load the massive SDXL or FLUX model. A 90-second delay will kill your user retention.
By deploying ComfyUI on a Dedicated Bare Metal Server, your models stay permanently loaded in the GPU's VRAM. The API acknowledges the request in milliseconds, and the actual image inference completes in just a few seconds (e.g., ~2 to 4 seconds for SDXL on an A100). Let's build a secure, production-grade architecture.
Step 1: Environment & Secure Headless Setup
We start by setting up ComfyUI on a fresh Ubuntu 24.04 server. Security Warning: Never run ComfyUI with --listen 0.0.0.0 in production without authentication. Hackers actively scan for open ComfyUI ports to hijack expensive GPUs for free image generation or crypto mining.
We will bind ComfyUI safely to localhost (127.0.0.1) so it is hidden from the public internet. We will use Nginx later to securely route traffic.
# 1. Clone the official repository
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
# 2. Create a secure Python Virtual Environment
python3 -m venv venv
source venv/bin/activate
# 3. Install PyTorch with CUDA 12.1+ support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# 4. Install dependencies
pip install -r requirements.txt
# 5. Start ComfyUI Securely (Restricted to localhost)
python main.py --listen 127.0.0.1 --port 8188
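In production you will also want ComfyUI supervised by systemd so it restarts after crashes and reboots instead of dying with your SSH session. A minimal unit sketch (the paths and the comfy user are assumptions; adjust them to your actual install location):

```ini
# /etc/systemd/system/comfyui.service -- hypothetical paths, adjust to your setup
[Unit]
Description=ComfyUI headless API
After=network.target

[Service]
User=comfy
WorkingDirectory=/home/comfy/ComfyUI
ExecStart=/home/comfy/ComfyUI/venv/bin/python main.py --listen 127.0.0.1 --port 8188
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with sudo systemctl enable --now comfyui, then check logs via journalctl -u comfyui -f.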
Step 2: The Developer Secret - JSON API Payload
How do you convert a visual node-graph into code? ComfyUI has a hidden developer feature.
- Use SSH port forwarding (ssh -L 8188:127.0.0.1:8188 user@server_ip) to safely access the UI in your local browser.
- Click the Settings (gear icon) on the floating menu and check "Enable Dev mode Options".
- Build your workflow (e.g., SDXL text-to-image), then click the new "Save (API Format)" button.
This downloads a workflow_api.json file that maps every node by its numeric ID. Find the ID of the "CLIP Text Encode" node and replace its text value from your code.
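Hard-coding node IDs is brittle, because IDs change whenever you rebuild the graph. A small helper sketch can locate nodes by their class_type instead (the class_type values shown match what ComfyUI writes into API-format JSON; the sample workflow dict below is a made-up minimal fragment, not a real export):

```python
def find_nodes(workflow: dict, class_type: str) -> list[str]:
    """Return the IDs of all nodes of a given class_type in an
    API-format workflow (e.g. 'CLIPTextEncode', 'KSampler')."""
    return [
        node_id
        for node_id, node in workflow.items()
        if node.get("class_type") == class_type
    ]

# Minimal illustrative workflow (real exports carry many more fields)
workflow = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 5}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "old prompt"}},
}

# Inject the user's prompt into every text-encode node found
for node_id in find_nodes(workflow, "CLIPTextEncode"):
    workflow[node_id]["inputs"]["text"] = "a new user prompt"
```

Caveat: a typical SDXL graph has two CLIPTextEncode nodes (positive and negative prompt), so in practice you would inspect the returned IDs once and pin the right one.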
Step 3: Triggering the API via Python
Here is a robust Python snippet that modifies the JSON payload, submits it to the /prompt endpoint, and retrieves the prompt_id. In production, you will use this ID to connect to the WebSocket and stream the generation progress to your users.
import json
import urllib.request
import urllib.parse
import random
SERVER_ADDRESS = "127.0.0.1:8188"
CLIENT_ID = "my_agency_api_client"
def queue_prompt(prompt_workflow):
    """POST the workflow JSON to ComfyUI's /prompt endpoint."""
    p = {"prompt": prompt_workflow, "client_id": CLIENT_ID}
    data = json.dumps(p).encode('utf-8')
    req = urllib.request.Request(f"http://{SERVER_ADDRESS}/prompt", data=data)
    response = urllib.request.urlopen(req)
    return json.loads(response.read())
# 1. Load your exported API JSON
with open("workflow_api.json", "r") as f:
workflow = json.load(f)
# 2. Dynamically inject the user's prompt (Assuming Node "6" is the text prompt)
workflow["6"]["inputs"]["text"] = "A futuristic cyberpunk city at night, photorealistic, 8k"
# 3. Randomize the seed for unique generations
workflow["3"]["inputs"]["seed"] = random.randint(1, 1000000000)
# 4. Fire the API
print("Sending workflow to ComfyUI...")
response = queue_prompt(workflow)
prompt_id = response['prompt_id']
print(f"Success! Prompt ID: {prompt_id}. Connect to ws://{SERVER_ADDRESS}/ws?clientId={CLIENT_ID} to track progress.")
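The WebSocket delivers JSON events. Two matter most for a progress bar: a "progress" event carrying the current step and total, and an "executing" event whose data.node is null for your prompt_id, which signals completion (these message shapes follow ComfyUI's bundled websockets_api_example.py; treat them as an assumption and verify against your version). A parsing sketch:

```python
import json

def parse_ws_message(raw: str, prompt_id: str):
    """Translate one raw ComfyUI WebSocket text frame into a simple
    status tuple: ('progress', fraction), ('done', None), or ('other', None)."""
    msg = json.loads(raw)
    data = msg.get("data", {})
    if msg.get("type") == "progress":
        # Fraction of sampling steps completed, 0.0 .. 1.0
        return ("progress", data["value"] / data["max"])
    if (msg.get("type") == "executing"
            and data.get("node") is None
            and data.get("prompt_id") == prompt_id):
        # node == null means the whole prompt finished executing
        return ("done", None)
    return ("other", None)
```

In practice you would open ws://{SERVER_ADDRESS}/ws?clientId={CLIENT_ID} (e.g. with the third-party websocket-client package), feed each received text frame through this function, and forward the fraction to your frontend.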
Step 4: Nginx Reverse Proxy & WebSockets (The Shield)
To expose your API safely to your frontend, you must use Nginx. More importantly, ComfyUI relies heavily on WebSockets (/ws) to transmit the generation progress bar. Without properly configuring HTTP Upgrade headers in Nginx, your WebSocket connection will fail.
Create an Nginx configuration file (/etc/nginx/sites-available/comfyui):
server {
    listen 80;
    server_name api.yourdomain.com;

    # In production, terminate TLS here (listen 443 ssl) and add
    # API key validation before proxying
    location / {
        proxy_pass http://127.0.0.1:8188;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    # CRITICAL: WebSocket support for the /ws endpoint
    location /ws {
        proxy_pass http://127.0.0.1:8188/ws;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
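As written, this config leaves /prompt open to anyone who can reach port 80. One lightweight option, sketched below, is a shared-secret check at the Nginx layer (the X-Api-Key header name and the key value are placeholders, not a ComfyUI convention; for real deployments prefer TLS plus a proper auth proxy):

```nginx
# Goes inside the server { } block -- hypothetical shared-secret check
location /prompt {
    if ($http_x_api_key != "CHANGE_ME_LONG_RANDOM_STRING") {
        return 401;
    }
    proxy_pass http://127.0.0.1:8188/prompt;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
}
```

Your backend then sends the X-Api-Key header with every request to /prompt.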
Step 5: Smart VRAM Batching (Don't Waste Your A100)
Many amateurs try to scale by running 4 separate ComfyUI instances on the same server. Do not do this. If you load a ~14GB SDXL checkpoint 4 times, you consume 56GB of VRAM, 42GB of it duplicating the exact same weights!
An NVIDIA A100 with 80GB of VRAM is designed for massive parallel processing. Instead of duplicating instances, use Batching. Modify your JSON payload to set batch_size: 4 (or more) in your Empty Latent Image node. ComfyUI will process multiple images in a single pass using the *same* model loaded in memory, dramatically increasing your API's throughput (images per minute) while keeping VRAM usage efficient.
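Building on the Step 3 script, raising the batch size is just one more JSON edit before you queue the prompt. A sketch that targets every EmptyLatentImage node by class_type (the node ID "5" in the sample dict is illustrative; your exported workflow will have its own IDs):

```python
def set_batch_size(workflow: dict, batch_size: int) -> None:
    """Set batch_size on every EmptyLatentImage node of an
    API-format workflow so one sampling pass yields multiple images."""
    for node in workflow.values():
        if node.get("class_type") == "EmptyLatentImage":
            node["inputs"]["batch_size"] = batch_size

# Minimal illustrative workflow fragment
workflow = {
    "5": {
        "class_type": "EmptyLatentImage",
        "inputs": {"width": 1024, "height": 1024, "batch_size": 1},
    }
}
set_batch_size(workflow, 4)  # same loaded model, four images per pass
```

Call this right before queue_prompt(workflow); the history for that prompt_id will then contain all four output images.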
Alternatively, you can use that massive 80GB of VRAM to hold different heavy models simultaneously (e.g., FLUX.1 loaded in one workflow, SDXL in another) without crashing your server with Out of Memory (OOM) errors.
Build Your API Empire
Stop sharing resources and paying for idle time. To implement true batch processing and hold massive models like FLUX natively in memory, you need raw hardware power.
Explore our unthrottled Enterprise Bare Metal GPUs.