The "Serverless" Trap & The Bare Metal Reality
If you are building an AI Image Generation Agency (like a Midjourney competitor), you need an API that responds immediately. Many developers initially turn to "Serverless" GPU platforms. They seem cheap, but they hide a massive flaw: Cold Starts.
In a serverless environment, if your API hasn't received a request in a few minutes, the container goes to sleep. When a paying customer clicks "Generate Image," they are forced to wait 60 to 90 seconds just for the machine to wake up and load the massive SDXL or FLUX model. A 90-second delay will kill your user retention.
By deploying ComfyUI on a Dedicated Bare Metal Server, your models stay permanently loaded in the GPU's VRAM. The API acknowledges the request in milliseconds, and the actual image inference completes in just a few seconds (e.g., ~2 to 4 seconds for SDXL on an A100). Let's build a secure, production-grade architecture.
Step 1: Environment & Secure Headless Setup
We start by setting up ComfyUI on a fresh Ubuntu 24.04 server. Security Warning: Never run ComfyUI with --listen 0.0.0.0 in production without authentication. Hackers actively scan for open ComfyUI ports to hijack expensive GPUs for free image generation or crypto mining.
We will bind ComfyUI safely to localhost (127.0.0.1) so it is hidden from the public internet. We will use Nginx later to securely route traffic.
# 1. Clone the official repository
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
# 2. Create a secure Python Virtual Environment
python3 -m venv venv
source venv/bin/activate
# 3. Install PyTorch with CUDA 12.1+ support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# 4. Install dependencies
pip install -r requirements.txt
# 5. Start ComfyUI Securely (Restricted to localhost)
python main.py --listen 127.0.0.1 --port 8188
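In production you will also want ComfyUI supervised by systemd so it restarts after crashes and reboots instead of dying with your SSH session. A minimal unit sketch (the paths and the comfy user are assumptions; adjust them to your actual install location):

```ini
# /etc/systemd/system/comfyui.service -- hypothetical paths, adjust to your setup
[Unit]
Description=ComfyUI headless API
After=network.target

[Service]
User=comfy
WorkingDirectory=/home/comfy/ComfyUI
ExecStart=/home/comfy/ComfyUI/venv/bin/python main.py --listen 127.0.0.1 --port 8188
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with sudo systemctl enable --now comfyui, then check logs via journalctl -u comfyui -f.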
Step 2: The Developer Secret - JSON API Payload
How do you convert a visual node-graph into code? ComfyUI has a hidden developer feature.
- Use SSH port forwarding (ssh -L 8188:127.0.0.1:8188 user@server_ip) to safely access the UI in your local browser.
- Click the Settings (gear icon) on the floating menu and check "Enable Dev mode Options".
- Build your workflow (e.g., SDXL text-to-image), then click the new "Save (API Format)" button.
This downloads a workflow_api.json file that maps every node by its numeric ID. Find the ID of the "CLIP Text Encode" node and replace its text value from your code.
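Hard-coding node IDs is brittle, because IDs change whenever you rebuild the graph. A small helper sketch can locate nodes by their class_type instead (the class_type values shown match what ComfyUI writes into API-format JSON; the sample workflow dict below is a made-up minimal fragment, not a real export):

```python
def find_nodes(workflow: dict, class_type: str) -> list[str]:
    """Return the IDs of all nodes of a given class_type in an
    API-format workflow (e.g. 'CLIPTextEncode', 'KSampler')."""
    return [
        node_id
        for node_id, node in workflow.items()
        if node.get("class_type") == class_type
    ]

# Minimal illustrative workflow (real exports carry many more fields)
workflow = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 5}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "old prompt"}},
}

# Inject the user's prompt into every text-encode node found
for node_id in find_nodes(workflow, "CLIPTextEncode"):
    workflow[node_id]["inputs"]["text"] = "a new user prompt"
```

Caveat: a typical SDXL graph has two CLIPTextEncode nodes (positive and negative prompt), so in practice you would inspect the returned IDs once and pin the right one.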
Step 3: Triggering the API via Python
Here is a robust Python snippet that modifies the JSON payload, submits it to the /prompt endpoint, and retrieves the prompt_id. In production, you will use this ID to connect to the WebSocket and stream the generation progress to your users.
import json
import urllib.request
import urllib.parse
import random
SERVER_ADDRESS = "127.0.0.1:8188"
CLIENT_ID = "my_agency_api_client"
def queue_prompt(prompt_workflow):
    """POST the workflow JSON to ComfyUI's /prompt endpoint."""
    p = {"prompt": prompt_workflow, "client_id": CLIENT_ID}
    data = json.dumps(p).encode('utf-8')
    req = urllib.request.Request(f"http://{SERVER_ADDRESS}/prompt", data=data)
    response = urllib.request.urlopen(req)
    return json.loads(response.read())
# 1. Load your exported API JSON
with open("workflow_api.json", "r") as f:
workflow = json.load(f)
# 2. Dynamically inject the user's prompt (Assuming Node "6" is the text prompt)
workflow["6"]["inputs"]["text"] = "A futuristic cyberpunk city at night, photorealistic, 8k"
# 3. Randomize the seed for unique generations
workflow["3"]["inputs"]["seed"] = random.randint(1, 1000000000)
# 4. Fire the API
print("Sending workflow to ComfyUI...")
response = queue_prompt(workflow)
prompt_id = response['prompt_id']
print(f"Success! Prompt ID: {prompt_id}. Connect to ws://{SERVER_ADDRESS}/ws?clientId={CLIENT_ID} to track progress.")
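The WebSocket delivers JSON events. Two matter most for a progress bar: a "progress" event carrying the current step and total, and an "executing" event whose data.node is null for your prompt_id, which signals completion (these message shapes follow ComfyUI's bundled websockets_api_example.py; treat them as an assumption and verify against your version). A parsing sketch:

```python
import json

def parse_ws_message(raw: str, prompt_id: str):
    """Translate one raw ComfyUI WebSocket text frame into a simple
    status tuple: ('progress', fraction), ('done', None), or ('other', None)."""
    msg = json.loads(raw)
    data = msg.get("data", {})
    if msg.get("type") == "progress":
        # Fraction of sampling steps completed, 0.0 .. 1.0
        return ("progress", data["value"] / data["max"])
    if (msg.get("type") == "executing"
            and data.get("node") is None
            and data.get("prompt_id") == prompt_id):
        # node == null means the whole prompt finished executing
        return ("done", None)
    return ("other", None)
```

In practice you would open ws://{SERVER_ADDRESS}/ws?clientId={CLIENT_ID} (e.g. with the third-party websocket-client package), feed each received text frame through this function, and forward the fraction to your frontend.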
Step 4: Nginx Reverse Proxy & WebSockets (The Shield)
To expose your API safely to your frontend, you must use Nginx. More importantly, ComfyUI relies heavily on WebSockets (/ws) to transmit the generation progress bar. Without properly configuring HTTP Upgrade headers in Nginx, your WebSocket connection will fail.
Create an Nginx configuration file (/etc/nginx/sites-available/comfyui):
server {
    listen 80;
    server_name api.yourdomain.com;

    # In production, terminate TLS here (listen 443 ssl) and add
    # API key validation before proxying
    location / {
        proxy_pass http://127.0.0.1:8188;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    # CRITICAL: WebSocket support for the /ws endpoint
    location /ws {
        proxy_pass http://127.0.0.1:8188/ws;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
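As written, this config leaves /prompt open to anyone who can reach port 80. One lightweight option, sketched below, is a shared-secret check at the Nginx layer (the X-Api-Key header name and the key value are placeholders, not a ComfyUI convention; for real deployments prefer TLS plus a proper auth proxy):

```nginx
# Goes inside the server { } block -- hypothetical shared-secret check
location /prompt {
    if ($http_x_api_key != "CHANGE_ME_LONG_RANDOM_STRING") {
        return 401;
    }
    proxy_pass http://127.0.0.1:8188/prompt;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
}
```

Your backend then sends the X-Api-Key header with every request to /prompt.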
Step 5: Smart VRAM Batching (Don't Waste Your A100)
Many amateurs try to scale by running 4 separate ComfyUI instances on the same server. Do not do this. If you load a ~14GB SDXL checkpoint 4 times, you consume 56GB of VRAM, 42GB of it duplicating the exact same weights!
An NVIDIA A100 with 80GB of VRAM is designed for massive parallel processing. Instead of duplicating instances, use Batching. Modify your JSON payload to set batch_size: 4 (or more) in your Empty Latent Image node. ComfyUI will process multiple images in a single pass using the *same* model loaded in memory, dramatically increasing your API's throughput (images per minute) while keeping VRAM usage efficient.
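Building on the Step 3 script, raising the batch size is just one more JSON edit before you queue the prompt. A sketch that targets every EmptyLatentImage node by class_type (the node ID "5" in the sample dict is illustrative; your exported workflow will have its own IDs):

```python
def set_batch_size(workflow: dict, batch_size: int) -> None:
    """Set batch_size on every EmptyLatentImage node of an
    API-format workflow so one sampling pass yields multiple images."""
    for node in workflow.values():
        if node.get("class_type") == "EmptyLatentImage":
            node["inputs"]["batch_size"] = batch_size

# Minimal illustrative workflow fragment
workflow = {
    "5": {
        "class_type": "EmptyLatentImage",
        "inputs": {"width": 1024, "height": 1024, "batch_size": 1},
    }
}
set_batch_size(workflow, 4)  # same loaded model, four images per pass
```

Call this right before queue_prompt(workflow); the history for that prompt_id will then contain all four output images.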
Alternatively, you can use that massive 80GB of VRAM to hold different heavy models simultaneously (e.g., FLUX.1 loaded in one workflow, SDXL in another) without crashing your server with Out of Memory (OOM) errors.
Build Your API Empire
Stop sharing resources and paying for idle time. To implement true batch processing and hold massive models like FLUX natively in memory, you need raw hardware power.
Explore our unthrottled Enterprise Bare Metal GPUs.