The Data Espionage Crisis
AI search engines like Perplexity have revolutionized research. But for
enterprise teams, using public AI search is a massive security risk. When you ask a cloud AI
to research your upcoming product features, debug proprietary code, or analyze competitors,
your search queries are logged. You are essentially providing a live feed of your
company's intellectual property to a third-party server.
The solution? Build a Self-Hosted Perplexity Alternative. By
combining Open WebUI (The Frontend) and SearXNG (The Meta-Search Engine), your
local LLM can browse the live internet, read the results, and synthesize answers—without a
single byte of your prompt leaving your server.
- Cloud AI (Perplexity/ChatGPT): queries logged, used for training. High Risk.
- ServerMO AI Server: zero logs, air-gapped prompt processing. 100% Private.
Step 1: The Engineering Truth – VRAM & Architecture
Many tech blogs claim you can run a real-time AI search engine on a
basic laptop or a cheap VPS. This is a myth.
RAG (Retrieval-Augmented Generation) is heavy. When SearXNG pulls 5 web
pages, your LLM must read thousands of tokens of raw HTML/Text instantly to generate an
answer. If you lack GPU VRAM, this process will take 30+ seconds or crash with an OOM (Out
of Memory) error. You need a GPU Dedicated Server (like an NVIDIA RTX or A100/H100)
for instant, human-like response times.
The Architecture Stack:
- Ollama (The Brain): Runs the LLM (e.g., Llama 3 or DeepSeek) locally on your GPU.
- SearXNG (The Eyes): A privacy-respecting engine that fetches live Google/Bing
results anonymously.
- Open WebUI (The Face): The ChatGPT-like interface that glues them together.
Prerequisite: GPU Drivers
Before proceeding, ensure your ServerMO GPU server has the NVIDIA drivers and the NVIDIA
Container Toolkit installed so Docker can access the GPU.
(Read our NVIDIA Toolkit
Guide here).
Step 2: The Critical SearXNG Configuration
Most tutorials fail because they skip this step. Open WebUI communicates
with SearXNG via JSON. If SearXNG is not explicitly configured to output JSON, your AI
searches will return empty results.
# Create directories for our stack
mkdir -p ~/private-ai/searxng
cd ~/private-ai/searxng
# Create the SearXNG settings file
nano settings.yml
Paste the following configuration. Note that we enable the json output format and
disable the limiter so your local AI doesn't get rate-limited by its own search engine.
use_default_settings: true

search:
  formats:
    - html
    - json  # CRITICAL: Open WebUI needs this

server:
  secret_key: "generate_a_random_secret_string"
  limiter: false  # Disable rate limiting for internal API calls
  image_proxy: true
  bind_address: "0.0.0.0"
  port: 8080
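Don't ship the placeholder secret_key. SearXNG uses it to sign internal URLs (including the image proxy), so it should be a genuinely random value. A quick way to generate one, assuming openssl is installed on your server:

```shell
# Generate a 64-character hex secret for SearXNG's secret_key
SECRET=$(openssl rand -hex 32)
echo "$SECRET"

# Then substitute it into settings.yml, for example:
# sed -i "s/generate_a_random_secret_string/$SECRET/" settings.yml
```

(The sed line is left commented out; run it from inside the searxng folder once you are happy with the generated value.)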
Pro-Tip: Deploy Instantly via GitHub
Want to skip the copy-pasting? You can download the complete, pre-configured
docker-compose.yml and settings.yml files directly from our
Official GitHub Gist.
View the Code on GitHub
Or run these terminal commands to download
them directly to your server:
# Go to the main project folder
mkdir -p ~/private-ai && cd ~/private-ai
# Download docker-compose.yml
wget https://gist.githubusercontent.com/jaksontate/a1f218da6f1624b83805dc6c2406c2af/raw/docker-compose.yml
# Create searxng folder and download settings.yml into it
mkdir -p searxng
wget https://gist.githubusercontent.com/jaksontate/a1f218da6f1624b83805dc6c2406c2af/raw/searxng-settings.yml -O ./searxng/settings.yml
Step 3: Secure Docker Deployment
Security is paramount. We will deploy all three services using Docker
Compose, placing them on an isolated bridge network. SearXNG will NOT be exposed to the
public internet. Only Open WebUI will be able to talk to it internally.
cd ~/private-ai
nano docker-compose.yml
Paste this production-ready configuration:
version: '3.8'

networks:
  ai-net:
    driver: bridge

services:
  # 1. The Brain
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: always
    networks:
      - ai-net
    volumes:
      - ./ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  # 2. The Eyes (Isolated)
  searxng:
    image: searxng/searxng:latest
    container_name: searxng
    restart: always
    networks:
      - ai-net
    volumes:
      - ./searxng:/etc/searxng:rw
    # Notice: NO ports are mapped to the host! It remains hidden.

  # 3. The Face
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: always
    networks:
      - ai-net
    ports:
      - "3000:8080"
    volumes:
      - ./open-webui:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      # Web Search Configs
      - ENABLE_WEB_SEARCH=True
      - SEARXNG_QUERY_URL=http://searxng:8080/search?q=<query>
    depends_on:
      - ollama
      - searxng
Docker Networking Note: Why two 8080 ports?
You might notice both Open WebUI and SearXNG use port 8080 internally. Will they conflict? No. Because they sit on an isolated Docker bridge network (ai-net), Docker assigns each container its own internal IP. SearXNG listens on its own 8080 invisibly, while Open WebUI maps its internal 8080 to port 3000 on your server.
Start the cluster:
docker compose up -d
Step 4: UI Integration & Testing
Your private AI search engine is now running. Let's configure the final
pieces.
- Open your browser and navigate to http://your-server-ip:3000.
- Create your admin account.
- Download an LLM (VRAM Check): Go to Settings > Admin Settings > Models.
  Type your desired model and click download.
  - llama3 (8B) - Requires minimum 8GB VRAM (RTX 4060 / A10G)
  - qwen2.5:14b - Requires minimum 16GB VRAM (RTX 4080 / A4000)
  - mixtral (8x7B) - Requires 48GB+ VRAM (A6000 / A100)
  Warning: Pulling a model larger than your GPU's VRAM will cause an Out of Memory
  (OOM) crash or force it to run on the CPU (extremely slow).
- Verify Web Search: Go to Settings > Admin Settings > Web Search. Ensure
  the Engine is set to searxng and the Query URL is
  http://searxng:8080/search?q=<query>.
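The VRAM tiers listed above follow a rough rule of thumb rather than exact math: at 4-bit quantization, a model needs roughly 0.57 GB of VRAM per billion parameters for its weights, plus a further chunk for KV cache and runtime overhead. A back-of-the-envelope sketch (the 0.57 GB/B and 1.5 GB constants are approximations of typical Q4 deployments, not official Ollama figures):

```shell
# Ballpark VRAM (GB) for a Q4-quantized model: weights + fixed overhead.
# Real usage grows with context length; leave extra headroom for RAG.
estimate_vram() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 0.57 + 1.5 }'
}

estimate_vram 8    # llama3 8B
estimate_vram 14   # qwen2.5 14B
estimate_vram 47   # mixtral 8x7B (~47B total parameters)
```

Note the gap between the weights-only estimate for Mixtral and the 48GB+ recommendation above: KV cache for large RAG contexts and concurrent users eat the difference, which is exactly why web-search workloads need more headroom than plain chat.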
The Ultimate Test:
Open a new chat. Click the "+" or Web Search toggle icon. Ask a question about an
event that happened yesterday. You will see the UI fetching sources via SearXNG,
processing them through your local GPU, and providing an instant, fully-cited response.
Step 5: Enterprise Security – Nginx & SSL
You can currently access your AI via http://your-server-ip:3000. However, using raw HTTP for an Enterprise AI tool is a critical security risk. Anyone on your network can intercept your prompts and corporate secrets via Man-in-the-Middle (MitM) attacks.
Lock Down Your AI (Mandatory for Production)
You must place Open WebUI behind an Nginx Reverse Proxy and secure it with a Let's Encrypt SSL certificate. This ensures end-to-end encryption for your queries.
Simply map a domain (e.g., ai.yourcompany.com) to your server IP, and proxy it to port 3000.
Read our 5-Minute SSL Guide Here
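A minimal server block for that proxy might look like the following (ai.yourcompany.com and the certificate paths are placeholders; the Upgrade headers matter because Open WebUI streams responses over WebSockets):

```nginx
server {
    listen 443 ssl;
    server_name ai.yourcompany.com;

    # Paths assume a standard Let's Encrypt (certbot) layout
    ssl_certificate     /etc/letsencrypt/live/ai.yourcompany.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ai.yourcompany.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
        # WebSocket support for streamed chat responses
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```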
Troubleshooting: Fixing Common Errors
Server environments can vary. If your setup isn't working perfectly on the first try, don't
panic. Here is how to fix the 3 most common configuration issues:
- Error: "Web Search returns empty/blank results."
  Fix: Docker might not have mounted your settings.yml file properly,
  meaning SearXNG is still outputting HTML instead of JSON.
  Run docker compose down, ensure the file is named exactly
  settings.yml inside the searxng folder, and run
  docker compose up -d again.
- Error: "Ollama connection refused / LLM not responding."
  Fix: Open WebUI cannot see Ollama. Go to Settings > Admin Settings >
  Connections in Open WebUI. Ensure the Ollama Base URL is exactly
  http://ollama:11434 (not localhost or 127.0.0.1, because the containers
  are on a Docker bridge network).
- Error: "Docker compose fails with GPU driver error."
  Fix: Your server doesn't have the NVIDIA Container Toolkit installed. Run
  nvidia-smi to check your drivers, and follow our NVIDIA Toolkit Guide to fix
  the Docker-to-GPU bridge.
Why Run Private AI on ServerMO GPU Servers?
The bottleneck in any AI Search Engine is not the web search—it is the Token Generation
speed. Processing live web HTML requires massive context windows and high VRAM
throughput.
Local Workstation / CPU (memory constrained):
- 30s+ response latency
- OOM crashes on large articles
- Cannot handle concurrent users

ServerMO GPU Server (NVIDIA A100 / H100 available):
- Instant token generation
- Massive context windows (RAG)
- Multi-user enterprise scale