The Data Espionage Crisis
AI search engines like Perplexity have revolutionized research. But for
enterprise teams, using public AI search is a massive security risk. When you ask a cloud AI
to research your upcoming product features, debug proprietary code, or analyze competitors,
your search queries are logged. You are essentially providing a live feed of your
company's intellectual property to a third-party server.
The solution? Build a Self-Hosted Perplexity Alternative. By
combining Open WebUI (The Frontend) and SearXNG (The Meta-Search Engine), your
local LLM can browse the live internet, read the results, and synthesize answers—without a
single byte of your prompt leaving your server.
- Cloud AI (Perplexity/ChatGPT): queries logged, used for training. High Risk.
- ServerMO AI Server: zero logs, air-gapped prompt processing. 100% Private.
Step 1: The Engineering Truth – VRAM & Architecture
Many tech blogs claim you can run a real-time AI search engine on a
basic laptop or a cheap VPS. This is a myth.
RAG (Retrieval-Augmented Generation) is heavy. When SearXNG pulls 5 web
pages, your LLM must read thousands of tokens of raw HTML/Text instantly to generate an
answer. If you lack GPU VRAM, this process will take 30+ seconds or crash with an OOM (Out
of Memory) error. You need a GPU Dedicated Server (like an NVIDIA RTX or A100/H100)
for instant, human-like response times.
The Architecture Stack:
- Ollama (The Brain): Runs the LLM (e.g., Llama 3 or DeepSeek) locally on your GPU.
- SearXNG (The Eyes): A privacy-respecting engine that fetches live Google/Bing
results anonymously.
- Open WebUI (The Face): The ChatGPT-like interface that glues them together.
Prerequisite: GPU Drivers
Before proceeding, ensure your ServerMO GPU server has the NVIDIA drivers and the NVIDIA
Container Toolkit installed so Docker can access the GPU.
(Read our NVIDIA Toolkit
Guide here).
Step 2: The Critical SearXNG Configuration
Most tutorials fail because they skip this step. Open WebUI communicates
with SearXNG via JSON. If SearXNG is not explicitly configured to output JSON, your AI
searches will return empty results.
# Create directories for our stack
mkdir -p ~/private-ai/searxng
cd ~/private-ai/searxng
# Create the SearXNG settings file
nano settings.yml
Paste the following configuration. Note that we enable the json output format and
disable the limiter so your local AI doesn't get rate-limited by its own search engine.
use_default_settings: true

search:
  formats:
    - html
    - json  # CRITICAL: Open WebUI needs this

server:
  secret_key: "generate_a_random_secret_string"
  limiter: false  # Disable rate limiting for internal API calls
  image_proxy: true
  bind_address: "0.0.0.0"
  port: 8080
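Don't ship the placeholder secret_key. SearXNG uses it to sign internal URLs (including the image proxy), so it should be a genuinely random value. A quick way to generate one, assuming openssl is installed on your server:

```shell
# Generate a 64-character hex secret for SearXNG's secret_key
SECRET=$(openssl rand -hex 32)
echo "$SECRET"

# Then substitute it into settings.yml, for example:
# sed -i "s/generate_a_random_secret_string/$SECRET/" settings.yml
```

(The sed line is left commented out; run it from inside the searxng folder once you are happy with the generated value.)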
Pro-Tip: Deploy Instantly via GitHub
Want to skip the copy-pasting? You can download the complete, pre-configured
docker-compose.yml and settings.yml files directly from our
Official GitHub Gist.
View the Code on GitHub
Or run these terminal commands to download
them directly to your server:
# Go to the main project folder
mkdir -p ~/private-ai && cd ~/private-ai
# Download docker-compose.yml
wget https://gist.githubusercontent.com/jaksontate/a1f218da6f1624b83805dc6c2406c2af/raw/docker-compose.yml
# Create searxng folder and download settings.yml into it
mkdir -p searxng
wget https://gist.githubusercontent.com/jaksontate/a1f218da6f1624b83805dc6c2406c2af/raw/searxng-settings.yml -O ./searxng/settings.yml
Step 3: Secure Docker Deployment
Security is paramount. We will deploy all three services using Docker
Compose, placing them on an isolated bridge network. SearXNG will NOT be exposed to the
public internet. Only Open WebUI will be able to talk to it internally.
cd ~/private-ai
nano docker-compose.yml
Paste this production-ready configuration:
version: '3.8'

networks:
  ai-net:
    driver: bridge

services:
  # 1. The Brain
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: always
    networks:
      - ai-net
    volumes:
      - ./ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  # 2. The Eyes (Isolated)
  searxng:
    image: searxng/searxng:latest
    container_name: searxng
    restart: always
    networks:
      - ai-net
    volumes:
      - ./searxng:/etc/searxng:rw
    # Notice: NO ports are mapped to the host! It remains hidden.

  # 3. The Face
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: always
    networks:
      - ai-net
    ports:
      - "3000:8080"
    volumes:
      - ./open-webui:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      # Web Search Configs
      - ENABLE_WEB_SEARCH=True
      - SEARXNG_QUERY_URL=http://searxng:8080/search?q=<query>
    depends_on:
      - ollama
      - searxng
Docker Networking Note: Why two 8080 ports?
You might notice both Open WebUI and SearXNG use port 8080 internally. Will they conflict? No. Because they sit on an isolated Docker bridge network (ai-net), Docker assigns each container its own internal IP. SearXNG listens on its own 8080 invisibly, while Open WebUI maps its internal 8080 to port 3000 on your server.
Start the cluster:
docker compose up -d
Step 4: UI Integration & Testing
Your private AI search engine is now running. Let's configure the final
pieces.
- Open your browser and navigate to http://your-server-ip:3000.
- Create your admin account.
- Download an LLM (VRAM Check): Go to Settings > Admin Settings > Models.
  Type your desired model and click download.
  - llama3 (8B) - Requires minimum 8GB VRAM (RTX 4060 / A10G)
  - qwen2.5:14b - Requires minimum 16GB VRAM (RTX 4080 / A4000)
  - mixtral (8x7B) - Requires 48GB+ VRAM (A6000 / A100)
  Warning: Pulling a model larger than your GPU's VRAM will cause an Out of Memory
  (OOM) crash or force it to run on the CPU (extremely slow).
- Verify Web Search: Go to Settings > Admin Settings > Web Search. Ensure
  the Engine is set to searxng and the Query URL is
  http://searxng:8080/search?q=<query>.
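The VRAM tiers listed above follow a rough rule of thumb rather than exact math: at 4-bit quantization, a model needs roughly 0.57 GB of VRAM per billion parameters for its weights, plus a further chunk for KV cache and runtime overhead. A back-of-the-envelope sketch (the 0.57 GB/B and 1.5 GB constants are approximations of typical Q4 deployments, not official Ollama figures):

```shell
# Ballpark VRAM (GB) for a Q4-quantized model: weights + fixed overhead.
# Real usage grows with context length; leave extra headroom for RAG.
estimate_vram() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 0.57 + 1.5 }'
}

estimate_vram 8    # llama3 8B
estimate_vram 14   # qwen2.5 14B
estimate_vram 47   # mixtral 8x7B (~47B total parameters)
```

Note the gap between the weights-only estimate for Mixtral and the 48GB+ recommendation above: KV cache for large RAG contexts and concurrent users eat the difference, which is exactly why web-search workloads need more headroom than plain chat.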
The Ultimate Test:
Open a new chat. Click the "+" or Web Search toggle icon. Ask a question about an
event that happened yesterday. You will see the UI fetching sources via SearXNG,
processing them through your local GPU, and providing an instant, fully-cited response.
Step 5: Enterprise Security – Nginx & SSL
You can currently access your AI via http://your-server-ip:3000. However, using raw HTTP for an Enterprise AI tool is a critical security risk. Anyone on your network can intercept your prompts and corporate secrets via Man-in-the-Middle (MitM) attacks.
Lock Down Your AI (Mandatory for Production)
You must place Open WebUI behind an Nginx Reverse Proxy and secure it with a Let's Encrypt SSL certificate. This ensures end-to-end encryption for your queries.
Simply map a domain (e.g., ai.yourcompany.com) to your server IP, and proxy it to port 3000.
Read our 5-Minute SSL Guide Here
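A minimal server block for that proxy might look like the following (ai.yourcompany.com and the certificate paths are placeholders; the Upgrade headers matter because Open WebUI streams responses over WebSockets):

```nginx
server {
    listen 443 ssl;
    server_name ai.yourcompany.com;

    # Paths assume a standard Let's Encrypt (certbot) layout
    ssl_certificate     /etc/letsencrypt/live/ai.yourcompany.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ai.yourcompany.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
        # WebSocket support for streamed chat responses
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```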
Troubleshooting: Fixing Common Errors
Server environments can vary. If your setup isn't working perfectly on the first try, don't
panic. Here is how to fix the 3 most common configuration issues:
- Error: "Web Search returns empty/blank results."
  Fix: Docker might not have mounted your settings.yml file properly,
  meaning SearXNG is still outputting HTML instead of JSON.
  Run docker compose down, ensure the file is named exactly
  settings.yml inside the searxng folder, and run
  docker compose up -d again.
- Error: "Ollama connection refused / LLM not responding."
  Fix: Open WebUI cannot see Ollama. Go to Settings > Admin Settings >
  Connections in Open WebUI. Ensure the Ollama Base URL is exactly
  http://ollama:11434 (not localhost or 127.0.0.1, because the containers
  are on a Docker bridge network).
- Error: "Docker compose fails with GPU driver error."
  Fix: Your server doesn't have the NVIDIA Container Toolkit installed. Run
  nvidia-smi to check your drivers, and follow our NVIDIA Toolkit Guide to fix
  the Docker-to-GPU bridge.
Why Run Private AI on ServerMO GPU Servers?
The bottleneck in any AI Search Engine is not the web search—it is the Token Generation
speed. Processing live web HTML requires massive context windows and high VRAM
throughput.
Local Workstation / CPU (memory constrained):
- 30s+ response latency
- OOM crashes on large articles
- Cannot handle concurrent users

ServerMO GPU Server (NVIDIA A100 / H100 available):
- Instant token generation
- Massive context windows (RAG)
- Multi-user enterprise scale