Running AI Models Locally with Remote Access: Ollama, Open WebUI, and Cloudflare Setup

By ajeetraina

Want to run powerful AI models locally and access them remotely through a user-friendly interface? This guide explores a seamless Docker Compose setup that combines Ollama, Open WebUI, and Cloudflare for a secure and accessible experience.

When working with advanced AI models, having a robust and accessible infrastructure is essential. This guide introduces a seamless setup combining three powerful tools: Ollama, Open WebUI, and Cloudflare Tunnel. Ollama serves as the AI model server, capable of leveraging NVIDIA GPUs for high-performance inference tasks. Open WebUI provides a user-friendly web interface to manage and interact with these models visually. To ensure secure and remote access, Cloudflare Tunnel creates a private and protected connection to the web UI, making your AI models accessible from anywhere in the world. Together, these tools form a powerful, efficient, and easily deployable stack for both local and remote AI development.

Prerequisites:

  • Supported NVIDIA GPU (for efficient model inference)
  • NVIDIA Container Toolkit (to manage GPU resources)
  • Docker Compose (to orchestrate containerized services)
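
Before going further, it can help to confirm these prerequisites on the Docker host. The sketch below is one way to do that, not an official procedure: the helper names require_cmd and check_prereqs are made up for illustration, and the last step assumes the ubuntu image can be pulled and that nvidia-smi was installed along with the NVIDIA driver.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Hypothetical helper: run all prerequisite checks in order.
# Call it manually on the Docker host; it stops at the first failure.
check_prereqs() {
  require_cmd docker || return 1
  require_cmd nvidia-smi || return 1
  # Docker Compose v2 is a docker subcommand.
  docker compose version || return 1
  # NVIDIA Container Toolkit: the GPU should be visible inside a container.
  docker run --rm --gpus all ubuntu nvidia-smi
}
```

Running check_prereqs should end with the usual nvidia-smi table if the GPU is reachable from containers.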

Understanding the Services:

  • webui (ghcr.io/open-webui/open-webui:main): This acts as the web interface, allowing you to interact with your Ollama AI models visually.
  • ollama (Optional – ollama/ollama): This is the AI model server itself. It can leverage your NVIDIA GPU for faster inference tasks.
  • tunnel (cloudflare/cloudflared:latest): This service establishes a secure tunnel to your web UI via Cloudflare, enabling safe remote access.

Volumes and Environment Variables:

  • Two volumes, ollama and open-webui, are defined to store data persistently across container restarts. This ensures your models and configurations remain intact.
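  • The crucial environment variable is OLLAMA_BASE_URL, as set in the Compose file below (older Open WebUI releases called it OLLAMA_API_BASE_URL). It must point to a URL at which the webui container can reach Ollama. When Ollama runs as the ollama service in the same Compose file, the service name resolves on the Compose network (http://ollama:11434). If Ollama runs directly on your Docker host, you can use host.docker.internal as the address.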

Deployment and Access:

  • Deployment: Execute docker compose up -d to start all services in detached mode, running them in the background.
  • Local Access: If you just need to access the web UI locally, simply navigate to http://localhost:8080 in your web browser.
  • Remote Access: To access your AI models remotely, locate the Cloudflare Tunnel URL printed in the Docker logs. Use docker compose logs tunnel to retrieve this URL. Now, you can access your models from anywhere with an internet connection, provided you have the URL.
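
Scanning the tunnel logs by eye works, but the URL can also be pulled out with a small filter. This is a minimal sketch that assumes the tunnel runs as a Cloudflare quick tunnel, whose public address is a subdomain of trycloudflare.com. The helper name extract_tunnel_url is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print the first
# trycloudflare.com URL found (prints nothing if there is none).
extract_tunnel_url() {
  grep -Eo 'https://[a-z0-9-]+\.trycloudflare\.com' | head -n 1
}

# Usage (run from the directory containing the Compose file):
#   docker compose logs tunnel 2>&1 | extract_tunnel_url
```

Keep in mind that anyone who has this URL can reach the web UI, so treat it like a credential.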

Benefits:

  • Simplified AI Model Management: Easily interact with your AI models through the user-friendly Open WebUI interface.
  • Remote Accessibility: Securely access your models from any location with a web browser thanks to Cloudflare’s tunneling capabilities.
  • GPU Acceleration (Optional): Leverage your NVIDIA GPU for faster model inference, speeding up tasks.

Getting Started

  • Install Docker

    curl -sSL https://get.docker.com/ | sh

Writing a Docker Compose file

    services:
      webui:
        image: ghcr.io/open-webui/open-webui:main
        expose:
          - 8080/tcp
        ports:
          - 8080:8080/tcp
        environment:
          # Reaches Ollama through the port published on the Docker host.
          # With the ollama service below, http://ollama:11434 also works.
          - OLLAMA_BASE_URL=http://host.docker.internal:11434
        volumes:
          - open-webui:/app/backend/data
        depends_on:
          - ollama
      ollama:
        image: ollama/ollama
        expose:
          - 11434/tcp
        ports:
          - 11434:11434/tcp
        healthcheck:
          test: ollama --version || exit 1
        command: serve
        volumes:
          - ollama:/root/.ollama
        # GPU reservation; requires the NVIDIA Container Toolkit.
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  device_ids: ['all']
                  capabilities: [gpu]
      tunnel:
        image: cloudflare/cloudflared:latest
        restart: unless-stopped
        environment:
          # Local service that the tunnel exposes.
          - TUNNEL_URL=http://webui:8080
        command: tunnel --no-autoupdate
        depends_on:
          - webui
    volumes:
      ollama:
      open-webui:

The Compose file defines the individual services that make up the entire application. Here, we have three services: webui, ollama, and tunnel.

The webui service acts as your user interface for interacting with Ollama AI models. It fetches data from the optional ollama service (the AI model server) running on the same network, and lets you manage and use your models visually. You can access the web interface at http://localhost:8080 if running locally. The ollama service itself (optional) handles running your models, and can leverage your NVIDIA GPU for faster computations. Finally, the tunnel service provides a secure way to access the web interface remotely through Cloudflare.
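
A freshly started ollama service typically has no models yet, so the web UI has nothing to show until one is pulled. As a rough sketch, the helper below pulls a model through the running ollama container and then lists what is installed. The function name pull_model is hypothetical, and the model name in the usage line (llama3) is only an example, not something this setup prescribes.

```shell
#!/bin/sh
# Hypothetical helper: pull a model inside the ollama service, then list
# the models it now has. Requires the stack to be running.
pull_model() {
  if [ -z "$1" ]; then
    echo "usage: pull_model MODEL" >&2
    return 2
  fi
  docker compose exec ollama ollama pull "$1" &&
    docker compose exec ollama ollama list
}

# Usage (example model name, pick any model you want to run):
#   pull_model llama3
```

Because the ollama volume is mounted at /root/.ollama, pulled models survive container restarts.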

Bringing up the Stack

    docker compose up -d

You will see the following services:

    docker compose ps
    NAME                  IMAGE                                COMMAND                  SERVICE   CREATED              STATUS                        PORTS
    cloudflare-ollama-1   ollama/ollama                        "/bin/ollama serve"      ollama    About a minute ago   Up About a minute (healthy)   0.0.0.0:11434->11434/tcp
    cloudflare-tunnel-1   cloudflare/cloudflared:latest        "cloudflared --no-au…"   tunnel    About a minute ago   Up About a minute
    cloudflare-webui-1    ghcr.io/open-webui/open-webui:main   "bash start.sh"          webui     About a minute ago   Up About a minute             0.0.0.0:8080->8080/tcp
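
Beyond checking the container status, you may want to confirm that the published ports actually answer. The sketch below assumes curl is installed on the host and uses the ports from the Compose file above. It queries Ollama's /api/version endpoint and prints the HTTP status code returned by the web UI. The helper names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: extract the "version" value from JSON on stdin.
parse_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Ask Ollama (published on port 11434) which version it is running.
ollama_version() {
  curl -fsS http://localhost:11434/api/version | parse_version
}

# Print the HTTP status code the web UI returns on port 8080.
webui_status() {
  curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080
}
```

If ollama_version prints a version string and webui_status prints 200, both services are reachable from the host.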

Conclusion

This setup lets you use your AI models both locally and remotely. With Ollama, Open WebUI, and Cloudflare Tunnel working in tandem, you get a powerful and accessible platform for running and exploring AI models.
