Running AI Models Locally with Remote Access: Ollama, Open WebUI, and Cloudflare Setup

Want to run powerful AI models locally and access them remotely through a user-friendly interface? This guide explores a seamless Docker Compose setup that combines Ollama, Open WebUI, and Cloudflare for a secure and accessible experience.

This guide walks through a Docker Compose stack built from three tools: Ollama, Open WebUI, and Cloudflare Tunnel. Ollama is the AI model server and can use NVIDIA GPUs for faster inference. Open WebUI provides a browser-based interface for managing and interacting with those models. Cloudflare Tunnel exposes the web UI through an outbound-only connection to Cloudflare, so you can reach it remotely without opening inbound ports on your network. Together, these tools form an easily deployable stack for both local and remote AI development.

Prerequisites:

Supported NVIDIA GPU (for efficient model inference)
NVIDIA Container Toolkit (to manage GPU resources)
Docker Compose (to orchestrate containerized services)
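Before going further, it can help to confirm that the prerequisites are actually installed. The snippet below is a minimal sketch, not part of the original setup: missing_tools is a hypothetical helper name, and it only checks that the given commands are on your PATH, not that the GPU or the NVIDIA Container Toolkit is configured correctly.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are not on PATH.
# Prints one missing command per line and returns 1 if any are missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Usage (commented out so that sourcing this file has no side effects):
#   missing_tools docker nvidia-smi || echo "install the tools listed above"
#   docker compose version    # confirms the Compose plugin is available
```

If the function prints nothing, both commands were found. Whether containers can see the GPU still depends on the NVIDIA Container Toolkit being set up for your Docker installation.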

Understanding the Services:

webui (ghcr.io/open-webui/open-webui:main): This acts as the web interface, allowing you to interact with your Ollama AI models visually.
ollama (Optional – ollama/ollama): This is the AI model server itself. It can leverage your NVIDIA GPU for faster inference tasks.
tunnel (cloudflare/cloudflared:latest): This service establishes a secure tunnel to your web UI via Cloudflare, enabling safe remote access.

Volumes and Environment Variables:

Two volumes, ollama and open-webui, are defined to store data persistently across container restarts. This ensures your models and configurations remain intact.
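The crucial environment variable is OLLAMA_BASE_URL, which tells Open WebUI where to find the Ollama API. When ollama runs as a service in the same Compose file, point it at the service name: http://ollama:11434. If Ollama instead runs directly on your Docker host, you can use http://host.docker.internal:11434 (on Linux, host.docker.internal typically needs an extra_hosts entry that maps it to host-gateway).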

Deployment and Access:

Deployment: Execute docker compose up -d to start all services in detached mode, running them in the background.
Local Access: If you just need to access the web UI locally, simply navigate to http://localhost:8080 in your web browser.
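Remote Access: To access the web UI remotely, find the Cloudflare Tunnel URL in the tunnel service's logs by running docker compose logs tunnel. Because this setup does not configure a tunnel token, cloudflared starts a quick tunnel with a randomly generated trycloudflare.com address. You can then reach your models from anywhere with an internet connection, and so can anyone else who has the URL.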
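The tunnel logs can be noisy, so a small filter makes the URL easier to find. This is a sketch under two assumptions: that the tunnel is a quick tunnel published under trycloudflare.com, and that the URL appears verbatim somewhere in the log output. extract_tunnel_url is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the first trycloudflare.com URL found on stdin.
# Assumes a quick tunnel, whose hostname is a subdomain of trycloudflare.com.
extract_tunnel_url() {
    grep -Eo 'https://[a-zA-Z0-9-]+\.trycloudflare\.com' | head -n 1
}

# Usage (commented out so that sourcing this file has no side effects):
#   docker compose logs tunnel 2>&1 | extract_tunnel_url
```

If nothing is printed, the tunnel may still be starting; rerun the command after a few seconds.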

Benefits:

Simplified AI Model Management: Easily interact with your AI models through the user-friendly Open WebUI interface.
Remote Accessibility: Securely access your models from any location with a web browser thanks to Cloudflare’s tunneling capabilities.
GPU Acceleration (Optional): Leverage your NVIDIA GPU for faster model inference, speeding up tasks.

Getting Started

Install Docker


curl -sSL https://get.docker.com/ | sh

Writing a Docker Compose file


services:

  webui:
    image: ghcr.io/open-webui/open-webui:main
    expose:
     - 8080/tcp
    ports:
     - 8080:8080/tcp
    environment:
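      # Reach the ollama service by its Compose service name. Use
      # http://host.docker.internal:11434 only if Ollama runs on the Docker host.
      - OLLAMA_BASE_URL=http://ollama:11434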
    volumes:
      - open-webui:/app/backend/data
    depends_on:
     - ollama

  ollama:
    image: ollama/ollama
    expose:
     - 11434/tcp
    ports:
     - 11434:11434/tcp
    healthcheck:
      test: ollama --version || exit 1
    command: serve
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['all']
              capabilities: [gpu]

  tunnel:
    image: cloudflare/cloudflared:latest
    restart: unless-stopped
    environment:
      - TUNNEL_URL=http://webui:8080
    command: tunnel --no-autoupdate
    depends_on:
      - webui

volumes:
  ollama:
  open-webui:

The Compose file defines the individual services that make up the entire application. Here, we have three services: webui, ollama, and tunnel.

The webui service acts as your user interface for interacting with Ollama AI models. It fetches data from the optional ollama service (the AI model server) running on the same network, and lets you manage and use your models visually. You can access the web interface at http://localhost:8080 if running locally. The ollama service itself (optional) handles running your models, and can leverage your NVIDIA GPU for faster computations. Finally, the tunnel service provides a secure way to access the web interface remotely through Cloudflare.

Bringing up the Stack


docker compose up -d

You will see the following services:


docker compose ps
NAME                  IMAGE                                COMMAND                  SERVICE   CREATED              STATUS                        PORTS
cloudflare-ollama-1   ollama/ollama                        "/bin/ollama serve"      ollama    About a minute ago   Up About a minute (healthy)   0.0.0.0:11434->11434/tcp
cloudflare-tunnel-1   cloudflare/cloudflared:latest        "cloudflared --no-au…"   tunnel    About a minute ago   Up About a minute
cloudflare-webui-1    ghcr.io/open-webui/open-webui:main   "bash start.sh"          webui     About a minute ago   Up About a minute             0.0.0.0:8080->8080/tcp
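A container that shows as "Up" is not necessarily ready to serve requests yet. As a rough sketch, the hypothetical wait_for_http helper below polls a URL with curl until it responds successfully. It assumes curl is installed on the host and that the ports are published as in the Compose file above.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it answers with a successful HTTP
# status, or give up after a number of attempts.
#   $1 = URL
#   $2 = maximum attempts (default 30)
#   $3 = seconds to sleep between attempts (default 2)
wait_for_http() {
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f makes curl return non-zero on HTTP errors; output is discarded.
        if curl -fsS -o /dev/null "$url" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage (commented out so that sourcing this file has no side effects):
#   wait_for_http http://localhost:11434/api/version && echo "ollama is up"
#   wait_for_http http://localhost:8080 && echo "webui is up"
```

Polling from the host only confirms the published ports; it does not check that the web UI can reach Ollama over the internal Compose network.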

Conclusion

This setup lets you use your AI models both locally and remotely. With Ollama, Open WebUI, and Cloudflare Tunnel working in tandem, you gain a powerful and accessible platform for exploring and utilizing AI technology.