Take full control of your AI. No subscriptions, no data leaks, no cloud required.
Introduction
Running large language models (LLMs) locally has never been more accessible. With Ollama handling model management and Open WebUI providing a polished browser-based interface, you can have your own private ChatGPT-style setup running on your own hardware in under 30 minutes.
This guide walks you through everything: installation, model management, configuration tips, and answers to the most common questions beginners and power users ask.
What You Will Need
Before getting started, make sure your system meets the basics:
- OS: Windows 10/11, macOS (Apple Silicon or Intel), or Linux
- RAM: Minimum 8 GB (16 GB or more recommended for larger models)
- Storage: At least 10 to 50 GB free depending on which models you want to run
- GPU (optional but recommended): NVIDIA (CUDA), AMD (ROCm), or Apple Silicon (Metal). CPU-only works but is slower.
- Docker (required for Open WebUI): install Docker Desktop (or Docker Engine on Linux) from docker.com
Part 1: Installing Ollama
Ollama is the engine that downloads, manages, and serves your local LLMs via a simple API.
Windows and macOS
- Go to https://ollama.com/download
- Download the installer for your operating system
- Run the installer and follow the on-screen instructions
- Once installed, Ollama runs silently in the background
Linux
Open a terminal and run:
curl -fsSL https://ollama.com/install.sh | sh
Ollama is now installed and running as a system service.
Verify the Installation
Open a terminal (or PowerShell on Windows) and run:
ollama --version
You should see a version number printed. Ollama is ready.
Part 2: Downloading Your First Model
Ollama uses a simple pull command to download models from its library.
Popular Models to Start With
| Model | Size | Best For |
|---|---|---|
| llama3.2 | ~2 to 4 GB | General use, fast responses |
| mistral | ~4 GB | Coding, reasoning |
| gemma3:4b | ~3 GB | Lightweight, good quality |
| qwen2.5:7b | ~5 GB | Multilingual, coding |
| deepseek-r1:7b | ~5 GB | Reasoning, math |
| phi4 | ~9 GB | Advanced reasoning, small footprint |
Pull a Model
ollama pull llama3.2
This downloads the model to your local machine. Depending on your internet speed, this may take a few minutes.
Run a Model in the Terminal
Test it quickly before setting up the UI by running:
ollama run llama3.2
Type your prompt and press Enter. Type /bye to exit.
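Everything the terminal does goes through Ollama's local REST API, so you can script the same interaction. A minimal sketch against the /api/generate endpoint (default port, non-streaming; the prompt text is just an example):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local port

def build_payload(prompt: str, model: str = "llama3.2") -> dict:
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "llama3.2") -> str:
    """POST the prompt to Ollama's /api/generate endpoint and return the reply text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama instance with llama3.2 pulled):
# print(generate("Why is the sky blue? Answer in one sentence."))
```

The same endpoint accepts "stream": True, in which case Ollama returns the reply token by token as newline-delimited JSON.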
List Downloaded Models
ollama list
Remove a Model
ollama rm modelname
Part 3: Installing Open WebUI
Open WebUI gives you a full chat interface, similar to ChatGPT, that connects directly to your local Ollama instance.
Option A: Docker (Recommended)
Make sure Docker Desktop is running, then execute:
docker run -d \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
On Windows (PowerShell), use backticks instead of backslashes:
docker run -d `
-p 3000:8080 `
--add-host=host.docker.internal:host-gateway `
-v open-webui:/app/backend/data `
--name open-webui `
--restart always `
ghcr.io/open-webui/open-webui:main
Option B: If You Have an NVIDIA GPU
Add --gpus all to the docker run command so the container can see your GPU, and replace the image tag at the end with the CUDA build:
ghcr.io/open-webui/open-webui:cuda
Access the Interface
Once the container is running, open your browser and go to:
http://localhost:3000
The first time you visit, you will be prompted to create an admin account. This account is local only. No data is sent anywhere.
Part 4: Connecting Open WebUI to Ollama
Open WebUI should auto-detect Ollama if both are running on the same machine.
Manual Configuration
If models do not appear automatically:
- Log into Open WebUI
- Click your profile icon and go to Settings
- Navigate to Connections
- Set the Ollama API URL to:
  - Linux/macOS: http://localhost:11434
  - Docker on Windows/Mac: http://host.docker.internal:11434
- Click Save
Your downloaded models will now appear in the model selector dropdown at the top of the chat window.
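You can verify the same thing outside the UI: the model list shown in the dropdown comes from Ollama's /api/tags endpoint. A quick sketch, assuming the default port:

```python
import json
import urllib.request

def model_names(tags: dict) -> list:
    """Extract model names from the JSON structure returned by /api/tags."""
    return [m["name"] for m in tags.get("models", [])]

def list_models(base_url: str = "http://localhost:11434") -> list:
    """Fetch the locally downloaded models from a running Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.loads(resp.read()))

# Example (requires a running Ollama instance):
# print(list_models())
```

If this script returns model names but Open WebUI still shows an empty dropdown, the problem is the URL configured in Open WebUI, not Ollama itself.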
Part 5: Using Open WebUI
Starting a Chat
- Open http://localhost:3000 in your browser
- Select a model from the dropdown at the top
- Type your message and hit Enter
Key Features Worth Knowing
- System Prompts: Give the model a custom persona or a set of rules before the conversation starts
- Multiple Models: Switch between models mid-session or compare two side by side
- Chat History: All conversations are saved locally in the Docker volume
- File Upload: Upload documents for the model to reference (PDF, TXT, and more)
- RAG (Retrieval-Augmented Generation): Index your own documents and query them through chat
- Image Generation: Connect to a local Stable Diffusion instance for image creation
- Web Search: Enable real-time search grounding for more accurate and up-to-date answers
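The System Prompts feature maps directly onto Ollama's chat API: Open WebUI prepends a message with the "system" role to the conversation. A rough sketch of the equivalent direct call to /api/chat (endpoint and field names per the Ollama REST API; the persona text is just an example):

```python
import json
import urllib.request

def chat_messages(system: str, user: str) -> list:
    """Build a message list: a system persona followed by the user's prompt."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

def chat(messages: list, model: str = "llama3.2",
         base_url: str = "http://localhost:11434") -> str:
    """Send a non-streaming chat request to Ollama and return the reply text."""
    body = json.dumps({"model": model, "messages": messages, "stream": False})
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Example (requires a running Ollama instance):
# print(chat(chat_messages("You are a terse Python tutor.", "What is a list?")))
```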
Part 6: Tips and Optimizations
Speed Up Inference
- Use a GPU whenever possible. Even a mid-range GPU dramatically improves response speed.
- Reduce context length in model settings if you do not need long conversations.
- Use smaller quantized models (for example q4_K_M variants) for faster performance with minor quality trade-offs.
Run Ollama as a Network Server
To access Ollama from other devices on your local network, set the host before starting:
OLLAMA_HOST=0.0.0.0 ollama serve
Then update Open WebUI’s Ollama URL to your machine’s local IP address, for example http://192.168.1.100:11434.
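Once Ollama is listening on the network, any client on the LAN can reach it. Ollama also exposes an OpenAI-compatible endpoint at /v1/chat/completions, so a remote request might look like the sketch below (the IP is just the example address from above):

```python
import json
import urllib.request

def api_url(host: str, port: int = 11434) -> str:
    """Build the OpenAI-compatible chat endpoint URL for a remote Ollama server."""
    return f"http://{host}:{port}/v1/chat/completions"

def remote_chat(host: str, prompt: str, model: str = "llama3.2") -> str:
    """Query an Ollama server elsewhere on the LAN and return the reply text."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    req = urllib.request.Request(
        api_url(host),
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Example (requires Ollama listening on 0.0.0.0 on that machine):
# print(remote_chat("192.168.1.100", "Hello from across the LAN"))
```

Because the endpoint is OpenAI-compatible, most existing OpenAI client libraries can also be pointed at this URL with a dummy API key.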
Keep Everything Updated
Update Ollama:
- Windows/macOS: Re-download and reinstall from ollama.com
- Linux: Re-run the install script
Update Open WebUI Docker image:
docker pull ghcr.io/open-webui/open-webui:main
docker stop open-webui
docker rm open-webui
# Re-run the original docker run command
Recommended Models by Use Case
| Use Case | Recommended Model |
|---|---|
| General chat and writing | llama3.2, mistral |
| Coding and debugging | qwen2.5-coder:7b, deepseek-coder-v2 |
| Reasoning and math | deepseek-r1:7b, phi4 |
| Multilingual tasks | qwen2.5:7b |
| Long documents / RAG | mistral, llama3.1:8b |
| Low-resource machines | gemma3:4b, llama3.2:1b |
FAQ
Why are my models not showing up in Open WebUI?
This is almost always a URL configuration issue. Try these fixes in order:
1. Make sure Ollama is running (ollama serve in a terminal)
2. In Open WebUI settings, change the Ollama URL to http://host.docker.internal:11434 (Docker on Windows or Mac)
3. On Linux, try http://172.17.0.1:11434 if localhost does not resolve inside Docker
4. Check that nothing is blocking port 11434 in your firewall settings
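If you are still stuck, a quick way to separate "Ollama is not running" from "the URL is wrong" is to test the port directly. A minimal sketch (hostname and port are the defaults from this guide):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Examples:
# port_open("localhost", 11434)             # is Ollama reachable on this machine?
# port_open("host.docker.internal", 11434)  # run inside the container to test that route
```

If the first check fails, start Ollama; if it succeeds but the second fails from inside the container, fix the URL in Open WebUI's Connections settings.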
Can I run models without a GPU?
Yes, but it will be noticeably slower. For a smooth experience on CPU only, stick to smaller models like gemma3:4b, phi4-mini, or llama3.2:1b. Larger models (7B and above) may take 30 to 60 seconds or more per response depending on your hardware.
Is my data really private?
Yes, completely. Nothing leaves your machine. Ollama runs locally, Open WebUI stores data in a local Docker volume, and no usage data is sent to any third party. This is one of the main reasons people choose a local setup.
Can I use Open WebUI without Docker?
Yes. You can install it directly with Python:
pip install open-webui
open-webui serve
Docker is still recommended for most users because it makes updates cleaner and avoids dependency conflicts.
Can I run Ollama in WSL2?
Yes. While Ollama has native Windows support, many users prefer the performance of a Linux environment. You can run Ollama inside WSL2 with full hardware support. For the best experience, follow our guide on WSL2 with GPU acceleration to ensure your NVIDIA, AMD, or Intel GPU is properly passed through to your Linux distro.
Conclusion
Running local LLMs with Ollama and Open WebUI is one of the best ways to explore AI on your own terms: fast, private, and completely free after the initial setup. Whether you are a developer, a privacy-conscious user, or just curious about what is possible, this stack gives you a production-quality experience without any cloud dependency.
Start small with a 4B or 7B model, get comfortable with the interface, then scale up as your hardware allows. The local AI ecosystem is evolving fast, and tools like Ollama and Open WebUI make it easier than ever to keep up.
Have questions or run into issues? Leave a comment below.