How to Run Local LLMs: Ollama & Open WebUI Guide (2026)

Take full control of your AI. No subscriptions, no data leaks, no cloud required.


Introduction

Running large language models (LLMs) locally has never been more accessible. With Ollama handling model management and Open WebUI providing a polished browser-based interface, you can have your own private ChatGPT-style setup running on your own hardware in under 30 minutes.

This guide walks you through everything: installation, model management, configuration tips, and answers to the most common questions beginners and power users ask.


What You Will Need

Before getting started, make sure your system meets the basics:

  • OS: Windows 10/11, macOS (Apple Silicon or Intel), or Linux
  • RAM: Minimum 8 GB (16 GB or more recommended for larger models)
  • Storage: At least 10 to 50 GB free depending on which models you want to run
  • GPU (optional but recommended): NVIDIA (CUDA), AMD (ROCm), or Apple Silicon (Metal). CPU-only works but is slower.
  • Docker (required for Open WebUI): Get Docker

Part 1: Installing Ollama

Ollama is the engine that downloads, manages, and serves your local LLMs via a simple API.

Windows and macOS

  1. Go to https://ollama.com/download
  2. Download the installer for your operating system
  3. Run the installer and follow the on-screen instructions
  4. Once installed, Ollama runs silently in the background

Linux

Open a terminal and run:

curl -fsSL https://ollama.com/install.sh | sh

Ollama is now installed and running as a system service.

Verify the Installation

Open a terminal (or PowerShell on Windows) and run:

ollama --version

You should see a version number printed. Ollama is ready.
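Ollama also exposes a small HTTP API on port 11434 (its default), which is what Open WebUI talks to later. As a second sanity check, here is a minimal Python sketch that queries the server's /api/version endpoint and returns None if nothing is listening:

```python
import json
from urllib.request import urlopen
from urllib.error import URLError

def ollama_version(base_url="http://localhost:11434"):
    """Return the running Ollama server's version string, or None if unreachable."""
    try:
        with urlopen(f"{base_url}/api/version", timeout=3) as resp:
            return json.loads(resp.read())["version"]
    except (URLError, OSError, KeyError, ValueError):
        return None

print(ollama_version() or "Ollama API not reachable on localhost:11434")
```

If this prints a version number, both the CLI and the background API server are working.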


Part 2: Downloading Your First Model

Ollama uses a simple pull command to download models from its library.

Popular Models to Start With

| Model          | Approx. Size | Best For                            |
|----------------|--------------|-------------------------------------|
| llama3.2       | ~2 to 4 GB   | General use, fast responses         |
| mistral        | ~4 GB        | Coding, reasoning                   |
| gemma3:4b      | ~3 GB        | Lightweight, good quality           |
| qwen2.5:7b     | ~5 GB        | Multilingual, coding                |
| deepseek-r1:7b | ~5 GB        | Reasoning, math                     |
| phi4           | ~9 GB        | Advanced reasoning, small footprint |

Pull a Model

ollama pull llama3.2

This downloads the model to your local machine. Depending on your internet speed, this may take a few minutes.

Run a Model in the Terminal

Test it quickly before setting up the UI by running:

ollama run llama3.2

Type your prompt and press Enter. Type /bye to exit.

List Downloaded Models

ollama list

Remove a Model

ollama rm modelname
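Everything the CLI does is also available over Ollama's local REST API on port 11434, which is exactly what Open WebUI will connect to in the next part. As a sketch, here is a one-shot completion against the /api/generate endpoint (the model name is just an example and must be pulled first; stream is set to False so the server returns a single JSON object):

```python
import json
from urllib.request import Request, urlopen

def build_payload(prompt, model="llama3.2"):
    """Request body for /api/generate; stream=False asks for one JSON reply."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="llama3.2", base_url="http://localhost:11434"):
    """Send a one-shot prompt to a locally running Ollama and return the reply text."""
    data = json.dumps(build_payload(prompt, model)).encode()
    req = Request(f"{base_url}/api/generate", data=data,
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]

# Example (after running `ollama pull llama3.2`):
# print(generate("Why is the sky blue? Answer in one sentence."))
```

This is handy for scripting against your local models even if you never open the browser UI.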

Part 3: Installing Open WebUI

Open WebUI gives you a full chat interface, similar to ChatGPT, that connects directly to your local Ollama instance.

Option A: Docker (Recommended)

Make sure Docker Desktop is running, then execute:

docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

On Windows (PowerShell), use backticks instead of backslashes:

docker run -d `
  -p 3000:8080 `
  --add-host=host.docker.internal:host-gateway `
  -v open-webui:/app/backend/data `
  --name open-webui `
  --restart always `
  ghcr.io/open-webui/open-webui:main
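If you prefer Docker Compose, the same container can be described declaratively. A sketch equivalent to the command above (save as docker-compose.yml and start it with docker compose up -d):

```yaml
# Equivalent of the docker run command above, as a Compose file
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - open-webui:/app/backend/data
    restart: always

volumes:
  open-webui:
```

Compose makes later updates a one-liner (docker compose pull followed by docker compose up -d) instead of re-typing the full run command.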

Option B: If You Have an NVIDIA GPU

Add --gpus all to the docker run command and replace the image tag at the end with the CUDA build:

ghcr.io/open-webui/open-webui:cuda

Access the Interface

Once the container is running, open your browser and go to:

http://localhost:3000

The first time you visit, you will be prompted to create an admin account. This account is local only. No data is sent anywhere.


Part 4: Connecting Open WebUI to Ollama

Open WebUI should auto-detect Ollama if both are running on the same machine.

Manual Configuration

If models do not appear automatically:

  1. Log into Open WebUI
  2. Click your profile icon and go to Settings
  3. Navigate to Connections
  4. Set the Ollama API URL to:
    • Linux/macOS: http://localhost:11434
    • Docker on Windows/Mac: http://host.docker.internal:11434
  5. Click Save

Your downloaded models will now appear in the model selector dropdown at the top of the chat window.


Part 5: Using Open WebUI

Starting a Chat

  1. Open http://localhost:3000 in your browser
  2. Select a model from the dropdown at the top
  3. Type your message and hit Enter

Key Features Worth Knowing

  • System Prompts: Give the model a custom persona or a set of rules before the conversation starts
  • Multiple Models: Switch between models mid-session or compare two side by side
  • Chat History: All conversations are saved locally in the Docker volume
  • File Upload: Upload documents for the model to reference (PDF, TXT, and more)
  • RAG (Retrieval-Augmented Generation): Index your own documents and query them through chat
  • Image Generation: Connect to a local Stable Diffusion instance for image creation
  • Web Search: Enable real-time search grounding for more accurate and up-to-date answers

Part 6: Tips and Optimizations

Speed Up Inference

  • Use a GPU whenever possible. Even a mid-range GPU dramatically improves response speed.
  • Reduce context length in model settings if you do not need long conversations.
  • Use smaller quantized models (for example q4_K_M) for faster performance with minor quality trade-offs.

Run Ollama as a Network Server

To access Ollama from other devices on your local network, set the host before starting:

OLLAMA_HOST=0.0.0.0 ollama serve

Then update Open WebUI’s Ollama URL to your machine’s local IP address, for example http://192.168.1.100:11434.
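On Linux, where the install script sets Ollama up as a systemd service, setting the variable in a terminal only lasts for that session. To make it persistent, a systemd drop-in override is the usual approach (paths assume the default service name from the installer; run sudo systemctl daemon-reload and sudo systemctl restart ollama afterwards):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
```

Note that binding to 0.0.0.0 exposes the API to every device on your network, so only do this on networks you trust.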

Keep Everything Updated

Update Ollama:

  • Windows/macOS: Re-download and reinstall from ollama.com
  • Linux: Re-run the install script

Update Open WebUI Docker image:

docker pull ghcr.io/open-webui/open-webui:main
docker stop open-webui
docker rm open-webui
# Re-run the original docker run command

Recommended Models by Use Case

| Use Case                 | Recommended Models                   |
|--------------------------|--------------------------------------|
| General chat and writing | llama3.2, mistral                    |
| Coding and debugging     | qwen2.5-coder:7b, deepseek-coder-v2  |
| Reasoning and math       | deepseek-r1:7b, phi4                 |
| Multilingual tasks       | qwen2.5:7b                           |
| Long documents / RAG     | mistral, llama3.1:8b                 |
| Low-resource machines    | gemma3:4b, llama3.2:1b               |

FAQ

Why does Open WebUI say “Ollama not connected”?

This is almost always a URL configuration issue. Try these fixes in order:
1. Make sure Ollama is running (ollama serve in a terminal)
2. In Open WebUI settings, change the Ollama URL to http://host.docker.internal:11434 (Docker on Windows or Mac)
3. On Linux, try http://172.17.0.1:11434 if localhost does not resolve inside Docker
4. Check that nothing is blocking port 11434 in your firewall settings

Can I run this on a CPU without a GPU?

Yes, but it will be noticeably slower. For a smooth experience on CPU only, stick to smaller models like gemma3:4b, phi4-mini, or llama3.2:1b. Larger models (7B and above) may take 30 to 60 seconds or more per response depending on your hardware.
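To judge what your machine can hold, a rough rule of thumb (a heuristic, not an official Ollama formula) is: parameter count times bits per weight, divided by eight to get bytes, plus roughly 20% overhead for the KV cache and runtime:

```python
def model_ram_gb(params_billions, bits_per_weight=4.0, overhead=1.2):
    """Back-of-envelope RAM estimate for a quantized model:
    weights (params x bits / 8) plus ~20% for KV cache and runtime."""
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb * overhead

print(f"7B @ 4-bit ~ {model_ram_gb(7):.1f} GB")  # ~ 4.2 GB
print(f"1B @ 4-bit ~ {model_ram_gb(1):.1f} GB")  # ~ 0.6 GB
```

So a 7B model at 4-bit quantization fits comfortably in 8 GB of RAM, while anything much larger wants the 16 GB recommended earlier.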

Is my data private?

Yes, completely. Nothing leaves your machine. Ollama runs locally, Open WebUI stores data in a local Docker volume, and no usage data is sent to any third party. This is one of the main reasons people choose a local setup.

Can I use Open WebUI without Docker?

Yes. With Python 3.11 and pip available, you can install it directly:

pip install open-webui
open-webui serve


Docker is still recommended for most users because it makes updates cleaner and avoids dependency conflicts.

Does this work on Windows Subsystem for Linux (WSL2)?

Yes. While Ollama has native Windows support, many users prefer the performance of a Linux environment. You can run Ollama inside WSL2 with full hardware support. For the best experience, follow our guide on WSL2 with GPU acceleration to ensure your NVIDIA, AMD, or Intel GPU is properly passed through to your Linux distro.


Conclusion

Running local LLMs with Ollama and Open WebUI is one of the best ways to explore AI on your own terms: fast, private, and completely free after the initial setup. Whether you are a developer, a privacy-conscious user, or just curious about what is possible, this stack gives you a production-quality experience without any cloud dependency.

Start small with a 4B or 7B model, get comfortable with the interface, then scale up as your hardware allows. The local AI ecosystem is evolving fast, and tools like Ollama and Open WebUI make it easier than ever to keep up.


Have questions or run into issues? Leave a comment below.
