Take full control of your AI. No subscriptions, no data leaks, no cloud required.
Introduction
Running large language models (LLMs) locally has never been more accessible. With Ollama handling model management and Open WebUI providing a polished browser-based interface, you can have your own private ChatGPT-style setup running on your own hardware in under 30 minutes.
This guide walks you through everything: installation, model management, configuration tips, and answers to the most common questions beginners and power users ask.
What You Will Need
Before getting started, make sure your system meets the basics:
- OS: Windows 10/11, macOS (Apple Silicon or Intel), or Linux
- RAM: Minimum 8 GB (16 GB or more recommended for larger models)
- Storage: At least 10 to 50 GB free depending on which models you want to run
- GPU (optional but recommended): NVIDIA (CUDA), AMD (ROCm), or Apple Silicon (Metal). CPU-only works but is slower.
- Docker (required for Open WebUI): install Docker Desktop (or Docker Engine on Linux) from docker.com
Part 1: Installing Ollama
Ollama is the engine that downloads, manages, and serves your local LLMs via a simple API.
Windows and macOS
- Go to https://ollama.com/download
- Download the installer for your operating system
- Run the installer and follow the on-screen instructions
- Once installed, Ollama runs silently in the background
Linux
Open a terminal and run:
curl -fsSL https://ollama.com/install.sh | sh
Ollama is now installed and running as a system service.
Verify the Installation
Open a terminal (or PowerShell on Windows) and run:
ollama --version
You should see a version number printed. Ollama is ready.
Part 2: Downloading Your First Model
Ollama uses a simple pull command to download models from its library.
Popular Models to Start With
| Model | Size | Best For |
|---|---|---|
| llama3.2 | ~2 to 4 GB | General use, fast responses |
| mistral | ~4 GB | Coding, reasoning |
| gemma3:4b | ~3 GB | Lightweight, good quality |
| qwen2.5:7b | ~5 GB | Multilingual, coding |
| deepseek-r1:7b | ~5 GB | Reasoning, math |
| phi4 | ~9 GB | Advanced reasoning, small footprint |
Pull a Model
ollama pull llama3.2
This downloads the model to your local machine. Depending on your internet speed, this may take a few minutes.
Run a Model in the Terminal
Test it quickly before setting up the UI by running:
ollama run llama3.2
Type your prompt and press Enter. Type /bye to exit.
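Everything the terminal does goes through Ollama's local REST API, so you can script the same interaction. A minimal sketch against the /api/generate endpoint (default port, non-streaming; the prompt text is just an example):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local port

def build_payload(prompt: str, model: str = "llama3.2") -> dict:
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "llama3.2") -> str:
    """POST the prompt to Ollama's /api/generate endpoint and return the reply text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama instance with llama3.2 pulled):
# print(generate("Why is the sky blue? Answer in one sentence."))
```

The same endpoint accepts "stream": True, in which case Ollama returns the reply token by token as newline-delimited JSON.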
List Downloaded Models
ollama list
Remove a Model
ollama rm modelname
Part 3: Installing Open WebUI
Open WebUI gives you a full chat interface, similar to ChatGPT, that connects directly to your local Ollama instance.
Option A: Docker (Recommended)
Make sure Docker Desktop is running, then execute:
docker run -d \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
On Windows (PowerShell), use backticks instead of backslashes:
docker run -d `
-p 3000:8080 `
--add-host=host.docker.internal:host-gateway `
-v open-webui:/app/backend/data `
--name open-webui `
--restart always `
ghcr.io/open-webui/open-webui:main
Option B: If You Have an NVIDIA GPU
Add --gpus all to the docker run command so the container can see your GPU, and replace the image tag at the end with the CUDA build:
ghcr.io/open-webui/open-webui:cuda
Access the Interface
Once the container is running, open your browser and go to:
http://localhost:3000
The first time you visit, you will be prompted to create an admin account. This account is local only. No data is sent anywhere.
Part 4: Connecting Open WebUI to Ollama
Open WebUI should auto-detect Ollama if both are running on the same machine.
Manual Configuration
If models do not appear automatically:
- Log into Open WebUI
- Click your profile icon and go to Settings
- Navigate to Connections
- Set the Ollama API URL to:
  - Linux/macOS: http://localhost:11434
  - Docker on Windows/Mac: http://host.docker.internal:11434
- Click Save
Your downloaded models will now appear in the model selector dropdown at the top of the chat window.
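You can verify the same thing outside the UI: the model list shown in the dropdown comes from Ollama's /api/tags endpoint. A quick sketch, assuming the default port:

```python
import json
import urllib.request

def model_names(tags: dict) -> list:
    """Extract model names from the JSON structure returned by /api/tags."""
    return [m["name"] for m in tags.get("models", [])]

def list_models(base_url: str = "http://localhost:11434") -> list:
    """Fetch the locally downloaded models from a running Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.loads(resp.read()))

# Example (requires a running Ollama instance):
# print(list_models())
```

If this script returns model names but Open WebUI still shows an empty dropdown, the problem is the URL configured in Open WebUI, not Ollama itself.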
Part 5: Using Open WebUI
Starting a Chat
- Open http://localhost:3000 in your browser
- Select a model from the dropdown at the top
- Type your message and hit Enter
Key Features Worth Knowing
- System Prompts: Give the model a custom persona or a set of rules before the conversation starts
- Multiple Models: Switch between models mid-session or compare two side by side
- Chat History: All conversations are saved locally in the Docker volume
- File Upload: Upload documents for the model to reference (PDF, TXT, and more)
- RAG (Retrieval-Augmented Generation): Index your own documents and query them through chat
- Image Generation: Connect to a local Stable Diffusion instance for image creation
- Web Search: Enable real-time search grounding for more accurate and up-to-date answers
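The System Prompts feature maps directly onto Ollama's chat API: Open WebUI prepends a message with the "system" role to the conversation. A rough sketch of the equivalent direct call to /api/chat (endpoint and field names per the Ollama REST API; the persona text is just an example):

```python
import json
import urllib.request

def chat_messages(system: str, user: str) -> list:
    """Build a message list: a system persona followed by the user's prompt."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

def chat(messages: list, model: str = "llama3.2",
         base_url: str = "http://localhost:11434") -> str:
    """Send a non-streaming chat request to Ollama and return the reply text."""
    body = json.dumps({"model": model, "messages": messages, "stream": False})
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Example (requires a running Ollama instance):
# print(chat(chat_messages("You are a terse Python tutor.", "What is a list?")))
```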
Part 6: Tips and Optimizations
Speed Up Inference
- Use a GPU whenever possible. Even a mid-range GPU dramatically improves response speed.
- Reduce context length in model settings if you do not need long conversations.
- Use smaller quantized models (for example q4_K_M variants) for faster performance with minor quality trade-offs.
Run Ollama as a Network Server
To access Ollama from other devices on your local network, set the host before starting:
OLLAMA_HOST=0.0.0.0 ollama serve
Then update Open WebUI’s Ollama URL to your machine’s local IP address, for example http://192.168.1.100:11434.
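Once Ollama is listening on the network, any client on the LAN can reach it. Ollama also exposes an OpenAI-compatible endpoint at /v1/chat/completions, so a remote request might look like the sketch below (the IP is just the example address from above):

```python
import json
import urllib.request

def api_url(host: str, port: int = 11434) -> str:
    """Build the OpenAI-compatible chat endpoint URL for a remote Ollama server."""
    return f"http://{host}:{port}/v1/chat/completions"

def remote_chat(host: str, prompt: str, model: str = "llama3.2") -> str:
    """Query an Ollama server elsewhere on the LAN and return the reply text."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    req = urllib.request.Request(
        api_url(host),
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Example (requires Ollama listening on 0.0.0.0 on that machine):
# print(remote_chat("192.168.1.100", "Hello from across the LAN"))
```

Because the endpoint is OpenAI-compatible, most existing OpenAI client libraries can also be pointed at this URL with a dummy API key.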
Keep Everything Updated
Update Ollama:
- Windows/macOS: Re-download and reinstall from ollama.com
- Linux: Re-run the install script
Update Open WebUI Docker image:
docker pull ghcr.io/open-webui/open-webui:main
docker stop open-webui
docker rm open-webui
# Re-run the original docker run command
Recommended Models by Use Case
| Use Case | Recommended Model |
|---|---|
| General chat and writing | llama3.2, mistral |
| Coding and debugging | qwen2.5-coder:7b, deepseek-coder-v2 |
| Reasoning and math | deepseek-r1:7b, phi4 |
| Multilingual tasks | qwen2.5:7b |
| Long documents / RAG | mistral, llama3.1:8b |
| Low-resource machines | gemma3:4b, llama3.2:1b |
FAQ
Why are my models not showing up in Open WebUI?
This is almost always a URL configuration issue. Try these fixes in order:
1. Make sure Ollama is running (ollama serve in a terminal)
2. In Open WebUI settings, change the Ollama URL to http://host.docker.internal:11434 (Docker on Windows or Mac)
3. On Linux, try http://172.17.0.1:11434 if localhost does not resolve inside Docker
4. Check that nothing is blocking port 11434 in your firewall settings
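If you are still stuck, a quick way to separate "Ollama is not running" from "the URL is wrong" is to test the port directly. A minimal sketch (hostname and port are the defaults from this guide):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Examples:
# port_open("localhost", 11434)             # is Ollama reachable on this machine?
# port_open("host.docker.internal", 11434)  # run inside the container to test that route
```

If the first check fails, start Ollama; if it succeeds but the second fails from inside the container, fix the URL in Open WebUI's Connections settings.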
Can I run models without a GPU?
Yes, but it will be noticeably slower. For a smooth experience on CPU only, stick to smaller models like gemma3:4b, phi4-mini, or llama3.2:1b. Larger models (7B and above) may take 30 to 60 seconds or more per response depending on your hardware.
Is my data really private?
Yes, completely. Nothing leaves your machine. Ollama runs locally, Open WebUI stores data in a local Docker volume, and no usage data is sent to any third party. This is one of the main reasons people choose a local setup.
Can I use Open WebUI without Docker?
Yes. You can install it directly with Python:
pip install open-webui
open-webui serve
Docker is still recommended for most users because it makes updates cleaner and avoids dependency conflicts.
Can I run Ollama in WSL2?
Yes. While Ollama has native Windows support, many users prefer the performance of a Linux environment. You can run Ollama inside WSL2 with full hardware support. For the best experience, follow our guide on WSL2 with GPU acceleration to ensure your NVIDIA, AMD, or Intel GPU is properly passed through to your Linux distro.
Conclusion
Running local LLMs with Ollama and Open WebUI is one of the best ways to explore AI on your own terms: fast, private, and completely free after the initial setup. Whether you are a developer, a privacy-conscious user, or just curious about what is possible, this stack gives you a production-quality experience without any cloud dependency.
Start small with a 4B or 7B model, get comfortable with the interface, then scale up as your hardware allows. The local AI ecosystem is evolving fast, and tools like Ollama and Open WebUI make it easier than ever to keep up.
Have questions or run into issues? Leave a comment below.