Reminder on how to run ollama locally and with RAG
Small reminder to myself on how to run my local and cloud ollama LLMs with a custom document database for RAG.
When possible, and for a number of reasons, I prefer to run LLMs locally. This very short tutorial is mainly for my own reference, because I constantly forget how to do it, though it is likely to be useful to others too.
Install ollama
Very simple:
curl -fsSL https://ollama.com/install.sh | sh

Or whatever method is described for your system on the Ollama download page. I use Linux (Ubuntu) because I am a bit of a nerd.
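To sanity-check the install (on Linux the script also registers a systemd service for the server):

# the CLI should report a version
ollama --version
# the server should be running in the background
systemctl status ollama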
Download models
Since Ollama made some models available for free in the cloud, there is little need to download full models, and your computer won’t suffer as much as it does when prompting an 8b model:
llama3.1:8b runs surprisingly well fully locally on a 16 GB RAM laptop, no GPU.
Ollama’s cloud is a new way to run open models using datacenter-grade hardware. Many new models are too large to fit on widely available GPUs, or run very slowly. Ollama’s cloud provides a way to run these models fast while using Ollama’s App, CLI, and API.
and seems to be privacy friendly:
Ollama does not log or retain any queries.
(optional) Sign in to ollama
This is only necessary to use the cloud models:
- register with ollama at https://ollama.com/
- create an API key at https://ollama.com/settings/keys
- set an environment variable with the key:
export OLLAMA_API_KEY=your_api_key
- then run:
ollama signin
Install and run a model
The neat thing is that when a model is not available locally it will be downloaded automatically. For cloud models only a pointer is stored, but all the regular ollama commands should run as usual.
Anyway, the command is the same either way:
# a cloud model
# Note the `-cloud` suffix.
ollama run gpt-oss:120b-cloud
# completely local model
ollama run llama3.1:8b

This will start a prompt in the terminal, which can be exited with /bye.
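The terminal prompt is not the only way in: the local server also listens on port 11434 with a small REST API (documented by Ollama). A minimal sketch with curl, where stream: false returns a single JSON reply instead of chunks:

# query the local Ollama server directly over HTTP
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Summarise retrieval-augmented generation in one sentence.",
  "stream": false
}'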
Use a GUI for prompting
Open WebUI is da bomb! Get started with https://docs.openwebui.com/getting-started/quick-start/
I will use the Docker image but deploy it with Podman because (i) Docker is a pain and (ii) I sometimes try new stuff:
Especially when someone has created such great instructions: https://roderik.no/podman-and-open-webui
# assumes podman is installed
# sudo apt-get -y install podman
podman pull ghcr.io/open-webui/open-webui:main
podman run --replace -p 127.0.0.1:3000:8080 --network=pasta:-T,11434 --env WEBUI_AUTH=False \
--add-host=localhost:127.0.0.1 \
--env 'OLLAMA_BASE_URL=http://localhost:11434' \
--env 'ANONYMIZED_TELEMETRY=False' \
-v open-webui:/app/backend/data \
--label io.containers.autoupdate=registry \
--name open-webui ghcr.io/open-webui/open-webui:main

Key flags:
- WEBUI_AUTH=False: because otherwise the web interface asks for a login. Ain’t nobody got time for that.
- --add-host: this is where ollama is serving its API. Nifty.
- ANONYMIZED_TELEMETRY=False: no tracking.
- --replace: because every time I restart the container I get an error that the name open-webui is in use. This stops the nagging. It’s a hack, not a solution, I know.
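Since the container already carries the io.containers.autoupdate=registry label, it is a small step to let systemd start it at login and keep it updated. A sketch, assuming a rootless Podman setup and that the container above has already been created:

# generate a user-level systemd unit from the existing container
mkdir -p ~/.config/systemd/user
podman generate systemd --new --name open-webui > ~/.config/systemd/user/open-webui.service
systemctl --user daemon-reload
systemctl --user enable --now open-webui.service
# periodically pull and restart containers labelled io.containers.autoupdate=registry
systemctl --user enable --now podman-auto-update.timer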
Now go to your web browser and open http://127.0.0.1:3000/ to see a ChatGPT-like interface. I will let you pick the model.

Retrieval-Augmented Generation (RAG)
TL;DR: a way of using an LLM to query your internal documents, increasing the chances of getting accurate replies and providing the source for an answer. This is where I see a lot of value in LLMs for entities that have a lot of scattered internal knowledge (or, like yours truly, a very large collection of scientific papers).
How to get all this good stuff on your laptop using Open WebUI?
- Upload the documents as a knowledge base.
- Connect the knowledge base to a model.
- Query the knowledge base.
- Profit?
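The same flow can also be scripted against Open WebUI’s own API, which is handy for batch-adding papers. The endpoint names and payloads below follow my reading of Open WebUI’s API documentation, so treat them as assumptions and double-check the docs; the API key comes from the account settings in the web interface, and the knowledge base id is a placeholder you need to replace.

# hypothetical values: substitute your own API key from Open WebUI's settings
TOKEN=your_openwebui_api_key

# upload a document; Open WebUI chunks and embeds it server-side
curl -H "Authorization: Bearer $TOKEN" \
  -F "file=@paper.pdf" \
  http://127.0.0.1:3000/api/v1/files/

# ask a question with a knowledge base collection attached
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:8b",
       "messages": [{"role": "user", "content": "What does the paper say about X?"}],
       "files": [{"type": "collection", "id": "your_knowledge_base_id"}]}' \
  http://127.0.0.1:3000/api/chat/completions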



The nitty-gritty is that the documents are split into chunks of text, each chunk is turned into an embedding vector, and the vectors are stored in a vector database; at query time the most similar chunks are retrieved and handed to the model as extra context alongside your question.
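Open WebUI handles the chunking and embedding for you, but the embedding step itself is just a call to the local Ollama API. A sketch, assuming an embedding model such as nomic-embed-text (my pick here, not something Open WebUI requires) has been pulled:

# pull a small embedding model
ollama pull nomic-embed-text

# turn a chunk of text into a vector; the JSON reply contains an "embedding" array
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "A chunk of a scientific paper goes here."
}'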
Now a quick showcase:

Citation
@online{domingues2025,
author = {Domingues, António},
title = {Reminder on How to Run Ollama Locally and with {RAG}},
date = {2025-11-12},
url = {https://amjdomingues.com/posts/2025-11-12-local-ollama/},
langid = {en}
}