Running LLMs Locally Using Podman, Ollama & OpenWebUI

Introduction

In this post, we will look at how to run local AI models using OpenWebUI, Ollama and Podman. We will start by understanding what Podman is and how it differs from Docker, and then put this knowledge into practice by creating a pod with two containers: Ollama and OpenWebUI.


Podman

Podman is a daemonless, rootless, open-source container management system which allows you to quickly create containers for services with low overhead. Here are some advantages to using Podman:

  • Security: Podman doesn't use a daemon, unlike Docker, whose daemon runs as root. This gives it a smaller attack surface, as malicious actors often target components like a daemon precisely because of its privileged permissions. Furthermore, each container is launched with SELinux, which uses security policies to confine and manage processes, files and apps [1]. Moreover, each container can be managed without root, restricting it to only what the invoking user can access.

  • systemd: Podman has great integration with systemd via quadlets, allowing users to create and manage containers as systemd units [2].

  • Ease of Use: The Podman command line is very similar to Docker's, so Podman is easy to pick up and get going with. Furthermore, pods group containers together, allowing for easier management of multiple containers and services, similar to Docker Compose. A pod can be exported via podman generate kube <pod name>, which outputs a Kubernetes-style .yml definition. This file can then be used to quickly recreate the pod using podman kube play <yml file>, as shown below.
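
For example, assuming the localai pod we create later in this post (the file name here is arbitrary), exporting and recreating it would look roughly like this:

podman generate kube localai > localai.yml
podman kube play localai.yml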


Creating our pod

So we want to run our own local AI models using Podman. How do we do this? Firstly, we know these services will be linked, so we should create a pod to make it easier to manage them together.

podman pod create --name localai -p 11434:11434 -p 3000:8080

We used pod create to make a pod and gave it a name: localai. We then map the ports that each service uses. First, we map port 11434 on the host to the same port within the pod; this is how we will connect to Ollama from our host machine. Secondly, we map host port 3000 to port 8080, which the OpenWebUI container listens on.

Note You can use any host port you want, as long as it is not already in use. We must also declare all port mappings during pod creation; Podman does not allow you to retroactively add port mappings after a pod has been created.
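
If you want to double-check the mappings after creating the pod, you can inspect it; the port bindings are listed under the infra container's configuration in the JSON output:

podman pod inspect localai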


Adding containers to our pod

We now need to add the Ollama and OpenWebUI containers to the pod, and then we can launch the pod and hopefully everything should work! Let's do the Ollama container first.

Ollama

We can create a container like so and add it to our newly-created pod!

podman create --name ollama --pod=localai -v ollama:/root/.ollama --device /dev/kfd --device /dev/dri ollama/ollama:rocm

Here, we create a container, give it a name, add it to the pod, mount a named volume where Ollama will store all of our downloaded models, and finally provide the image name. We don't need to provide port mappings here, as we have already done so during pod creation.

GPU Tip I am using an AMD GPU, so the above command should work as it gives the container access to the /dev/kfd and /dev/dri devices. If you are not using an AMD GPU, please visit this link to see the correct configuration for your graphics card.
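
For instance, on an NVIDIA card the equivalent command would look roughly like the sketch below. This assumes the NVIDIA Container Toolkit and its generated CDI spec are installed, and it is untested here, so do check the Ollama and NVIDIA documentation for your setup.

podman create --name ollama --pod=localai -v ollama:/root/.ollama --device nvidia.com/gpu=all ollama/ollama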

OpenWebUI

We will follow similar steps for OpenWebUI:

podman create --name openwebui --pod=localai -v open-webui:/app/backend/data ghcr.io/open-webui/open-webui:latest
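
Because containers in a pod share a network namespace, OpenWebUI should be able to reach Ollama on localhost automatically. If it doesn't, one option is to recreate the container with OpenWebUI's OLLAMA_BASE_URL environment variable set explicitly; the address below assumes the default Ollama port from earlier:

podman create --name openwebui --pod=localai -e OLLAMA_BASE_URL=http://127.0.0.1:11434 -v open-webui:/app/backend/data ghcr.io/open-webui/open-webui:latest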

Pulling images If you are running these commands for the first time, Podman will automatically pull the images from their respective registries. Once the images have been downloaded, any subsequent container creation that uses those images will use the local copies stored on your system.
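
You can list the images stored locally at any time with:

podman images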

Once done, we should have three containers in our pod. We can verify this by running podman pod ls and looking at the last column. You may be wondering why the number of containers is three and not two.

POD ID        NAME        STATUS      CREATED        INFRA ID      # OF CONTAINERS
2ce6f54a43de  localai     Created     8 seconds ago  1cf56691c890  3

Podman, by default, creates an extra container to manage the pod (typically called an infra container). This gets added to the pod automatically, so we don't really need to worry about it for now.
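
If you're curious, you can see the infra container sitting alongside the other two by listing all containers with their pods:

podman ps -a --pod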


Launching the pod

We can now launch our pod and check to see if everything works okay.

podman pod start localai

Once executed, allow the containers a few seconds to boot up, then head over to localhost:3000 (or whichever host port you chose). You should see the OpenWebUI sign-in page. Signing up simply creates a local account and doesn't require a valid email.

OpenWebUI startup/login page
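
If the page doesn't load, you can quickly check whether the web server is responding from the terminal; a 200 (or a redirect code) suggests OpenWebUI is up, assuming you kept host port 3000 from earlier:

curl -s -o /dev/null -w "%{http_code}\n" 127.0.0.1:3000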

Fantastic! You can also check to see if Ollama is running:

➜ curl 127.0.0.1:11434
Ollama is running

You should get the "Ollama is running" output!
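
From here, you can download models either through the OpenWebUI interface or directly inside the Ollama container; the model name below is only an example:

podman exec -it ollama ollama pull llama3.2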


Conclusion

We did it! We can now run local AI models on our own systems, through a ChatGPT-like interface, using Ollama and our own GPU.

Hopefully, you can see the advantages of using Podman over Docker, and how pods can be helpful in grouping services together.
