
Furby with local AI using Ollama


Introduction

Previously we set up a Furby with ChatGPT. What if we want to run an AI Furby locally without an internet connection? Home Assistant allows this by

  1. providing an Ollama integration so we can interface with an Ollama instance run on our local network, and
  2. providing a TTS and STT integration that can run locally on the Home Assistant server.

LLMs require a lot of memory, so the Raspberry Pi running Home Assistant will not be adequate. Instead, here we will use a MacBook Pro with an M1 chip, which can run some models locally; a machine with 64GB of RAM can e.g. load a 32b model that uses around 19GB. For running a “full” 70b or 405b model, dedicated AI hardware is recommended.

Local LLM

Setting up Ollama server

Download Ollama and install it. This will provide the ollama command for managing our language models.

Make sure Ollama is accessible from the network. The service might not be exposed to the local network by default; if that is the case, try running it with

Terminal window
OLLAMA_HOST="0.0.0.0" OLLAMA_DEBUG=1 ollama serve

On macOS you can also set the environment variable using

Terminal window
launchctl setenv OLLAMA_HOST "0.0.0.0"

To check the logs for debugging

Terminal window
tail -f ~/.ollama/logs/server.log

Ollama runs on port 11434. You can test whether the service is running by navigating to http://[ip]:11434, which should respond with Ollama is running.
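Alternatively, you can check from another machine on the network with curl (reusing the [ip] placeholder from above):

Terminal window
# should print: Ollama is running
curl http://[ip]:11434
# list the models the server currently exposes
curl http://[ip]:11434/api/tags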

Pulling LLMs

Ollama offers a range of LLMs to choose from; a list of available LLMs can be found at https://ollama.com/library. Once we have picked a model we can download it, e.g. with

Terminal window
ollama pull deepseek-r1:32b

Here are some useful commands:

Terminal window
# list installed LLMs
ollama list
# delete a certain LLM
ollama rm llama3.1:70b
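Once a model has been loaded by a query, ollama ps shows which models are currently in memory and how much RAM they occupy, which is handy for checking whether a model actually fits on the machine:

Terminal window
# show models currently loaded into memory and their size
ollama ps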

You can also create a custom model from a base model, baking the system prompt and parameters into the model itself. This cuts down on precious context size and makes the models easier to manage.

Terminal window
ollama create llama3.2--homeassistant -f ./homeassistant.model

An example of a “Home Assistant” model with less random answers than usual could be

FROM llama3.2
PARAMETER temperature 0.0
PARAMETER seed 0
PARAMETER num_ctx 8192
SYSTEM """
You are a voice assistant for Home Assistant.
Answer in plain text. Keep it simple and to the point.
"""

seed 0 makes the model’s answers more deterministic by fixing the random seed to a specific value. A temperature of 0 removes additional randomness from the response (alternatively you can also set top_k to 1; a good resource for checking the effects of the different LLM params is https://artefact2.github.io/llm-sampling/index.xhtml). num_ctx sets how much context is used. All configurable params can be found here.
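To verify that the parameters and system prompt were baked in, you can print the resulting Modelfile back (using the model name from the create command above):

Terminal window
ollama show --modelfile llama3.2--homeassistant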

For our Furby use-case we can simply add the context we have previously placed into the integration’s configuration to a dedicated model like this:

FROM llama3.2
SYSTEM """
You are a cute cuddly Furby. Your answers should be accurate, funny and comprehensible by a 10 year old.
Assume your answer is spoken, so do not use emojis or other visual cues.
"""

Image recognition

An LLM like llava can be used for local image analysis. If we have a camera set up that saves an image file to the current directory, we can run

Terminal window
ollama run llava "What do you see in this picture: ./test.jpeg"

to get a description of the image.
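The same can be done over Ollama’s HTTP API, which is what integrations typically use; the image is passed as a base64-encoded string. A minimal sketch, assuming the server runs on the local machine and a base64 command that does not wrap lines (on Linux use base64 -w0 instead of -i):

Terminal window
curl http://localhost:11434/api/generate -d '{
  "model": "llava",
  "prompt": "What do you see in this picture?",
  "images": ["'"$(base64 -i ./test.jpeg)"'"],
  "stream": false
}'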

WebUI


To interact with Ollama, we can use Open WebUI. This will allow us to manage the LLMs, run queries on them and see the results, making it easy to compare different models’ capabilities.

You can start it on the machine that is running the Ollama service via Docker:

Terminal window
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Then navigate to http://[ip]:3000 to use it.
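If the page does not come up, the container logs usually show why (open-webui is the container name chosen in the docker run command above):

Terminal window
docker logs -f open-webui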

Add Ollama integration

Under Integrations select Ollama.

  • Enter the URL of your local Ollama server (e.g. http://[ip]:11434).
  • The next dialog lets you pick from the models the server exposes.
  • This will create a service with the model’s name; we can rename it by selecting Rename from the context menu.
  • Under Configure we can set additional context, whether we want to expose entities to the service, the context window size, and the maximum history length to remember.

Create the local assistant

Now we just need to connect everything up.

  • Go to Settings -> Voice Assistants.
  • Click Add Assistant and pick an appropriate name, e.g. the name of the language model used.
  • Pick the Ollama conversation agent created above.
  • As Speech-to-text service select the faster-whisper service created earlier.
  • As Text-to-speech service select the piper service created earlier.