Furby with local AI using Ollama
Last updated:
Introduction
Previously we set up a Furby with ChatGPT. What if we want to run an AI Furby locally without an internet connection? Home Assistant allows this by
- providing an Ollama integration so we can interface with an Ollama instance run on our local network, and
- providing a TTS and STT integration that can run locally on the Home Assistant server.
LLMs require a lot of memory, so the Raspberry Pi running Home Assistant will not be adequate. Instead, we will use a MacBook Pro with an M1 chip, which can run some models locally: a machine with 64GB of RAM can, for example, load a 32b model that uses 19GB. For running a “full” 70b or 405b model, dedicated AI hardware is recommended.
Local LLM
Setting up Ollama server
Download Ollama and install it. This will provide the ollama command for managing our language models.
Make sure Ollama is accessible from the network. The service might not be exposed to the local network by default; if that is the case, try running it with
OLLAMA_HOST="0.0.0.0" OLLAMA_DEBUG=1 ollama serve
On macOS you may also set the environment variable using
launchctl setenv OLLAMA_HOST "0.0.0.0"
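To make sure the change actually took effect, you can read the variable back and restart the Ollama app so it picks up the new value. A small sketch (assuming the menu-bar app is named Ollama, as in the default install):

# confirm the variable is visible to launchd-spawned processes
launchctl getenv OLLAMA_HOST

# restart the Ollama app so it picks up the new value
osascript -e 'quit app "Ollama"'
open -a Ollama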
To check the logs for debugging:
tail -f ~/.ollama/logs/server.log
Ollama runs on port 11434. You can test if the service is running by navigating to http://[ip]:11434, which should respond with Ollama is running.
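You can also check this from another machine on the network with curl (replace [ip] with your server's address):

# plain HTTP check – should print "Ollama is running"
curl http://[ip]:11434

# list the models the server currently has installed
# (empty until we pull some below)
curl http://[ip]:11434/api/tags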
Pulling LLMs
Ollama offers a range of LLMs to choose from; a list of available LLMs can be found at https://ollama.com/library. Once we have picked a model we can download it, e.g. with
ollama pull deepseek-r1:32b
Here are some useful commands:
# list installed LLMs
ollama list

# delete a certain LLM
ollama rm llama3.1:70b
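If you want to test a model without the interactive prompt, you can also hit Ollama's HTTP API directly, which is the same interface integrations like Home Assistant talk to. A minimal sketch:

# one-shot completion against the pulled model
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:32b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'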
You can also create a custom model from a base model. This cuts down on precious context size and makes the models easier to manage.
ollama create llama3.2-homeassistant -f ./homeassistant.model
An example for a “Home Assistant” model with less random answers than usual could be
FROM llama3.2
PARAMETER temperature 0.0
PARAMETER seed 0
PARAMETER num_ctx 8192

SYSTEM """You are a voice assistant for Home Assistant.
Answer in plain text. Keep it simple and to the point."""
seed 0
makes the model’s answers more deterministic by fixing the random seed to a specific value. A temperature
of 0 removes additional randomness from the response (alternatively you can also set top_k
to 1; a good resource for checking the effects of the different LLM parameters is https://artefact2.github.io/llm-sampling/index.xhtml). num_ctx
sets how much context is used. All configurable params can be found here.
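The same parameters can also be set per request via the API's options field instead of baking them into a Modelfile, which is handy for quickly comparing settings. A sketch:

# override sampling parameters for a single request
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "What is Home Assistant?",
  "stream": false,
  "options": {
    "temperature": 0.0,
    "seed": 0,
    "num_ctx": 8192
  }
}'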
For our Furby use case we can simply move the context we previously placed in the integration’s configuration into a dedicated model like this:
FROM llama3.2

SYSTEM """You are a cute cuddly Furby. Your answers should be accurate, funny and comprehensible by a 10 year old.
Assume your answer is spoken, so do not use emojis or other visual cues."""
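Assuming the Modelfile above is saved as ./furby.model (the file and model names here are just examples), we can build and smoke-test the Furby model like this:

# build the custom Furby model from the Modelfile above
ollama create llama3.2-furby -f ./furby.model

# quick test
ollama run llama3.2-furby "Why do you have to sleep at night?"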
Image recognition
An LLM like llava
could be used for local image analysis. If we have a camera set up that saves an image file to the current directory, we can run
ollama run llava "What do you see in this picture: ./test.jpeg"
to get a description of the image.
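The same works over the HTTP API by sending the image base64-encoded in the images field, which is useful if the camera and the Ollama server are on different machines. A sketch, assuming the image is saved as ./test.jpeg:

# base64-encode the camera image (strip newlines for valid JSON)
IMG=$(base64 < ./test.jpeg | tr -d '\n')

# ask llava to describe it via the API
curl http://localhost:11434/api/generate -d "{
  \"model\": \"llava\",
  \"prompt\": \"What do you see in this picture?\",
  \"stream\": false,
  \"images\": [\"$IMG\"]
}"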
WebUI
![](/pi-hq-cam.jpg)
To interact with Ollama, we can use Open WebUI. This will allow us to manage the LLMs, run queries on them and see the results, making it easy to compare different models’ capabilities.
You can simply start it on the machine that is running the Ollama service via Docker:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Then navigate to http://[ip]:3000 to use it.
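Since the chat data lives in the named open-webui volume, upgrading later is just a matter of recreating the container. A sketch of the update steps:

# pull the latest image and recreate the container (data persists in the volume)
docker pull ghcr.io/open-webui/open-webui:main
docker rm -f open-webui
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main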
Add Ollama integration
Under Integrations select Ollama.
- Enter the URL to your local Ollama server.
- The next dialog lets you pick from the models exposed by the server.
- This will create a service with the model’s name; we can rename it by selecting Rename from the context menu.
- Under Configure we can set additional context, whether we want to expose entities to the service, the context window size and the maximum history length to remember.
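Once the integration is set up, you can test the conversation agent without a voice pipeline by calling the conversation.process service through Home Assistant's REST API. A sketch, assuming a long-lived access token; the host and the agent_id (check Developer Tools -> States for the actual conversation entity id) are placeholders you need to adapt:

# ask the new Ollama conversation agent a question
# (?return_response returns the agent's answer in the response body)
curl -X POST "http://homeassistant.local:8123/api/services/conversation/process?return_response" \
  -H "Authorization: Bearer <LONG_LIVED_ACCESS_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"text": "How many lights are on?", "agent_id": "conversation.llama3_2"}'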
Create the local assistant
Now we just need to connect everything up.
- Go to Settings -> Voice Assistants.
- Click Add Assistant and pick an appropriate name, e.g. the name of the language model used.
- Pick the Ollama conversation agent created above.
- As Speech-to-text service select the faster-whisper service created earlier.
- As Text-to-speech service select the piper service created earlier.
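To hear a result on the Furby without talking to it first, you can also fire the TTS service directly at its speaker. A sketch, where the entity ids (tts.piper and media_player.furby) are assumptions from the earlier setup and need to be adapted to yours:

# have piper speak a message on the Furby's media player
curl -X POST http://homeassistant.local:8123/api/services/tts/speak \
  -H "Authorization: Bearer <LONG_LIVED_ACCESS_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"entity_id": "tts.piper", "media_player_entity_id": "media_player.furby", "message": "Hello, I am your local Furby!"}'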