title = "Self-Hosted Voice Control (for Paranoids)"
description = "Building a self-hosted voice interface for HomeAssistant"
type = "post"
weight = 1
+++
#### _Building a self-hosted voice interface for HomeAssistant_
Voice control is both a new and a quite old piece of the home automation puzzle. As far back as the 1960s, science fiction depicted seamless voice control of computers, culminating in, to me, one of Star Trek's most endearing lines: "Computer, lights", followed by the satisfying brightness of hands-free lighting!
In the last few years, real-life technology has finally progressed to the point that this is truly possible. While there have been many attempts over the years, the fact is that reliable voice recognition requires massive quantities of computing power, machine learning, and sample data. It's something that truly requires "the cloud" to be workable. But with the rise of Google and Amazon voice appliances, the privacy implications have come into focus. As a now-widely-circulated comic puts it, 30 years ago people were concerned about police wiretaps; now, they say "Wiretap, order me some duct tape"! This is compounded by the proprietary nature of these appliances. Sure, the company may _say_ that they don't listen to you all the time, but without visibility into the hardware and software, how much can we really trust them?
Luckily, the free software community has a couple of answers. And today, it's possible to build your own appliance! It still uses the Google/Amazon/Microsoft speech-to-text facilities, but by controlling the hardware and software, you can be sure that the device is only listening to you when you tell it to! Hopefully one day projects like Sphinx and Kaldi will be up to the task, but for now we're stuck using the cloud players, for better or worse.
The Raspberry Pi has pretty much become the go-to device for building small self-hosted appliance solutions. From wildlife cameras to [a server BMC](/post/a-raspberry-pi-bmc), the Raspberry Pi provides a fantastic base system for just about any small computing project you could want to build. This project uses the Raspberry Pi 3 Model B, mainly because it's the most commonly available model new, and because of the computing requirements of the software we will be using: the original Raspberry Pi doesn't have enough computing power, and the Raspberry Pi 2 has some software consistency issues.
The second main component of this project is the [Seeed Studio ReSpeaker (4-mic version)](http://wiki.seeed.cc/ReSpeaker_4_Mic_Array_for_Raspberry_Pi/). The ReSpeaker provides an array of 4 microphones, one on each corner of the square board, in addition to a ring of LEDs, giving a visual appearance similar to the Google and Amazon appliances. By integrating tightly with the Raspberry Pi, you can build a very compact unit that can be placed almost anywhere and with only a single incoming cord for power, assuming WiFi is in use.
### Parts list
* 1x Raspberry Pi 3 (or newer)
* 1x SD Card for Raspberry Pi (8+ GB)
* 1x Power cord for Raspberry Pi
* 1x ReSpeaker 4-mic hat
### Assembly
Assembly of the unit is very straightforward. The ReSpeaker attaches to the Raspberry Pi's GPIO header and sits above the board, as seen in the picture on the ReSpeaker wiki linked above. Once it is attached, the Raspberry Pi is ready for software installation and configuration.
A note before we start: this post doesn't document my HomeAssistant configuration, since doing so would require a post of its own! What matters for our purposes is that my HomeAssistant instance exposes multiple API endpoints, one for each room, that handle the various lighting events there. You can use this method to communicate almost anything to HomeAssistant via voice control.
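Since the HomeAssistant side isn't covered here, the following is only an illustrative sketch of what a call to a per-room endpoint could look like, not the configuration used in this project. It assumes each room is served by a HomeAssistant automation with a webhook trigger (reachable at `/api/webhook/<webhook_id>`); the host name, the webhook IDs and the `action` field are hypothetical. It uses only the Python standard library.

```python
import json
import urllib.request

# Hypothetical base URL of the HomeAssistant instance.
HASS_URL = "http://homeassistant.local:8123"

# Hypothetical room -> webhook ID mapping; each ID would belong to an
# automation with a webhook trigger. Treat the IDs like passwords.
ROOM_WEBHOOKS = {
    "bedroom": "bedroom-lights-5f2c",
    "kitchen": "kitchen-lights-91ab",
}


def build_room_request(room, action, base_url=HASS_URL):
    """Build a JSON POST request carrying a lighting action for one room."""
    webhook_id = ROOM_WEBHOOKS[room]  # raises KeyError for unknown rooms
    body = json.dumps({"action": action}).encode("utf-8")
    return urllib.request.Request(
        url="{}/api/webhook/{}".format(base_url, webhook_id),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_room_action(room, action, timeout=5.0):
    """Send the action; return True if HomeAssistant answered with a 2xx status."""
    request = build_room_request(room, action)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses;
        # any of them counts as a failed command.
        return False
```

With a matching automation in HomeAssistant, `send_room_action("bedroom", "on")` returns `True` once the webhook accepts the request, and the automation can read the value as `trigger.json.action`.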
With the HomeAssistant side set up, we can begin configuring the Raspberry Pi.
### Kalliope
[Kalliope](https://github.com/kalliope-project/kalliope) is a free software (MIT-licensed) project providing an always-on voice assistant. It is written in Python and features a very modular structure and extremely flexible configuration options. Unlike commercial options, though, you can inspect the code and confirm that it does not report everything you say to some Internet service. It uses the Snowboy library to detect a trigger word, and you can then customize its behaviour based on the phrase received from your choice of speech-to-text provider (Google, Amazon, etc.).
I start with the [official Kalliope image](https://github.com/kalliope-project/kalliope/blob/master/Docs/installation/raspbian.md). The reason for this is twofold. First, the image provides a conveniently preconfigured system without having to manually `pip install` Kalliope, which even on a Raspberry Pi 3 takes upwards of an hour. Second, and most importantly, Snowboy appears to be broken with the latest Raspbian releases: it cannot be compiled properly, so the `pip install` can fail in obscure ways, usually after it has already been compiling for an hour. Using the pre-built image, and then upgrading it to the latest Raspbian, bypasses both problems and lets you get right to work.
Once you've written the Kalliope image to your SD card, boot it up and upgrade it from Raspbian Jessie (which the image ships with) to Stretch, by pointing APT's source lists at the `stretch` release and running a `dist-upgrade`.
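The upgrade is usually done by replacing `jessie` with `stretch` in the APT source lists (commonly with a `sed` one-liner) and then running `apt-get update` followed by `apt-get dist-upgrade`. To keep the examples in one language, here is a minimal Python sketch of the source-rewriting step only. The two file paths are the usual Raspbian locations but are an assumption, so check what exists on your image, run it as root, and back the files up first.

```python
import re
from pathlib import Path

# Assumed APT source files on a Raspbian image; verify on your own system.
SOURCE_FILES = [
    Path("/etc/apt/sources.list"),
    Path("/etc/apt/sources.list.d/raspi.list"),
]


def switch_release(text, old="jessie", new="stretch"):
    """Replace the release codename wherever it appears as a whole word."""
    return re.sub(r"\b" + re.escape(old) + r"\b", new, text)


def rewrite_sources(paths=SOURCE_FILES, old="jessie", new="stretch"):
    """Rewrite each existing source file in place; return the paths that changed."""
    changed = []
    for path in paths:
        if not path.exists():
            continue  # skip files this image doesn't have
        original = path.read_text()
        updated = switch_release(original, old, new)
        if updated != original:
            path.write_text(updated)
            changed.append(path)
    return changed


if __name__ == "__main__":
    for path in rewrite_sources():
        print("updated {}".format(path))
```

Once it reports the changed files, run `sudo apt-get update && sudo apt-get dist-upgrade` and reboot when the upgrade completes.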
The [ReSpeaker library](https://github.com/respeaker/seeed-voicecard) provides the drivers and utilities for using the ReSpeaker hat with Raspbian. Note, however, that this library won't work on Raspbian Jessie, only Stretch, which is why we have to upgrade the Kalliope image first. Once the upgrade is finished, clone this repository into a local directory and follow the instructions provided. Verify that the driver is working by running `arecord -L` and looking for ReSpeaker entries, then set the microphone levels using `alsamixer`. I find that a gain of 90 with a volume of 75 works well, since 100/100 results in nothing but noise. Your mileage may vary, so make some test recordings and check them as recommended in the library README.
One downside, however, is that while the ReSpeaker technically supports directional audio (as, e.g., Alexa devices do, using the mic closest to you for optimal performance), this project doesn't make use of it. Because I'm using PulseAudio to handle the incoming audio rather than interfacing directly with the ReSpeaker unit, that support would have to be built into Kalliope. The setup works, but you don't get the directional listening that you might expect from reading the ReSpeaker page!
The LED portion of the ReSpeaker requires a little more work. The [examples library for the 4-mic hat](https://github.com/respeaker/4mics_hat) provides all the basic tools needed to get the LEDs working, including several samples based on Google and Amazon device patterns. In my case, I went for a very simple LED feedback design: the LEDs turn blue while listening, then quickly turn green on a successful command or red on a failed one, giving some user feedback without having to listen to the unit try to talk!
To do this, I created a simple Python "daemon", running under Systemd, that listens for commands on a FIFO pipe and performs the required action, as well as a helper client utility that writes to the pipe. The code for both can be found [on my GitLab](https://dev.bonifacelabs.ca/joshua-automation/respeaker-led) for convenience. One interesting feature of this setup is the Systemd unit file: it performs a `git pull` inside the service directory (i.e. the repository directory) so that the service is automatically up to date whenever it is started. I do the same thing in my Kalliope unit file for its configuration.
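The colour scheme above boils down to a small state-to-colour table. This sketch is not the code from the examples repository: the state names, the brightness scaling and the hold time are hypothetical, and it assumes the hat's ring of 12 RGB LEDs. It only computes per-LED colours; pushing them to the hardware would go through the APA102 LED driver that ships with the 4-mic examples.

```python
# Hypothetical state names; colours follow the scheme described above.
STATE_COLOURS = {
    "listen": (0, 0, 255),    # blue while listening
    "success": (0, 255, 0),   # green on a successful command
    "failure": (255, 0, 0),   # red on a failed command
    "off": (0, 0, 0),
}

NUM_LEDS = 12  # assumption: the 4-mic hat's LED ring has 12 LEDs


def frame_for_state(state, brightness=1.0, num_leds=NUM_LEDS):
    """Return one (r, g, b) tuple per LED for the given state."""
    if not 0.0 <= brightness <= 1.0:
        raise ValueError("brightness must be between 0 and 1")
    r, g, b = STATE_COLOURS[state]
    scaled = (int(r * brightness), int(g * brightness), int(b * brightness))
    return [scaled] * num_leds


def feedback_sequence(succeeded, hold_seconds=1.0):
    """Frames shown after a command: the result colour, held briefly, then off."""
    result = "success" if succeeded else "failure"
    return [(frame_for_state(result), hold_seconds), (frame_for_state("off"), 0.0)]
```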
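As a rough illustration of the FIFO pattern (not the code from the repository above), this Python sketch shows a daemon loop and a matching client. The pipe path and command names are hypothetical. One detail worth handling: a FIFO reader sees end-of-file whenever the last writer closes, so the daemon holds its own write end open rather than closing and reopening the pipe between clients.

```python
import os
import stat

# Hypothetical location of the control pipe.
FIFO_PATH = "/run/respeaker-led/control"


def ensure_fifo(path):
    """Create the named pipe if needed; refuse to reuse a non-FIFO file."""
    try:
        os.mkfifo(path, 0o660)
    except FileExistsError:
        if not stat.S_ISFIFO(os.stat(path).st_mode):
            raise RuntimeError("{} exists and is not a FIFO".format(path))


def serve(path, handle, max_commands=None):
    """Daemon side: pass each newline-terminated command to handle().

    max_commands=None loops forever, as a service would.
    """
    ensure_fifo(path)
    # Open the read end without blocking, then hold a write end ourselves so
    # the reader never sees EOF and never has to close and reopen the pipe
    # (which could lose a command written in between).
    read_fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    keepalive_fd = os.open(path, os.O_WRONLY)
    os.set_blocking(read_fd, True)
    handled = 0
    try:
        with os.fdopen(read_fd, "r") as fifo:
            for line in fifo:
                command = line.strip()
                if not command:
                    continue
                handle(command)
                handled += 1
                if max_commands is not None and handled >= max_commands:
                    break
    finally:
        os.close(keepalive_fd)
    return handled


def send(path, command):
    """Client side: write one command (blocks until the daemon has the pipe open)."""
    with open(path, "w") as fifo:
        fifo.write(command + "\n")
```

A daemon would call `serve(FIFO_PATH, handler)` with a handler that drives the LEDs, and the client would call `send(FIFO_PATH, "listen")`. Because the client opens the pipe in blocking mode, it waits if the daemon isn't running; opening with `os.O_WRONLY | os.O_NONBLOCK` instead fails immediately with `ENXIO`, which may suit a helper called from Kalliope better.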
The next step is to actually configure Kalliope. The examples are a good starting point, but integrating everything together takes a bit more work. My `brain.yml` configuration integrates the ReSpeaker LEDs directly, through shell commands, as well as posting to the HomeAssistant URL.
Using this configuration as a jumping-off point, you can add many other options, and by including the LED shell commands in each one you can ensure that the LED ring shows the status of every task. So far, the only downside I've found with Kalliope is that single-word commands are generally unsupported: the device doesn't realize it should stop listening, so try to keep them to two or more words.
I use a custom Systemd unit to ensure everything starts correctly, including output buffering settings, and, as mentioned above, to keep the configuration repository up to date with the origin, which makes on-the-fly configuration updates to multiple devices quick and painless.
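As an illustration of how such a brain can be structured (not the configuration used here), this Python sketch generates one synapse per room and action. It assumes Kalliope's brain format of a list of synapses, each with a `name`, `signals` (an `order`) and `neurons`, using the `shell` neuron's `cmd` parameter; check these field names against the Kalliope documentation for your version. The HomeAssistant webhook URL, the webhook IDs and the `respeaker-led-client` command are hypothetical placeholders. `curl -f` turns HTTP errors into a non-zero exit code, so the shell `&&`/`||` chain sets the LED to success or failure accordingly.

```python
import json

# Hypothetical values: HomeAssistant URL, per-room webhook IDs, and the LED
# helper client installed alongside the LED daemon.
HASS_URL = "http://homeassistant.local:8123"
LED_CLIENT = "/usr/local/bin/respeaker-led-client"
ROOMS = {"bedroom": "bedroom-lights-5f2c", "kitchen": "kitchen-lights-91ab"}


def synapse(room, webhook_id, action):
    """One synapse: a spoken order mapped to a shell neuron that posts and sets the LEDs."""
    url = "{}/api/webhook/{}".format(HASS_URL, webhook_id)
    cmd = "curl -fsS -d action={a} {u} && {led} success || {led} failure".format(
        a=action, u=url, led=LED_CLIENT)
    return {
        "name": "{}-lights-{}".format(room, action),
        "signals": [{"order": "turn {} the {} lights".format(action, room)}],
        "neurons": [{"shell": {"cmd": cmd}}],
    }


def build_brain(rooms=ROOMS, actions=("on", "off")):
    """All synapses, sorted by room so the output is stable between runs."""
    return [synapse(room, webhook_id, action)
            for room, webhook_id in sorted(rooms.items())
            for action in actions]


if __name__ == "__main__":
    print(json.dumps(build_brain(), indent=2))
```

The output is JSON, which YAML parsers generally accept, so it can be saved as `brain.yml` directly or re-dumped with PyYAML's `yaml.safe_dump` for conventional YAML style.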
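To catch single-word commands before they cause trouble, a tiny lint can be run over the brain once it's loaded into Python (for example with PyYAML's `yaml.safe_load`). This is a hypothetical helper, not part of Kalliope, and it only checks orders written as plain strings.

```python
def short_orders(brain, min_words=2):
    """Return (synapse name, order) pairs whose order has fewer than min_words words."""
    problems = []
    for synapse in brain:
        for signal in synapse.get("signals", []):
            order = signal.get("order")
            # Only plain-string orders are checked; other forms are skipped.
            if isinstance(order, str) and len(order.split()) < min_words:
                problems.append((synapse.get("name"), order))
    return problems
```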
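The unit file itself isn't reproduced here. As a sketch of the "pull on start" idea, this hypothetical Python helper could be called from an `ExecStartPre=` line: it fast-forwards the configuration repository but never blocks startup if the pull fails (offline, not a repository, or `git` missing), so the device falls back to the configuration it already has. The script name and repository path in the comment are placeholders.

```python
import subprocess
import sys


def pull_command(repo_dir):
    """git invocation that only fast-forwards, so local edits are never merged."""
    return ["git", "-C", repo_dir, "pull", "--ff-only"]


def update_repo(repo_dir, timeout=60):
    """Try to update repo_dir; return True on success, False on any failure.

    Failures are reported on stderr but are not fatal, so the service still
    starts with its current configuration.
    """
    try:
        result = subprocess.run(pull_command(repo_dir), timeout=timeout,
                                stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                                universal_newlines=True)
    except (OSError, subprocess.TimeoutExpired) as error:
        print("config update skipped: {}".format(error), file=sys.stderr)
        return False
    if result.returncode != 0:
        print("config update failed:\n{}".format(result.stdout), file=sys.stderr)
        return False
    return True


if __name__ == "__main__":
    # Hypothetical unit line:
    # ExecStartPre=/usr/bin/python3 /opt/update_config.py /home/pi/kalliope-config
    update_repo(sys.argv[1])
    sys.exit(0)  # always succeed so the service starts regardless
```

On output buffering: Python buffers stdout when it isn't attached to a terminal, so log lines from a service can show up late in the journal. Running the interpreter with `-u`, or setting `Environment=PYTHONUNBUFFERED=1` in the unit, avoids this.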
With all of this assembled, you can test out the system and make sure it's doing what you want. Here's a sample video of my unit in action. I will probably be building a few more (and getting a few more WeMo switches and dimmers) soon!
Thank you for checking out this project, and feel free to send me any feedback! Hopefully this helps someone else build up their voice-controlled home automation system!