Why the Voice PE beat Alexa in my house

The Echo in the kitchen took it upon itself, one Tuesday in February, to ask if I’d like to know more about Amazon Music Unlimited. I had asked it to set a five-minute timer. I told it no. It thanked me for my feedback. The tea was overdone by the time we’d finished negotiating.

That was around the point I stopped pretending the Alexas were fine.

They used to be fine. For years they sat in corners, did what they were told, occasionally heard the microwave and turned the kettle on, but mostly just worked. Then somewhere along the way Amazon decided the assistant in your kitchen also wanted to upsell you, surface news briefings you didn’t ask for, and quietly accumulate skills you hadn’t installed. Wake word sensitivity started drifting. The Echo in the living room would fire up halfway through a TV scene and try to interpret an actor’s line as a command. Half the time it would just sit there with the ring glowing, having understood nothing, waiting to be told off.

I’d been running Home Assistant for ages by then. The Voice PE units had been on my radar since they shipped – the small green pucks with the LED ring on top, Home Assistant’s own voice satellite hardware. I’d looked at them a few times, gone ‘yeah, eventually’, and carried on letting Amazon listen to me cook.

Eventually came in March.

I ordered one to start with. Plugged it into the kitchen socket, added it through the ESPHome integration, pointed it at the Assist pipeline already running on DIRECTIVE. It found the wake word. It heard me. It turned the kitchen light on. The whole setup took less time than it took to read the unboxing leaflet.

The first thing I noticed wasn’t a feature. It was the absence of one. The Voice PE didn’t have anything to sell me. It wasn’t going to ask if I wanted news headlines or surface the deal on whatever Amazon had decided I needed this week. I said ‘set a timer for five minutes’, it said ‘five minute timer started’, and it stopped talking. That was the entire interaction.

It is genuinely strange how unfamiliar that felt.

The other thing that landed early was speed. Local-first sounds like a compromise on paper – the assumption is always that the cloud is faster because the cloud has more compute. In practice the round-trip to AWS adds enough latency that a local pipeline running on hardware in the same house beats it comfortably. The Voice PE doesn’t have to phone home, doesn’t have to wait for a response, doesn’t have to be slowed down by congestion in someone else’s data centre. It just answers.

Custom intents showed up naturally once I’d stopped thinking in Alexa-shaped phrases. ‘Set the office to dim’ does exactly that, with no detour through a Whole Home Music pitch. ‘Are we still recording’ checks the right Plex automation. Follow-up mode (assist_satellite.start_conversation, for the actual call) lets me ask one thing then another without re-saying the wake word every twenty seconds. It’s the kind of small affordance you don’t realise you’ve been wanting until you have it, and then you can’t believe you put up with the alternative.
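For anyone who wants to poke at that call from outside Home Assistant, here is a minimal sketch of what it might look like over the REST API. The action name is the one mentioned above; the base URL, the token and the entity ID (assist_satellite.kitchen_voice_pe) are placeholders I've made up for illustration, not values from my setup. As far as I understand it, the action has the satellite speak the start message and then listen for a reply. The function only builds the request, so nothing is sent until you choose to send it.

```python
import json
import urllib.request


def build_start_conversation_request(base_url, token, entity_id, start_message):
    """Build (but do not send) a POST for assist_satellite.start_conversation.

    All four arguments are caller-supplied; the values used in the example
    below are hypothetical placeholders.
    """
    # Home Assistant's REST API exposes actions at /api/services/<domain>/<service>.
    url = f"{base_url.rstrip('/')}/api/services/assist_satellite/start_conversation"
    payload = {
        "entity_id": entity_id,          # which satellite should speak and listen
        "start_message": start_message,  # what it says before listening
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_start_conversation_request(
        "http://homeassistant.local:8123",    # assumed default address
        "YOUR_LONG_LIVED_TOKEN",              # placeholder
        "assist_satellite.kitchen_voice_pe",  # hypothetical entity ID
        "Anything else?",
    )
    # Printing only; pass the request to urllib.request.urlopen() to send it.
    print(request.get_method(), request.full_url)
```

Inside Home Assistant itself the same action would normally be called from an automation or script rather than over HTTP; the sketch is just a convenient way to try it from a terminal.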

The LCARS dashboard for the voice flow is the part that probably shouldn’t have been as enjoyable as it was. I’d themed the rest of Home Assistant in LCARS months earlier – a different post, another time – and adding a card that visualises what the satellite is doing in real time turned out to be one of those small pleasures that earns its keep daily. Watching the ring glow on the device while the dashboard mirrors the same state on the wall has a stupid amount of charm. The Trek brain rot continues to pay dividends.

The privacy angle, briefly, because it’s worth saying once. It’s nicer when the device on the kitchen worktop isn’t relaying every utterance to a third party for transcription review. That isn’t paranoia. That’s just preferring the more pleasant option when there is one.

It’s not flawless. The wake word occasionally misfires on similar-sounding words. The LED ring uses a colour scheme I am still getting used to. The odd intent doesn’t quite parse the way I expected, and I have to rephrase. None of that is bad. It’s just the texture of running your own thing, where the bugs are yours and the fixes are yours.

The Voice PE units are part of a bigger voice setup here. The M5Stack ATOM EchoS3R units (COMM-01 and COMM-02, on pyramid bases because flat speakers look wrong in this house) cover the smaller rooms. There’s an ancient OnePlus 2 running as a Wyoming satellite in a corner because I had it lying around. The StackChan is on preorder and will eventually do its own thing. But the Voice PE units are the dependable bit. The workhorses. The ones I forget about until someone says something to a kitchen wall and a light comes on.

Three more on order at the time of writing. The calculus, if I’d run it earlier, would have been embarrassingly simple. They cost less than I’d assumed, do exactly what I tell them, and stop talking when they’re finished.

That last one used to be a baseline expectation of consumer hardware. Now it counts as a feature.