Description:
Windows PC application that records your microphone input and sends it to Google’s Speech API service for transcription. The parsed results are then sent to HomeSeer via the JSON API and processed as if you were speaking to an Echo or Dot, but without the need to say “Tell HomeSeer to…”. HomeSeer's response will be spoken through the chosen Playback Device.
Why:
I have some HTPCs throughout the house using HDMI for audio and wanted to make use of the extra speaker output and microphone input for voice control. I was unhappy with speaker.exe for three reasons:
1. The ‘always listening for trigger word’ portion worked poorly in noisy environments (watching TV, listening to music), triggering when it shouldn’t
2. Speaker.exe uses Microsoft’s speech recognition API which works poorly in general, even worse in noisy environments, and requires a grammar file which limits what you can say and send to HS
3. It doesn’t utilize the HS3 JSON API to send the transcribed text to HS, which means you can’t use Jon00’s excellent Alexa/Echo helper plugin (https://board.homeseer.com/showthread.php?t=184504) or send custom responses back.
I still use speaker.exe on these HTPCs so I can push TTS to them, for example, for weather warnings.
Features:
- Microphone input selection
- Recording quality selection
- TTS output device selection
- Volume control device selection
------ Automatically lower volume on host PC when listening starts, fade up when complete
- Automatically trigger HomeSeer event upon listen start and upon completion
------ Useful for location aware commands, and pausing media or lowering volume at receiver during listening
- Command line arguments for external control: start, stop, cancel
- Loops over a user defined number of returned results until a successful command is found
------ Google often returns a group of possible transcriptions, each with a ‘confidence’ level. This app will loop through each one from highest confidence to least confidence until HS returns a positive response.
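To illustrate the retry loop described in the last feature, here is a rough sketch in Python (the app itself is C#, so this is not its actual code). The field names `transcript` and `confidence`, and the `send_to_homeseer` callback that returns True on a positive response, are assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional


def first_accepted_phrase(
    alternatives: List[Dict],
    send_to_homeseer: Callable[[str], bool],
    max_phrases_to_try: int = 10,
) -> Optional[str]:
    """Try transcriptions from highest to lowest confidence.

    Returns the first phrase the callback accepts, or None if none is accepted.
    """
    # Sort so the most confident transcription is tried first.
    ranked = sorted(
        alternatives, key=lambda alt: alt.get("confidence", 0.0), reverse=True
    )
    # Only try as many results as maxPhrasesToTry allows.
    for alt in ranked[:max_phrases_to_try]:
        phrase = alt["transcript"]
        if send_to_homeseer(phrase):
            return phrase
    return None


# Stand-in for HomeSeer (hypothetical): it only understands one phrase.
known_phrases = {"turn on the kitchen light"}
alternatives = [
    {"transcript": "turn on the kitchen light", "confidence": 0.81},
    {"transcript": "turn on the kitten light", "confidence": 0.92},
]
result = first_accepted_phrase(alternatives, lambda p: p in known_phrases)
```

In this example the higher-confidence "kitten" result is rejected by the stand-in, so the loop falls through to the next result, which is the point of trying more than one transcription.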
Requirements:
- Google Speech API key – please Google for how to obtain this
Installation:
Obtain Google Speech API key. Unzip files to a folder on your hard drive. Edit SpeachRecognition.exe.config. Run the exe file.
Configuration:
I’m sure this isn’t the best way to handle configuration parameters in an app like this but it was the quickest thing I could find. Sorry I don't have an interface for these. There is a file called “SpeachRecognition.exe.config” that stores some settings. You should edit it with a text editor. Here they are described with default values:
ITEMS IN [BRACKETS] MUST BE CHANGED PRIOR TO FIRST USE
hsURL = http://[IPorDomain]:[Port]
- Path to a HomeSeer 3 machine with JSON enabled.
googleAPI = [your api key]
- Your Google Speech API key
startResponse = Yes?
- TTS response when listening begins
maxListenSecs = 3
- Maximum number of seconds app will listen before sending to Google
hsUser = [HomeSeer username]
- A HomeSeer 3 user that can run JSON commands
hsPassword = [HomeSeer password]
- Password for the user above
maxPhrasesToTry = 10
- Number of returned Google results to try against the HomeSeer 3 server (I’ve never seen 10)
logFileLocation = (blank)
- Full path and filename to log file. Must be writable by user running app. If not blank, a few items will be logged. Currently: raw JSON results from Google, everything in the app UI’s textbox window.
maxTextBoxLines = 100
- Maximum number of lines to keep in the UI’s textbox. If the app runs for days/months, a lot of text could build up in there. This trims that.
startHsEventId = 0
- HomeSeer 3 event to run when listening starts. Tip: Use a browser’s developer tools to get event IDs.
stopHsEventId = 0
- HomeSeer 3 event to run when processing stops. Tip: Use a browser’s developer tools to get event IDs.
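For anyone curious how the event settings might map onto a HomeSeer JSON request, here is a small Python sketch that only builds the URL. The `request=runevent`, `id`, `user` and `pass` query parameters reflect my understanding of the HS3 JSON API, and treating an ID of 0 as "no event" is an assumption based on the defaults above, so please verify both against the HomeSeer documentation.

```python
from typing import Optional
from urllib.parse import urlencode


def run_event_url(hs_url: str, event_id: int, user: str, password: str) -> Optional[str]:
    """Build a URL that asks HomeSeer to run an event by ID.

    Returns None when event_id is 0 (assumed here to mean "no event configured").
    """
    if event_id == 0:
        return None
    # Parameter names are assumptions about the HS3 JSON API.
    query = urlencode(
        {"request": "runevent", "id": event_id, "user": user, "pass": password}
    )
    return f"{hs_url.rstrip('/')}/JSON?{query}"


# Hypothetical host and credentials, for illustration only.
example = run_event_url("http://hs.local:8080/", 42, "mike", "secret")
```

Pasting a URL like this into a browser should be a quick way to check that the event ID you found is the right one.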
The following are set via the UI and stored somewhere else, so no need to edit them here:
cmbDevicesSetting, cmbFreqSetting, cmbPlaybackDevicesSetting, cmbVolumeDevicesSetting
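Since the bracketed placeholders must be replaced before first use, a quick check like the following Python sketch can flag any you missed. It works on a plain dictionary of setting names and values; how you load them from the config file is up to you and is not shown here.

```python
import re
from typing import Dict, List

# Matches any leftover [placeholder] text in a setting value.
PLACEHOLDER = re.compile(r"\[[^\]]+\]")


def unedited_settings(settings: Dict[str, str]) -> List[str]:
    """Return the names of settings that still contain a [bracketed] placeholder."""
    return sorted(
        name for name, value in settings.items() if PLACEHOLDER.search(value)
    )


# The defaults as listed in this post.
defaults = {
    "hsURL": "http://[IPorDomain]:[Port]",
    "googleAPI": "[your api key]",
    "startResponse": "Yes?",
    "maxListenSecs": "3",
    "hsUser": "[HomeSeer username]",
    "hsPassword": "[HomeSeer password]",
    "maxPhrasesToTry": "10",
}
missing = unedited_settings(defaults)
```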
Command Line Parameters:
start - starts recording
stop - stops recording and sends to Google
cancel - stops recording and does not send to Google
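As an example of driving these parameters from another program, here is a minimal Python sketch. The executable name `SpeachRecognition.exe` is an assumption inferred from the config file name, and the path will depend on where you unzipped the files.

```python
import subprocess
from typing import List

ACTIONS = ("start", "stop", "cancel")


def build_command(action: str, exe_path: str = "SpeachRecognition.exe") -> List[str]:
    """Return the argument list for one of the three supported actions."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {ACTIONS}")
    return [exe_path, action]


def send_action(action: str, exe_path: str = "SpeachRecognition.exe") -> int:
    """Run the executable with the given action and return its exit code."""
    return subprocess.run(build_command(action, exe_path), check=False).returncode
```

A trigger tool could call `send_action("start")` when it hears the wake word and `send_action("stop")` when you finish speaking, or `send_action("cancel")` to discard the recording.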
Limitations:
- The free/developer Google Speech API key allows 50 requests per day (I've only hit that limit during testing, so far)
- No built-in always-listening mode (I’m currently using VoxCommando to trigger my app to start listening)
- No built-in silence detection, so you must wait for ‘maxListenSecs’ to elapse or send a 'stop' via the command line to stop listening and send to Google
Disclaimer:
- This is my very first attempt at a C# .NET PC app. I’m a web developer by trade.
- I started with a demo app I found here (https://googlespeechtotext.codeplex.com/) and built upon that
- I’d say this is Alpha level still, although I’ve been running it for over a month. I’ve made some small tweaks recently but all seems OK. Please report any issues.
- I’d love to find someone here who has more app dev experience and time than I do to help me make this better.
- I’m not the one who originally misspelled Speach within this app, but I wasn’t concerned enough to spend time fixing it
Recommendations:
- I’m using two Kinect microphones that I purchased from one of those used video game places, on two different HTPCs. They work pretty well in a large room environment. If you shop for one, be sure to get one that has an external power supply with it or else it won’t work with your PC.
- VoxCommando has been working pretty well for the always listening portion. I need to experiment more. I use it to trigger this app via command line arguments: start, stop, cancel. The trial allows 40 executed commands before requiring app restart.
- I highly recommend the IVONA Amy (British) voice for TTS over any of Microsoft’s built-in voices. To me, it sounds better than Alexa, Siri, and Google.
- Higher recording quality takes longer to send to Google but increases the chances it will be transcribed correctly.
- If you're also running this on an HTPC connected to a TV or Receiver via HDMI, I recommend buying a cheap pair of PC speakers to connect to your HTPC's analog audio outputs so that you can hear HomeSeer's response even if the TV/Receiver is off, or if you're bit-streaming to your TV/Receiver.
For another handy way to voice control HS3, check out “Nearly instant HS3 voice control via Android's ‘Automate’ app!” (https://forums.homeseer.com/showthread.php?t=179768)
I hope folks find this useful, as-is. I may make upgrades as I need them and I will entertain feature requests. However, I don’t have much time to work on stuff like this. If you have some app dev experience and would like to help me make this better, please let me know.
Mike