New App! - Alexa-like control via PC, location aware, adjusts volume, and more





    Description:

    Windows PC application that records your microphone input and sends it to Google’s Speech API service for transcription. The parsed results are then sent to HomeSeer via the JSON API and processed as if you were speaking to an Echo or Dot, but without the need to say “Tell HomeSeer to…”. HomeSeer's response will be spoken through the chosen Playback Device.


    Why:

    I have some HTPCs throughout the house using HDMI for audio and wanted to make use of the extra speaker output and microphone input for voice control. I was unhappy with speaker.exe for three reasons:

    1. The ‘always listening for trigger word’ portion worked poorly in noisy environments (watching TV, listening to music), triggering when it shouldn’t.

    2. Speaker.exe uses Microsoft’s speech recognition API, which works poorly in general and even worse in noisy environments, and it requires a grammar file, which limits what you can say and send to HS.

    3. It doesn’t utilize the HS3 JSON API to send the transcribed text to HS, which means you can’t use Jon00’s excellent Alexa/Echo helper plugin (https://board.homeseer.com/showthread.php?t=184504) or send custom responses back.

    I still use speaker.exe on these HTPCs so I can push TTS to them, for example, for weather warnings.


    Features:
    - Microphone input selection
    - Recording quality selection
    - TTS output device selection
    - Volume control device selection
    ------ Automatically lower volume on host PC when listening starts, fade up when complete
    - Automatically trigger HomeSeer event upon listen start and upon completion
    ------ Useful for location aware commands, and pausing media or lowering volume at receiver during listening
    - Command line arguments for external control: start, stop, cancel
    - Loops over a user defined number of returned results until a successful command is found
    ------ Google often returns a group of possible transcriptions, each with a ‘confidence’ level. This app will loop through each one from highest confidence to least confidence until HS returns a positive response.
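
As a rough illustration of the "loop over returned results" feature, here is a minimal Python sketch. The app itself is C#, so this is not its actual code. The shape of each candidate (a dict with `transcript` and `confidence` keys), the function name `first_accepted`, and the `send_to_homeseer` callback are all assumptions made for the example; the callback stands in for whatever sends the text to HomeSeer and reports whether the response was positive.

```python
from typing import Callable, Iterable, Optional


def first_accepted(
    alternatives: Iterable[dict],
    send_to_homeseer: Callable[[str], bool],
    max_phrases_to_try: int = 10,
) -> Optional[str]:
    """Return the first transcript HomeSeer accepts, or None if none are accepted."""
    # Highest confidence first. Candidates without a confidence value are
    # treated as 0.0 (an assumption) and keep their original relative order.
    ranked = sorted(
        alternatives,
        key=lambda alt: alt.get("confidence", 0.0),
        reverse=True,
    )
    # Try at most max_phrases_to_try candidates, mirroring the maxPhrasesToTry setting.
    for alt in ranked[:max_phrases_to_try]:
        transcript = alt.get("transcript", "").strip()
        if transcript and send_to_homeseer(transcript):
            return transcript
    return None


if __name__ == "__main__":
    # Hypothetical candidates and a fake HomeSeer that only knows one phrase.
    candidates = [
        {"transcript": "turn on the kitchen lice", "confidence": 0.61},
        {"transcript": "turn on the kitchen lights", "confidence": 0.58},
    ]
    known_phrases = {"turn on the kitchen lights"}
    print(first_accepted(candidates, lambda text: text in known_phrases))
```

The point of trying lower-confidence candidates is that the top transcription is sometimes a near miss that HomeSeer cannot match, while the second or third one is the phrase that was actually spoken.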


    Requirements:
    - Google Speech API key – please Google for how to obtain this


    Installation:
    Obtain Google Speech API key. Unzip files to a folder on your hard drive. Edit SpeachRecognition.exe.config. Run the exe file.


    Configuration:
    I’m sure this isn’t the best way to handle configuration parameters in an app like this but it was the quickest thing I could find. Sorry I don't have an interface for these. There is a file called “SpeachRecognition.exe.config” that stores some settings. You should edit it with a text editor. Here they are described with default values:

    ITEMS IN [BRACKETS] MUST BE CHANGED PRIOR TO FIRST USE

    hsURL = http://[IPorDomain]:[Port]
    - Path to a HomeSeer 3 machine with JSON enabled.

    googleAPI = [your api key]
    - Your Google Speech API key

    startResponse = Yes?
    - TTS response when listening begins

    maxListenSecs = 3
    - Maximum number of seconds app will listen before sending to Google

    hsUser = [HomeSeer username]
    - A HomeSeer 3 user that can run JSON commands

    hsPassword = [HomeSeer password]
    - Password for the user above

    maxPhrasesToTry = 10
    - Number of returned Google results to try against the HomeSeer 3 server (I’ve never seen 10)

    logFileLocation = (blank)
    - Full path and filename to log file. Must be writable by user running app. If not blank, a few items will be logged. Currently: raw JSON results from Google, everything in the app UI’s textbox window.

    maxTextBoxLines = 100
    - Maximum number of lines to keep in the UI’s textbox. If the app runs for days/months, a lot of text could build up in there. This trims that.

    startHsEventId = 0
    - HomeSeer 3 event to run when listening starts. Tip: Use a browser’s developer tools to get event IDs.

    stopHsEventId = 0
    - HomeSeer 3 event to run when processing stops. Tip: Use a browser’s developer tools to get event IDs.

    The following are set via the UI and stored somewhere else, so no need to edit them here:
    cmbDevicesSetting, cmbFreqSetting, cmbPlaybackDevicesSetting, cmbVolumeDevicesSetting
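
Since the bracketed items must be changed before first use, a quick sanity check can catch a forgotten placeholder. The following is a minimal Python sketch, not part of the app. It assumes the settings have already been read into a plain dict by some other means; the function name `unreplaced_placeholders` is hypothetical.

```python
import re

# Matches an unreplaced placeholder such as [IPorDomain] or [your api key].
PLACEHOLDER = re.compile(r"\[[^\]]+\]")


def unreplaced_placeholders(settings: dict) -> list:
    """Return the names of settings whose value still contains a [bracketed] placeholder."""
    return sorted(
        name
        for name, value in settings.items()
        if PLACEHOLDER.search(str(value))
    )


if __name__ == "__main__":
    # Example values only; the API key here is made up.
    settings = {
        "hsURL": "http://[IPorDomain]:[Port]",
        "googleAPI": "example-key",
        "hsUser": "[HomeSeer username]",
        "maxListenSecs": 3,
    }
    print(unreplaced_placeholders(settings))
```

Note that this simple pattern would also flag a legitimate value containing square brackets, such as an IPv6 address in hsURL.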


    Command Line Parameters:
    start - starts recording
    stop - stops recording and sends to Google
    cancel - stops recording and does not send to Google
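
For anyone triggering the app from another program or script, here is a minimal Python sketch of passing one of the three parameters. The executable name `SpeachRecognition.exe` is an assumption inferred from the config file name, and the helper names are hypothetical. How the argument reaches an already running instance is not described in this post, so treat this only as an example of building the command line.

```python
import subprocess

# The three parameters listed above.
VALID_COMMANDS = ("start", "stop", "cancel")


def build_command(exe_path: str, command: str) -> list:
    """Build the argument list for one of the documented parameters."""
    if command not in VALID_COMMANDS:
        raise ValueError(f"unknown command {command!r}; expected one of {VALID_COMMANDS}")
    return [exe_path, command]


def send_command(exe_path: str, command: str) -> int:
    """Run the executable with the given parameter and return its exit code."""
    return subprocess.run(build_command(exe_path, command), check=False).returncode
```

For example, `send_command(r"C:\Apps\SpeachRecognition.exe", "start")` would launch the executable with `start`, where the folder is whatever location the zip was extracted to.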


    Limitations:
    - The free/developer Google Speech API key allows 50 requests per day (I've only hit that limit during testing, so far)
    - No built in always listening mode (I’m currently using VoxCommando to trigger my app to start listening)
    - No built in silence detection so you must wait for ‘maxListenSecs’ or send a 'stop' via command line to stop listening and send to Google
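
Because there is no silence detection, a recording ends in one of only three ways: the timeout, a 'stop', or a 'cancel'. The decision can be sketched in Python as below. This is an illustration of the behavior described in this post, not the app's code; the function name, the return values, and the exact comparison against maxListenSecs are assumptions.

```python
from typing import Optional


def listen_outcome(
    elapsed_secs: float,
    max_listen_secs: float,
    command: Optional[str] = None,
) -> str:
    """Decide what happens to the current recording.

    Returns "discard", "send", or "keep_listening".
    """
    # 'cancel' stops recording and nothing is sent to Google.
    if command == "cancel":
        return "discard"
    # 'stop', or reaching maxListenSecs, stops recording and sends the audio to Google.
    if command == "stop" or elapsed_secs >= max_listen_secs:
        return "send"
    return "keep_listening"
```

With the default maxListenSecs of 3, a short command finishes sooner if something sends 'stop' as soon as the speaker is done, rather than waiting out the full three seconds.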


    Disclaimer:
    - This is my very first attempt at a C# .NET PC app. I’m a web developer by trade.
    - I started with a demo app I found here (https://googlespeechtotext.codeplex.com/) and built upon that
    - I’d say this is Alpha level still, although I’ve been running it for over a month. I’ve made some small tweaks recently but all seems OK. Please report any issues.
    - I’d love to find someone here who has more app dev experience and time than I do to help me make this better.
    - I’m not the one who originally misspelled Speach within this app, but I wasn’t concerned enough to spend time fixing it


    Recommendations:
    - I’m using two Kinect microphones that I purchased from one of those used video game places, on two different HTPCs. They work pretty well in a large room environment. If you shop for one, be sure to get one that has an external power supply with it or else it won’t work with your PC.
    - VoxCommando has been working pretty well for the always listening portion. I need to experiment more. I use it to trigger this app via command line arguments: start, stop, cancel. The trial allows 40 executed commands before requiring app restart.
    - I highly recommend the IVONA Amy (British) voice for TTS over any of Microsoft’s built in voices. Sounds better than Alexa, Siri, and Google, to me.
    - Higher recording quality takes longer to send to Google but increases the chances it will be transcribed correctly.
    - If you're also running this on an HTPC connected to a TV or Receiver via HDMI, I recommend buying a cheap pair of PC speakers to connect to your HTPC's analog audio outputs so that you can hear HomeSeer's response even if the TV/Receiver is off, or if you're bit-streaming to your TV/Receiver.


    For another handy way to voice control HS3, check out “Nearly instant HS3 voice control via Android's ‘Automate’ app!” (https://forums.homeseer.com/showthread.php?t=179768)


    I hope folks find this useful, as-is. I may make upgrades as I need them and I will entertain feature requests. However, I don’t have much time to work on stuff like this. If you have some app dev experience and would like to help me make this better, please let me know.

    Mike
    Last edited by mrceolla; June 3, 2017, 01:13 AM.
    HS4, Insteon, Z-wave, USB-UIRT, Harmony Hubs, Google Hub/Chromecasts/Speakers, Foscam & Amcrest cameras, EZVIZ DB1 doorbell
    Plugins: BLLAN, BLOccupied, BLUSBUIRT, Chromecast, Harmony Hub, Insteon, Jon00 Homeseer/Echo Skill Helper, Harmony Hub, Jon00 DB Charting, MediaController, NetCAM, PHLocation2, Pushover 3P, weatherXML, Z-wave

    #2
    Copy and paste from Word didn't work well for my initial post, so I've cleaned that up a bit and added a screenshot.

    The forum shows 3 people have downloaded the zip so far, but no comments yet. I recall that obtaining the Google Speech API key was not straightforward, but doable, so I hope that's not hanging you up. If so, let me know and I'll see what I can re-find.

    I'm anxious for feedback :-)


      #3
      Great idea, but IMHO a Pi version would be better. Much less expensive than doing it with a Windows box. Now if it's running Windows 10 on the Pi, that works.

      All projects like this would be better run on Linux than Windows. Just don't need the extra cost and power usage to run Windows.
      HomeSeer Version: HS3 Standard Edition 3.0.0.548
      Linux version: Linux auto 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
      Number of Devices: 484 | Number of Events: 776

      Enabled Plug-Ins: 3.0.0.13: AirplaySpeak | 2.0.61.0: BLBackup
      3.0.0.70: EasyTrigger | 1.3.7006.42100: LiftMaster MyQ
      4.2.3.0: mcsMQTT | 3.0.0.53: PHLocation2 | 0.0.0.47: Pushover 3P
      3.0.0.16: RaspberryIO | 3.0.1.262: Z-Wave

      Z-Net version: 1.0.23 for Inclusion Nodes
      SmartStick+: 6.04 (ZDK 6.81.3) on Server



        #4
        Yes, I agree. Embedded solutions with SoC boards are definitely the way forward. There's no point having energy-saving devices and apps running on a 24/7 Windows system, but this is very impressive for a first outing.

        Maybe you could look at developing your project with Linux in mind.



          #5
          Originally posted by Timon:
          Great idea, but IMHO a Pi version would be better. Much less expensive than doing it with a Windows box. Now if it's running Windows 10 on the Pi, that works.

          All projects like this would be better run on Linux than Windows. Just don't need the extra cost and power usage to run Windows.

          Thank you for your feedback!

          I'd think that would limit the available audience quite a bit. There are low cost, low power Windows boxes out there. I'm using one. I know there are a lot of even cheaper Linux and Android boxes out there good at streaming media, but I wanted more flexibility and app availability out of a PC connected to my TV. This app also works best with 2 different audio outputs, one for media and one for TTS. Some of those media streaming devices only have one output.

          I use MediaPortal as my HTPC application, which is a Windows app, so my HTPCs are of course Windows machines. One is a dual Xeon box which is my home server and is the MediaPortal Server plus a Client, runs HS3, IP Camera DVR, and a virtual machine. The other is a low-power Minix NGC-1 running Win10 as a MediaPortal client. This app was meant to be a supplement or replacement for speaker.exe on these two machines and not a stand-alone voice control device, so that's why I chose Windows. I wish I knew how to write an app that worked in both.

          Cheers!
          Last edited by mrceolla; June 15, 2017, 09:25 PM.