Caching Speech routines using Amazon Polly


    I've been using Amazon Polly to create speech files manually, but it's been a pain, and I knew I needed something better.

    Thanks to some great initial work by DeLicious and his Polly.py program, I'm now releasing a fully caching version called PollyC. It's available in both Python2 and Python3 versions. Use whichever version of Python you normally use. For RaspberryPi users, Python2 is loaded by default, so it's easiest to go with PollyC.py.

    PollyC will cache all speech requests so it only has to go to Amazon's Polly servers when the speech isn't in the local cache.
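For anyone curious how this kind of caching can work, here is a minimal sketch. It is not PollyC's actual code: the function names, the SHA-256 file naming, and the `synthesize` callback are all assumptions made for illustration. The idea is to derive the cache file name from everything that affects the audio (voice, format, text), so the cloud call only happens on a cache miss. It needs Python 3 because of `exist_ok`.

```python
import hashlib
import os


def cache_path(cache_dir, text, voice="Joanna", fmt="mp3"):
    """Build a cache file name from everything that affects the audio."""
    key = "\n".join([voice, fmt, text]).encode("utf-8")
    return os.path.join(cache_dir, hashlib.sha256(key).hexdigest() + "." + fmt)


def speak_cached(text, cache_dir, synthesize, voice="Joanna", fmt="mp3"):
    """Return audio bytes, calling synthesize() only on a cache miss.

    synthesize is a hypothetical callback taking (text, voice, fmt)
    and returning the audio as bytes.
    """
    path = cache_path(cache_dir, text, voice, fmt)
    if os.path.isfile(path):
        # Cache hit: no request to the cloud is needed.
        with open(path, "rb") as fh:
            return fh.read()
    # Cache miss: ask the speech service, then store the result.
    audio = synthesize(text, voice, fmt)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(audio)
    return audio
```

In real use, `synthesize` would presumably be a small wrapper around the boto3 Polly client's `synthesize_speech` call. It is left as a parameter here so the sketch runs without AWS credentials.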

    It will also take advantage of the SSML markup language, so you can create much better TTS than you can with straight text.

    It will work with HS speaker clients and should work with Spud's AirplaySpeak, although that has not yet been tested.

    I'll be releasing this as Open Source, with the only requirement being to keep the credits for DeLicious and me in the files.

    Since it's written in Python, it should be portable to both Windows and Linux.

    If there are any feature requests or comments, please leave them below.

    Code:
    --------------------------------- Usage Information ---------------------------------
    PollyC
    This module is used to call the Amazon Polly system to convert an incoming string
    to an audio file.
    PollyC must be located in the HomeSeer directory
    Calling sequence
      ./scripts/PollyC.py3 -o "output_file" -t "the text to speak" -c "./pollycache/" -k "key_ID" -a "Key"
    
    arguments
      -o or --ofile           Output file name
      -t or --text            Text to speak
      -v or --voiceid         The Polly voice to use (default = Joanna) 
                                  see https://docs.aws.amazon.com/polly/latest/dg/voicelist.html
      -f or --format          Output format (default = mp3)
      -c or --cache           cache directory, full or relative path, or none
                                  If no cache is specified then caching is disabled
      -k or --keyid           Amazon AWS Access Key ID, mandatory
      -a or --accesskey       Amazon AWS Access Key, mandatory
      -r or --region          Amazon Region (defaults to us-west-1) 
                                  see https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html
    PollyC will auto switch to ssml if it detects the string "<speak>" in the text to be converted.
    
    For instructions on how to encode SSML speech see https://docs.aws.amazon.com/polly/latest/dg/supported-ssml.html
    
    
    Future additions to PollyC
    Select voice on call
    There is no provision in Amazon Polly to set the voice in the text string. However 
    PollyC has the ability to do this: If you specify at the beginning of the text 
    string, either plain text or ssml text, <voice-id="voice name"> then that voice will be 
    used and the tag deleted from the string. This tag MUST be at the beginning of the string.
    Example:  '<voice-id="Matthew">This is a test'
              '<voice-id="Matthew"><speak>This is a test</speak>'
    
    Don't cache this call.
    Currently PollyC will cache only if a caching directory is specified. This addition 
    will allow each call to be cached or not.
    Usage: '<no-cache/>This is a test'
           '<no-cache/><speak>This is a test</speak>'
    
    This program is free to use and distribute as long as the credits are not removed.
    ---------------------------------------------------------------------------------------
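The SSML auto-detection and the two planned control tags described in the usage text can be sketched as follows. This is only an illustration of the described behaviour, not PollyC's code: `parse_request` and its return keys are made-up names, and accepting both tags together in either order is my assumption, since the usage text only shows them one at a time.

```python
import re

# Both tags must appear at the very beginning of the string.
VOICE_TAG = re.compile(r'^<voice-id="([^"]+)">')
NO_CACHE_TAG = "<no-cache/>"


def parse_request(text, default_voice="Joanna"):
    """Strip leading control tags and decide between plain text and SSML."""
    voice = default_voice
    use_cache = True
    changed = True
    while changed:  # assumption: the two tags may be combined in either order
        changed = False
        match = VOICE_TAG.match(text)
        if match:
            voice = match.group(1)
            text = text[match.end():]
            changed = True
        if text.startswith(NO_CACHE_TAG):
            use_cache = False
            text = text[len(NO_CACHE_TAG):]
            changed = True
    # Switch to SSML when the string "<speak>" is present in the text.
    text_type = "ssml" if "<speak>" in text else "text"
    return {"voice": voice, "use_cache": use_cache,
            "text_type": text_type, "text": text}
```

For example, `parse_request('<voice-id="Matthew">This is a test')` selects the Matthew voice and leaves `This is a test` as plain text to be spoken.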
    Here is my speak.sh file. Since aplay can't play MP3 files, you need an MP3 player such as mpg123.
    Code:
    #!/bin/sh
    # For Python2 change ./PollyC.py3 to ./PollyC.py
    ./PollyC.py3 -o "temp.mp3" -t "$1" -c "./pollycache/" -k "your key_id" -a "your key"
    mpg123 -q temp.mp3
    Here is my speak_to_file.sh file.
    Code:
    #!/bin/sh
    # For Python2 change ./PollyC.py3 to ./PollyC.py
    ./PollyC.py3 -o "$1" -t "$2" -c "./pollycache/" -k "your key_id" -a "your key"
    A little note about why there are two speak routines. The module speak.sh is used only to speak through the local system's audio channel. The module speak_to_file.sh is used whenever any remote system, such as HS speaker clients or Spud's AirplaySpeak, is used. If both are used, both will be called, which would make two requests to Amazon Polly if caching were not in place.

    If the cache directory is specified but does not exist, it will be created.

    PollyC.py3 and PollyC.py should also go in the HomeSeer directory.


    Release Status
    0.9.0 Thu 21 Jun 06:45:51 PDT 2018
    Initial Release
    0.9.1 Sat 23 Jun 16:49:59 PDT 2018
    Updates include creation of cache directory if it does not exist.
    Caching will only be performed if a cache directory is specified.
    Additional error handling.
    -h or --help is now included for easier use.
    Last edited by Timon; June 24, 2018, 10:25 AM. Reason: Released PollyC
    HomeSeer Version: HS3 Standard Edition 3.0.0.548
    Linux version: Linux auto 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
    Number of Devices: 484 | Number of Events: 776

    Enabled Plug-Ins: 3.0.0.13: AirplaySpeak | 2.0.61.0: BLBackup
    3.0.0.70: EasyTrigger | 1.3.7006.42100: LiftMaster MyQ
    4.2.3.0: mcsMQTT | 3.0.0.53: PHLocation2 | 0.0.0.47: Pushover 3P
    3.0.0.16: RaspberryIO | 3.0.1.262: Z-Wave

    Z-Net version: 1.0.23 for Inclusion Nodes
    SmartStick+: 6.04 (ZDK 6.81.3) on Server

    #2
    Well, PollyC.py3 is working.

    Currently you have to manually create the ~/HomeSeer/pollycache directory and pass its path on the PollyC command line. I'll get that fixed in the second beta.

    PollyC.py3 requires Python3 and, of course, the boto3 module. I'll try to get both the Python2 version, PollyC.py, and PollyC.py3 out tomorrow.

    This has been fun, my voice responses sound so much better than those created by the flite TTS.

    One more thing: this has NOT been tested under Windows Python, and I don't have a Windows system to test it on. If someone wants to try it, that's great, and if it doesn't work I will work with you to try to get it working.

    Also, anytime you want to clear the cache all you have to do is clear out all the files in the pollycache directory and the cache will be recreated.
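If you would rather script the cleanup than delete the files by hand, something like this should do it. It is just a sketch: `clear_cache` is a name I made up, and it assumes the cache directory holds only cached audio files that are safe to delete.

```python
import os


def clear_cache(cache_dir):
    """Delete every regular file in cache_dir and return how many were removed."""
    removed = 0
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path):  # leave any subdirectories alone
            os.remove(path)
            removed += 1
    return removed
```

Calling `clear_cache("./pollycache/")` from the HomeSeer directory would empty the cache, and the files are then recreated as speech is requested again.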
    Last edited by Timon; June 24, 2018, 02:19 AM.


      #3
      PollyC.py3 version 0.9.0 has been released. See the first post for more information.

      The next release will allow you to specify the voice to use in the string. For now the default voice "Joanna" is used unless you change it on the command line.

      Remember, you must manually create the cache directory (default name pollycache) in the HomeSeer directory. PollyC.py3 should also go in the HomeSeer directory.

      This has been fun to write and fun to be able to finally get good speech from HS3. Enjoy!


        #4
        PollyC version 0.9.1 has been released. See the first post for more information.

        Updates include creation of cache directory if it does not exist.
        Caching will only be performed if a cache directory is specified.
        Additional error handling.
        -h or --help is now included for easier use.


          #5
          Samples of what Amazon's Polly sounds like

          I'm surprised that no one's tried this yet. For those of you that haven't heard the quality of speech that Amazon's Polly produces here are a couple of samples from my test event along with the strings sent to Polly to create them. These were done using the Joanna voice.

          The first sample uses a simple text string. The second sample shows what you can do with a string formatted using Speech Synthesis Markup Language (SSML). If you want to check out the different tags, go to the Amazon Polly SSML page.

          Hello World! This is a text string.

          <speak><prosody volume="+9dB"><emphasis level="moderate">Attention</emphasis></prosody> <break time='300ms'/> Hello World. This is a <say-as interpret-as="spell-out">ssml</say-as> string.</speak>
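Since SSML is XML, a quick local well-formedness check can catch a missing or mismatched tag before the string is sent to Polly. The sketch below is an assumption of mine, not something PollyC does. It only checks that the string parses as XML with a `<speak>` root. It does not validate the tags themselves, and a plain XML parser will reject Amazon's prefixed tags such as `amazon:effect`, so treat it as a check for the standard tags only.

```python
import xml.etree.ElementTree as ET


def is_well_formed_ssml(text):
    """Return True if text parses as XML and its root element is <speak>."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    return root.tag == "speak"


sample = """<speak><prosody volume="+9dB"><emphasis level="moderate">Attention</emphasis></prosody> <break time='300ms'/> Hello World. This is a <say-as interpret-as="spell-out">ssml</say-as> string.</speak>"""

print(is_well_formed_ssml(sample))
```

A string with an unclosed tag, for example `<speak><break time='300ms'> Hello</speak>`, fails the check.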

          The following was created using the Amazon Polly web page and the Matthew voice. It could just as well be done using PollyC via HomeSeer.

          <speak>
          This is my original voice, without any modifications. <amazon:effect vocal-tract-length="+15%"> Now, imagine that I am much bigger. </amazon:effect> <amazon:effect vocal-tract-length="-15%">
          Or, perhaps you prefer my voice when I'm very small? </amazon:effect> You can also control the
          timbre of my voice by making more minor adjustments. <amazon:effect vocal-tract-length="+10%"> For example, by making me sound just a little bigger. </amazon:effect> <amazon:effect vocal-tract-length="-10%"> Or instead, making me sound only somewhat smaller. </amazon:effect>
          </speak>


          I think you can hear not only how much better the speech is compared to what flite produces, but also how well you can control how it sounds.


            #6
            What you've done is amazing!

            The Mac lets you pipe Siri's voice to an audio file, so I simply pre-recorded (i.e. cached) what I needed and I play them when announcements are required. They're prefixed with a Star Trek TNG computer sound (alert tone for warnings, paging tone for notices).



              #7
              I'm impressed, but now, as a newbie, I need to integrate voice into my HS... Since I don't have a speaker on my server (using VM) I need to run a program on a computer that has a speaker...


              Unless I can broadcast this to my Sonos system...

              Anyway, I'm still on a big learning curve with so much to learn: integrating devices, then figuring out how to use events, and now adding some speech to my stuff.

              I hope to catch up fast and play with all that fantastic stuff.

              Good work!
              Joel

              Sent from my Nexus 6P using Tapatalk



                #8
                Originally posted by Tillsy
                What you've done is amazing!

                The Mac lets you pipe Siri's voice to an audio file, so I simply pre-recorded (i.e. cached) what I needed and I play them when announcements are required. They're prefixed with a Star Trek TNG computer sound (alert tone for warnings, paging tone for notices).
                I was doing somewhat the same, using Amazon's Polly to create static audio for prompts. It was just not worth the effort to keep downloading sounds to my HS system. That's why I had been looking at doing a caching Polly handler.

                Originally posted by 838Joel
                I'm impressed, but now, as a newbie, I need to integrate voice into my HS... Since I don't have a speaker on my server (using VM) I need to run a program on a computer that has a speaker...

                Unless I can broadcast this to my Sonos system...

                Anyway, I'm still on a big learning curve with so much to learn: integrating devices, then figuring out how to use events, and now adding some speech to my stuff.

                I hope to catch up fast and play with all that fantastic stuff.

                Good work!
                Joel
                You should be able to use Spud's AirplaySpeak, then use one of several Airplay packages that run on the RaspberryPi. I'm going to be using one which will run on the RaspberryPi Zero W along with a speaker pHAT that's the same size.

                Here are some packages you can look at. All of them will do Airplay with the right software loaded.
                Pimoroni SpeakerHat
                Pimoroni Pirate Radio
                Adafruit Speaker Bonnet

                There are other protocols that run on the Pi that have code for HS that should also work. Basically, any remote speaker that you can send mp3 files to will work with Caching Polly.

                BTW, I would like to again thank DeLicious for finding out how to access Amazon Polly. I've been thinking about making a caching Polly process for a while, and those few lines of code allowed me to speed up the project.


                  #9
                  Thanks for the input Timon, actually I have a raspberry Pi somewhere I can use...
                  I'll look into it.



                    #10
                    Originally posted by 838Joel
                    I'm impressed, but now, as a newbie, I need to integrate voice into my HS... Since I don't have a speaker on my server (using VM) I need to run a program on a computer that has a speaker...

                    Unless I can broadcast this to my Sonos system...

                    Anyway, I'm still on a big learning curve with so much to learn: integrating devices, then figuring out how to use events, and now adding some speech to my stuff.

                    I hope to catch up fast and play with all that fantastic stuff.

                    Good work!
                    Joel
                    The Sonos Plugin for HS3 supports TTS. I'm also running HS3 on a VM, and the Sonos Plugin works for basic TTS. I have yet to incorporate this.



                      #11
                      We use both AirPlay Speak and Sonos, with Dirk’s PI as our primary output. We also use Cepstral Allison for the voice. Your script, while exciting, seems a little complicated for us. Is it possible to use the cached TTS (we don’t want to rely on anything cloud-based) with our setup? And, if it is, is there a step-by-step guide for idiots?
                      Michael



                        #12
                        All of the actual TTS is done in the cloud. PollyC’s job is to send the text to the cloud, then cache what comes back, so the next time you want to use the exact same text it doesn’t have to go to the cloud and convert it again. If you want to start over, you just clear the cache directory.


                          #13
                          Hi Timon,

                          I tried with the Squeezebox plugin. It creates two files: a temp one, and then something converts this to .wav (I can hear the TTS if I play the wav file). Then the Squeezebox plugin tries to convert it to mp3, and then nothing. The speakers play a weird sound, and when I manually play the mp3 I hear the same weird sound.


                          Then I tried with the AirplaySpeak plugin, and I got this in the log:
                          Code:
                          Sep-05 11:53:47 PM  AirplaySpeak  ERROR Not a WAVE file - no RIFF header
                          Sep-05 11:53:47 PM  TTS           Speak: ():hello this is amazon polly
                          Sep-05 11:53:47 PM  AirplaySpeak  INFO (RaspiSalon,raspiBedroom,raspiHallway): hello this is amazon polly
                          Last edited by ; September 5, 2018, 11:26 PM.



                            #14
                            I checked the speaktofile script from spud and adapted it:

                            #!/bin/sh
                            ./PollyC.py -o "$1_t" -t "$2" -c "./pollycache/" -k "xxxxxxxxxxxxxxxxxx" -a "xxxxxxxxxxxxxxxxxxxxxxxx"
                            ffmpeg -i "$1_t" -y -ar 44100 "$1"
                            rm "$1_t"


                            This seems to do the trick... But I would like to be able to do TTS with the Squeezebox plugin.
                            pcp, any idea?
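The "no RIFF header" error in the log above suggests the plugin was handed an mp3 where it expected a WAV file, which is presumably why the ffmpeg conversion step helps. A quick way to tell the two apart is to look at the first 12 bytes of the file. This is a hypothetical helper for debugging, not part of PollyC or AirplaySpeak:

```python
def looks_like_wav(path):
    """Return True if the file starts with a RIFF header of type WAVE."""
    with open(path, "rb") as fh:
        header = fh.read(12)
    # A WAV file begins with b"RIFF", a 4-byte size field, then b"WAVE".
    return (len(header) == 12
            and header[0:4] == b"RIFF"
            and header[8:12] == b"WAVE")
```

Running it on the file passed as `$1` before and after the ffmpeg step would show whether the conversion produced what the plugin expects.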



                              #15
                              Thanks for working on this and sharing it!

                              I'm trying to get this to work with Homeseer Pro 3.0.0.435 running on Windows 10. I'm 99% there.
                              • Tweaked the PollyC.py3 script so it returns an mp3 file
                              • Verified that it is caching in the pollycache folder
                              • Found a DOS command line app called "cmdmp3" to play the .mp3 file
                                • May not need cmdmp3 because I can enter the path to the mp3 file for a speak action and speaker.exe plays it locally or remotely via Hs3Touch on my iPhone
                              • Created a batch file named speak.bat that calls PollyC.py3 and then plays the mp3 file via cmdmp3 (If I call it from inside a DOS window)
                              Code:
                              Python PollyC.py3 -o "temp.mp3" -t %1
                              cmdmp3 "temp.mp3"


                              How do I get HS to run speak.bat and pass it the string when I call, hs.speak "Speak this text, please" instead of calling speaker.exe directly? I used to use speak.sh on my Homeseer RPi but can't find any way to get HS to use a speak script on Windows. Ideally, I'd rather the mp3 play via speaker.exe, but if I can at least get the speak.bat file to run and play via cmdmp3, I'll be happy.

                              Thanks in advance!
                              Last edited by RandyInLA; September 8, 2018, 06:10 PM.
