
Caching Speech routines using Amazon Polly

  • Guest
    Guest replied
    Hi timon,

    I tried with the Squeezebox plugin. It creates two files: a temp one, and something converts it to .wav (I can hear the TTS if I play the wav file). Then the Squeezebox plugin tries to convert it to mp3, and then nothing. The speakers play a weird sound, and if I manually play the mp3 I hear the same weird sound.

    Then i tried with airplayspeak plugin. And i got this in the log :
    Sep-05 11:53:47 PM   AirplaySpeak   ERROR Not a WAVE file - no RIFF header
    Sep-05 11:53:47 PM   TTS            Speak: ():hello this is amazon polly
    Sep-05 11:53:47 PM   AirplaySpeak   INFO (RaspiSalon,raspiBedroom,raspiHallway): hello this is amazon polly
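
The "no RIFF header" error in the log above means the file handed to AirplaySpeak did not begin with the bytes a WAVE file starts with. One possible cause, and this is only a guess from the log, is that an mp3 file was passed where a wav was expected. A minimal sketch for checking what a file really is, using only the Python standard library (the function name `sniff_audio_container` is made up for this example):

```python
def sniff_audio_container(path):
    """Return 'wav', 'mp3', or 'unknown' based on the first bytes of the file."""
    with open(path, "rb") as fh:
        head = fh.read(12)
    # A WAVE file starts with b"RIFF", a 4-byte size field, then b"WAVE".
    if len(head) >= 12 and head[0:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    # mp3 files usually start with an ID3v2 tag...
    if head[0:3] == b"ID3":
        return "mp3"
    # ...or directly with an MPEG frame sync (eleven set bits).
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage: python3 sniff.py temp.mp3 other.wav
    for name in sys.argv[1:]:
        print(name, sniff_audio_container(name))
```

If this reports `mp3` for the file that AirplaySpeak is rejecting, the output format or a conversion step is the place to look.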
    Last edited by ; September 5, 2018, 11:26 PM.

  • Timon
    All of the actual TTS is done in the cloud. PollyC’s job is to send the text to the cloud, then cache what comes back so the next time you want to use the exact same text it doesn’t have to go to the cloud and convert it again. If you want to start over you just clear the cache directory.
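
For anyone curious what "cache what comes back" can look like, here is a rough sketch of the idea. It is not PollyC's actual code: the function names, the SHA-256 file naming, and the `synthesize` callback are all assumptions made for illustration. The callback stands in for whatever really talks to the cloud.

```python
import hashlib
import os


def cache_file_for(cache_dir, text, voice="Joanna", fmt="mp3"):
    """Map a (voice, format, text) request to a stable file name in cache_dir."""
    key = "\n".join([voice, fmt, text]).encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return os.path.join(cache_dir, digest + "." + fmt)


def get_speech(text, synthesize, cache_dir=None, voice="Joanna", fmt="mp3"):
    """Return audio bytes, calling synthesize(text, voice, fmt) only on a cache miss."""
    if not cache_dir:
        # No cache directory given: always go to the cloud.
        return synthesize(text, voice, fmt)
    os.makedirs(cache_dir, exist_ok=True)
    path = cache_file_for(cache_dir, text, voice, fmt)
    if os.path.exists(path):
        # Cache hit: the exact same request was converted before.
        with open(path, "rb") as fh:
            return fh.read()
    audio = synthesize(text, voice, fmt)
    with open(path, "wb") as fh:
        fh.write(audio)
    return audio
```

Because the voice and format are part of the key, the same sentence spoken by two different voices ends up in two different cache files. Deleting the files in the directory simply forces the next request to go back to the cloud.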

  • Rvtravlr
    We use both AirPlay Speak and Sonos, with Dirk’s PI as our primary output. We also use Cepstral Allison for the voice. Your script, while exciting, seems a little complicated for us. Is it possible to use the cached TTS (we don’t want to rely on the cloud) with our setup? And, if it is, is there a step-by-step guide for idiots?

  • emiliosic
    Originally posted by 838Joel View Post
    I'm impressed, but now as a newbie, I need to integrate voice into my HS... Since I don't have a speaker on my server (using VM) I need to run a program on a computer that has a speaker...

    Unless I can broadcast this to my Sonos system...

    Anyway, I'm still on a big learning curve with so much to learn from integrating devices, then I need to figure out how to use events, and now adding some speech to my stuff..

    I hope to catch up fast and play with all that fantastic stuff.

    Good work!

    Sent from my Nexus 6P using Tapatalk
    The Sonos Plugin for HS3 supports TTS. I'm also running HS3 on a VM, and the Sonos Plugin works for basic TTS. I have yet to incorporate this.

  • Guest
    Guest replied
    Thanks for the input Timon, actually I have a raspberry Pi somewhere I can use...
    I'll look into it.

    Sent from my Nexus 6P using Tapatalk

  • Timon
    Originally posted by Tillsy View Post
    What you've done is amazing!

    The Mac lets you pipe Siri's voice to an audio file, so I simply pre-recorded (e.g. cached) what I needed and I play them when announcements are required. They're prefixed with a Star Trek TNG computer sound (alert tone for warnings, paging tone for notices).
    I was doing somewhat the same using Amazon's Polly to create static audio for prompts. It was just not worth the effort to keep downloading sounds to my HS system. That's why I had been looking at doing a caching Polly handler.

    Originally posted by 838Joel View Post
    I'm impressed, but now as a newbie, I need to integrate voice into my HS... Since I don't have a speaker on my server (using VM) I need to run a program on a computer that has a speaker...

    Unless I can broadcast this to my Sonos system...

    Anyway, I'm still on a big learning curve with so much to learn from integrating devices, then I need to figure out how to use events, and now adding some speech to my stuff..

    I hope to catch up fast and play with all that fantastic stuff.

    Good work!

    Sent from my Nexus 6P using Tapatalk
    You should be able to use Spuds AirplaySpeak, then use one of several Airplay packages that run on the RaspberryPi. I'm going to be using one which will run on the RaspberryPi Zero W along with a speaker pHAT that's the same size.

    Here are some packages you can look at. All of them will do Airplay with the right software loaded.
    Pimoroni SpeakerHat
    Pimoroni Pirate Radio
    Adafruit Speaker Bonnet

    There are other protocols that run on the Pi that have code for HS that should also run. Basically any remote speaker that you can send mp3 files to will work with running Caching Polly.

    BTW, I would like to again thank DeLicious for finding out how to access Amazon Polly. I've been thinking about making a Caching Polly process for a while, and those few lines of code allowed me to speed up the project.

  • Guest
    Guest replied
    I'm impressed, but now as a newbie, I need to integrate voice into my HS... Since I don't have a speaker on my server (using VM) I need to run a program on a computer that has a speaker...

    Unless I can broadcast this to my Sonos system...

    Anyway, I'm still on a big learning curve with so much to learn from integrating devices, then I need to figure out how to use events, and now adding some speech to my stuff..

    I hope to catch up fast and play with all that fantastic stuff.

    Good work!

    Sent from my Nexus 6P using Tapatalk

  • Tillsy
    What you've done is amazing!

    The Mac lets you pipe Siri's voice to an audio file, so I simply pre-recorded (e.g. cached) what I needed and I play them when announcements are required. They're prefixed with a Star Trek TNG computer sound (alert tone for warnings, paging tone for notices).

  • Timon
    Samples of what Amazon's Polly sounds like

    I'm surprised that no one's tried this yet. For those of you that haven't heard the quality of speech that Amazon's Polly produces here are a couple of samples from my test event along with the strings sent to Polly to create them. These were done using the Joanna voice.

    The first sample uses a simple text string. The second sample shows what you can do with a string formatted using Speech Synthesis Markup Language (SSML). If you want to check out the different tags go to the Amazon Polly SSML page.

    Hello World! This is a text string.

    <speak><prosody volume="+9dB"><emphasis level="moderate">Attention</emphasis></prosody> <break time='300ms'/> Hello World. This is a <say-as interpret-as="spell-out">ssml</say-as> string.</speak>

    The following was created using the Amazon Polly web page and the Matthew voice. It could just as well be done using PollyC via HomeSeer.

    This is my original voice, without any modifications. <amazon:effect vocal-tract-length="+15%"> Now, imagine that I am much bigger. </amazon:effect> <amazon:effect vocal-tract-length="-15%">
    Or, perhaps you prefer my voice when I'm very small? </amazon:effect> You can also control the
    timbre of my voice by making more minor adjustments. <amazon:effect vocal-tract-length="+10%"> For example, by making me sound just a little bigger. </amazon:effect> <amazon:effect vocal-tract-length="-10%"> Or instead, making me sound only somewhat smaller. </amazon:effect>

    I think you can see not only how much better the speech is compared to what flite produces, but also how well you can control how it sounds.
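
For reference, strings like the samples in this post can be sent to Polly from Python with the boto3 module. The sketch below is not PollyC itself. The helper names are made up for this example, and the `"<speak>"` test mirrors the auto-switch rule described in the PollyC usage notes. `synthesize_speech` is the boto3 Polly client call, and it needs real AWS credentials and network access to run.

```python
def build_polly_request(text, voice="Joanna", fmt="mp3"):
    """Build keyword arguments for the boto3 Polly synthesize_speech call."""
    # Treat the text as SSML only when it contains a <speak> tag.
    text_type = "ssml" if "<speak>" in text else "text"
    return {
        "Text": text,
        "TextType": text_type,
        "OutputFormat": fmt,
        "VoiceId": voice,
    }


def synthesize(text, key_id, secret_key, region="us-west-1", voice="Joanna", fmt="mp3"):
    """Send text to Amazon Polly and return the audio bytes."""
    # Imported here so build_polly_request works even without boto3 installed.
    import boto3

    client = boto3.client(
        "polly",
        region_name=region,
        aws_access_key_id=key_id,
        aws_secret_access_key=secret_key,
    )
    response = client.synthesize_speech(**build_polly_request(text, voice, fmt))
    return response["AudioStream"].read()
```

Writing the returned bytes to a file with the matching extension gives something an mp3 player such as mpg123 can play.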

  • Timon
    PollyC version 0.9.1 has been released. See the first post for more information.

    Updates include creation of cache directory if it does not exist.
    Caching will only be performed if a caching directory is specified.
    Additional error handling.
    -h or --help is now included for easier use.

  • Timon
    PollyC.py3 version 0.9.0 has been released. See the first post for more information.

    The next release will allow you to specify the voice to use in the string. For now the default voice "Joanna" is used unless you change it on the command line.

    Remember you must manually create the cache directory (default name pollycache) in the HomeSeer directory. PollyC.py3 should also go in the HomeSeer directory.

    This has been fun to write and fun to be able to finally get good speech from HS3. Enjoy!

  • Timon
    Well, PollyC.py3 is working.

    Currently you have to manually create the ~/HomeSeer/pollycache directory and pass its path on the PollyC command line. I'll get that fixed in the second beta.

    PollyC.py3 requires Python3 and, of course, the boto3 module. I'll try to get a Python2 version and PollyC.py3 both out tomorrow.

    This has been fun, my voice responses sound so much better than those created by the flite TTS.

    OMT, This has NOT been tested under Windows Python and I don't have a Windows system to test it on. If someone wants to try it that's great and if it doesn't work I will work with you to try and get it to work.

    Also, anytime you want to clear the cache all you have to do is clear out all the files in the pollycache directory and the cache will be recreated.
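
If you would rather clear the cache from a script than by hand, something along these lines should do it. This is just a sketch: `clear_cache` is a made-up helper, and it only removes plain files directly inside the directory you give it, leaving the directory itself in place.

```python
from pathlib import Path


def clear_cache(cache_dir):
    """Delete every regular file directly inside cache_dir; return how many were removed."""
    directory = Path(cache_dir)
    if not directory.is_dir():
        return 0
    removed = 0
    # Take a snapshot of the listing first, then delete.
    for entry in list(directory.iterdir()):
        if entry.is_file():
            entry.unlink()
            removed += 1
    return removed


if __name__ == "__main__":
    # "./pollycache" is the default cache directory name mentioned in this thread.
    print(clear_cache("./pollycache"), "cached files removed")
```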
    Last edited by Timon; June 24, 2018, 02:19 AM.

  • Timon
    started a topic Caching Speech routines using Amazon Polly

    Caching Speech routines using Amazon Polly

    I've been using Amazon Polly to create speech files manually, but it's been a pain, and I knew I needed something better.

    Thanks to some great initial work by DeLicious and his program I'm now releasing a fully caching version called PollyC. It's available for both Python2 and Python3 versions. Use whichever version of Python you normally use. For RaspberryPi users Python2 is loaded by default so it's easiest to go with

    PollyC will cache all speech requests so it only has to go to Amazon's Polly servers when the speech isn't in the local cache.

    It will also take advantage of the SSML markup language so you can create much better TTS than you can with straight text.

    It will work with HS speaker clients and should work with Spuds AirplaySpeak although it's not yet been tested.

    I'll be releasing this as Open Source, with the only requirement being to keep the credits for DeLicious and me in the files.

    Since it's written in Python it should be portable to both Windows and Linux.

    If there are any feature requests or comments please leave them below.

    --------------------------------- Usage Information ---------------------------------
    This module is used to call the Amazon Polly system to convert an incoming string
    to an audio file.
    PollyC must be located in the HomeSeer directory
    Calling sequence
      ./scripts/PollyC.py3 -o "output_file" -t "the text to speak" -c "./pollycache/" -k "key_ID" -a "Key"
      -o or --ofile           Output file name
      -t or --text            Text to speak
      -v or --voiceid         The Polly voice to use (default = Joanna) 
      -f or --format          Output format (default = mp3)
      -c or --cache           Cache directory: full path, relative path, or none
                                  If no cache is specified then caching is disabled
      -k or --keyid           Amazon AWS Access Key ID, mandatory
      -a or --accesskey       Amazon AWS Access Key, mandatory
      -r or --region          Amazon Region (defaults to us-west-1) 
    PollyC will auto switch to SSML if it detects the string "<speak>" in the text to be converted.
    For instructions on how to encode SSML speech see
    Future additions to PollyC
    Select voice on call
    There is no provision in Amazon Polly to set the voice in the text string. However 
    PollyC has the ability to do this: If you specify at the beginning of the text 
    string, either plain text or ssml text, <voice-id="voice name"> then that voice will be 
    used and the tag deleted from the string. This tag MUST be at the beginning of the string.
    Example:  '<voice-id="Matthew">This is a test'
              '<voice-id="Matthew"><speak>This is a test</speak>'
    Don't cache this call.
    Currently PollyC will cache only if a caching directory is specified. This addition 
    will allow each call to be cached or not.
    Usage: '<no-cache/>This is a test'
           '<no-cache/><speak>This is a test</speak>'
    This program is free to use and distribute as long as the credits are not removed.
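
To make the tag rules above concrete, here is one way the leading `<voice-id="...">` and `<no-cache/>` tags and the `<speak>` check could be handled. These tags are listed as future additions, so this is only a sketch of the described behavior, not PollyC's code. `parse_request` and the dictionary it returns are invented for the example, and accepting the two tags in either order is an assumption.

```python
import re

# Both tags must sit at the very beginning of the string.
_VOICE_TAG = re.compile(r'^<voice-id="([^"]+)">')
_NO_CACHE_TAG = re.compile(r'^<no-cache/>')


def parse_request(text, default_voice="Joanna"):
    """Strip leading PollyC-style tags and report how the text should be handled."""
    voice = default_voice
    use_cache = True
    while True:
        match = _VOICE_TAG.match(text)
        if match:
            voice = match.group(1)
            text = text[match.end():]
            continue
        match = _NO_CACHE_TAG.match(text)
        if match:
            use_cache = False
            text = text[match.end():]
            continue
        break
    # Switch to SSML when a <speak> tag is present in what remains.
    text_type = "ssml" if "<speak>" in text else "text"
    return {"text": text, "voice": voice, "use_cache": use_cache, "text_type": text_type}
```

For example, `parse_request('<voice-id="Matthew">This is a test')` selects the Matthew voice and leaves `This is a test` as plain text, while a tag that appears later in the string is left untouched.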
    Here is my file. Since aplay can't play mp3's you need an mp3 player such as mpg123
    # For Python2 change ./PollyC.py3 to ./
    ./PollyC.py3 -o "temp.mp3" -t "$1" -c "./pollycache/" -k "your key_id" -a "your key"
    mpg123 -q temp.mp3
    Here is my file.
    # For Python2 change ./PollyC.py3 to ./
    ./PollyC.py3 -o "$1" -t "$2" -c "./pollycache/" -k "your key_id" -a "your key"
    A little note about why there are two speak routines. The module is used only to speak through the local system's audio channel. The module is used whenever any remote system such as HS speaker clients or Spuds AirplaySpeak is used. If both are used, both will be called, which would make 2 requests to Amazon Polly if the caching was not in place.

    If the caching directory is specified but it does not exist, it will be created.
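
For anyone who wants to build something similar, the command-line options from the usage information and the "create the cache directory if missing" behavior can be sketched with Python 3's argparse. This is a reconstruction from the option list in this post, not the real PollyC source. `build_parser` and `prepare_cache_dir` are made-up names, and treating the literal word "none" as "no caching" is an assumption.

```python
import argparse
import os


def build_parser():
    """Command-line parser mirroring the options listed in the usage information."""
    parser = argparse.ArgumentParser(prog="PollyC")
    parser.add_argument("-o", "--ofile", help="Output file name")
    parser.add_argument("-t", "--text", help="Text to speak")
    parser.add_argument("-v", "--voiceid", default="Joanna", help="The Polly voice to use")
    parser.add_argument("-f", "--format", default="mp3", help="Output format")
    parser.add_argument("-c", "--cache", default=None, help="Cache directory, or omit to disable caching")
    parser.add_argument("-k", "--keyid", required=True, help="Amazon AWS Access Key ID")
    parser.add_argument("-a", "--accesskey", required=True, help="Amazon AWS Access Key")
    parser.add_argument("-r", "--region", default="us-west-1", help="Amazon Region")
    return parser


def prepare_cache_dir(cache):
    """Return the cache directory to use, creating it if needed, or None if caching is off."""
    # Assumption: an empty value or the word "none" disables caching.
    if not cache or cache.lower() == "none":
        return None
    os.makedirs(cache, exist_ok=True)
    return cache


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("cache directory:", prepare_cache_dir(args.cache))
```

argparse adds `-h` and `--help` on its own, which matches the help option mentioned in the 0.9.1 release notes.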

    PollyC.py3 and should also go in the HomeSeer directory.

    Release Status
    0.9.0 Thu 21 Jun 06:45:51 PDT 2018
    Initial Release
    0.9.1 Sat 23 Jun 16:49:59 PDT 2018
    Updates include creation of cache directory if it does not exist.
    Caching will only be performed if a caching directory is specified.
    Additional error handling.
    -h or --help is now included for easier use.
    Last edited by Timon; June 24, 2018, 10:25 AM. Reason: Released PollyC