New TTS engines

ericonvera · January 21, 2014, 6:52pm

[quote=“lolodomo, post:20, topic:178940”]What about the Microsoft translator service ? A HTTP API exists with a speak feature ? Any idea of the audio quality?
The service is free.[/quote]

Audio quality is decent. Certainly better than most free TTS engines. It looks like the same type of method we use for Google or the OSX one should work. The URL structure is as follows:

http://api.microsofttranslator.com/V2/Http.svc/Translate?

text=the text to speak&
from=en&
to=en&
contentType=textPlain&
appId=123

There’s some more information about the method here → http://msdn.microsoft.com/en-us/library/ff512421.aspx

That appId is something you need to get from Microsoft. There are instructions to do so here → http://msdn.microsoft.com/en-us/library/hh454950.aspx
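
In case it helps anyone experimenting, here is a rough Python sketch of how that URL could be put together with proper URL encoding. It only uses the parameters listed in this post; the `build_translator_url` helper is a made-up name, and `123` is a placeholder for the appId you get from Microsoft.

```python
from urllib.parse import urlencode

# Endpoint exactly as given in the post above.
BASE_URL = "http://api.microsofttranslator.com/V2/Http.svc/Translate"

def build_translator_url(text, app_id, lang_from="en", lang_to="en"):
    """Hypothetical helper: assemble the request URL from the listed parameters."""
    params = [
        ("text", text),
        ("from", lang_from),
        ("to", lang_to),
        ("contentType", "textPlain"),  # value copied from the post, not verified
        ("appId", app_id),             # placeholder; a real appId comes from Microsoft
    ]
    # urlencode escapes spaces and special characters in the text
    return BASE_URL + "?" + urlencode(params)

url = build_translator_url("the text to speak", "123")
print(url)
```

The point of going through `urlencode` rather than string concatenation is that spaces and punctuation in the spoken text get escaped automatically.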

SM2k · January 22, 2014, 4:09pm

Sorry for the delay. It took me some time to locate the file. This code is horribly direct and was designed to simply mimic the pieces of Google’s translate server that were used by the Sonos plugin. One note, I chose to use the Samantha voice (the new voice of Siri) as I think it sounds much better than other options. Depending on how new your OS is, you might need to download that voice: Add the Voice of Siri to Mac OS X

Obviously, to open port 80 you’ll need to run this script as root, which isn’t a terrific idea, but this was for testing. One trick I used to test this without modifying the Sonos plugin code was to add an entry to Vera’s /etc/hosts file that points translate.google.com at the Mac’s local IP. I wouldn’t recommend that for anything but testing.

Final thought: you’ll need lame installed on your Mac as well. I used MacPorts for that, but installers for OS X appear to be readily available.

#!/usr/bin/python
from BaseHTTPServer import BaseHTTPRequestHandler,HTTPServer
import os
import urlparse
import tempfile
import subprocess
import shutil

PORT_NUMBER = 80

#http://translate.google.com/translate_tts?tl=%s&q=%s
class voiceProxy(BaseHTTPRequestHandler):

    #Handler for the GET requests
    def do_GET(self):
        parts = urlparse.urlparse(self.path)
        text = urlparse.parse_qs(parts.query).get('q', [''])[0]
        tmpDir = tempfile.mkdtemp()
        try:
            aiff_fn = os.path.join(tmpDir, 'translate_tts.aiff')
            mp3_fn = os.path.join(tmpDir, 'translate_tts.mp3')

            # create an aiff file of the submitted text
            p = subprocess.Popen(['say', '-v', 'Samantha', '-o', aiff_fn],
                stdin = subprocess.PIPE, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
            # communicate() writes the text to stdin, closes it and waits for exit
            out, err = p.communicate(text)
            status = p.returncode

            # translate the file to mp3
            p = subprocess.Popen(['lame', '-h', '-m', 'm', '-b', '64', aiff_fn, mp3_fn],
                stdin = subprocess.PIPE, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
            out, err = p.communicate()
            status = p.wait()

            # send it back... (open in binary mode, the mp3 is not text)
            f = open(mp3_fn, 'rb')
            self.send_response(200)
            self.send_header('Content-type', "audio/mpeg")
            self.end_headers()
            self.wfile.write(f.read())
            f.close()
        finally:
            shutil.rmtree(tmpDir, ignore_errors = True)

try:
    #Create a web server and define the handler to manage the
    #incoming request
    server = HTTPServer(('', PORT_NUMBER), voiceProxy)
    print 'Started httpserver on port ' , PORT_NUMBER

    #Wait forever for incoming http requests
    server.serve_forever()

except KeyboardInterrupt:
    print '^C received, shutting down the web server'
    server.socket.close()
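
For anyone poking at the script above, here is a small sketch of what the `q` extraction in `do_GET` does with a Google-style request path. It is written for Python 3, where `urllib.parse` replaces the Python 2 `urlparse` module; the `extract_text` name and the sample path are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

def extract_text(path):
    """Return the decoded value of the 'q' query parameter, or '' if absent."""
    query = urlparse(path).query
    # parse_qs maps each parameter name to a list of values
    return parse_qs(query).get("q", [""])[0]

# Sample path in the shape of the translate_tts URL mentioned in the script
sample = "/translate_tts?tl=en&q=Front+door+opened"
print(extract_text(sample))
```

The `tl` language parameter is parsed but ignored, which matches the script: it always speaks with the one hard-coded voice.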

lolodomo · January 23, 2014, 11:55am

@ericonvera: shall I keep the lang parameter in the URL ? This parameter is not mentioned here: http://wolfpaulus.com/jounal/mac/ttsserver/

Here are my plans:
1 - move all the TTS stuff, including multiple engines, into a library (for easy re-use in the DLNA plugin)
2 - add a new “engine” parameter to the Say action (to let the user select an engine) + add a new variable to define the default engine
3 - move the UI for TTS (setup and playback) into a new tab

I will try to finish (commit) the work on points 1 and 2 today or tomorrow.
Point 3 will be handled later.

ericonvera · January 23, 2014, 1:38pm

You can remove it. I had put it in there in hopes of modifying the TTSServer to accept it and choose a voice based on language but that certainly hasn’t happened yet.

I’m glad to hear that this is making it into the general release soon.

SM2k · January 23, 2014, 4:00pm

I recall there being something like a 100 character limit inside the Sonos plugin (well, the plugin breaks speech into chunks that large). I think that might have been imposed by Google. I think other TTS options don’t necessarily impose a limit (I know the simple engine I posted doesn’t). It would be nice to expose whether each engine chunks text, and at how many characters, because I artificially pad larger messages with spaces before sending them to the Sonos plugin so that longer messages don’t pause oddly mid-sentence.

ericonvera · January 23, 2014, 4:10pm

Correct. Google has a limit of 100 characters. The other service I tested doesn’t have a limit so the plugin doesn’t break it into chunks. There could probably be some smarter logic for Google that would use punctuation to break up long messages instead of just spaces like it does now.


SM2k · January 23, 2014, 4:29pm

Ah! If the plugin were able to break on punctuation then I wouldn’t need to artificially pad large messages, nor would I even need to know about how each engine chunks text. That approach would work even better.

ericonvera · January 23, 2014, 6:18pm

[quote=“SM2k”]Ah! If the plugin were able to break on punctuation then I wouldn’t need to artificially pad large messages, nor would I even need to know about how each engine chunks text. That approach would work even better.[/quote]

@lolodomo, this should be as simple as the following:

-- the 4th argument (true) forces a plain-text search; without it "." is a pattern that matches any character
local reversed = string.reverse(string.sub(remaining, 1, cutSize+1))
local pos = string.find(reversed, ".", 1, true)
if (pos == nil) then
  pos = string.find(reversed, ",", 1, true)
  if (pos == nil) then
    pos = string.find(reversed, " ", 1, true)
  end
end

This way it first looks for a period, then a comma, then a space if it can’t find either. This would go just before the line reading

if (pos ~= nil) then
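
To make the idea easier to try outside the plugin, here is a minimal Python sketch of the same priority order (period, then comma, then space). It is not the plugin’s code: `split_for_tts` is a made-up name, the 100 character default comes from the Google limit discussed above, and the hard cut when no separator is found is my own assumption.

```python
def split_for_tts(text, limit=100):
    """Split text into chunks of at most `limit` characters.

    Prefers to cut after the last period, then the last comma, then the
    last space inside the limit; falls back to a hard cut if none is found.
    """
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        window = remaining[:limit]
        cut = -1
        for sep in (".", ",", " "):
            cut = window.rfind(sep)
            if cut > 0:
                break
        if cut <= 0:
            cut = limit - 1  # assumption: no separator, so cut hard at the limit
        chunks.append(remaining[:cut + 1].strip())
        remaining = remaining[cut + 1:].strip()
    if remaining:
        chunks.append(remaining)
    return chunks

message = "The front door was opened. " * 6
for part in split_for_tts(message):
    print(len(part), part)
```

With a message made of short sentences, every chunk ends on a full stop, so the pause between chunks falls where a speaker would pause anyway.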

AgileHumor · January 23, 2014, 7:56pm

I don’t really use a TTS engine dynamically. Instead, I use mControl to play static audio files when certain luup devices are on/off/armed/tripped. mControl running on Windows is nice in that it also integrates Vera and Media Center.

Downside is the price and having to duplicate some logic in both places…as well as the price.

I create and download the WAV files here:
http://www2.research.att.com/~ttsweb/tts/demo.php

SM2k · January 23, 2014, 8:35pm

Legacy apps - AssistiveWare: I haven’t downloaded any of these voices, but some of them sound fairly realistic. I think they integrate directly with OS X system voices (per their claim that they’re system wide). If that’s true they should work with any of the OS X TTS solutions that have been discussed. Looks like voices cost 20 to 30 bucks each and you can trial them for a month.

lolodomo · January 23, 2014, 11:37pm

@ericonvera: I have committed to trunk. It is a first step; improvements are possible.
It only covers point 1 that I mentioned earlier in the day.

New variables:

DefaultEngineTTS: use either GOOGLE or OSX_TTS_SERVER
OSXTTSServerURL: set the URL of your personal TTS server, something like http://myserver.org:12345

It is working with the two engines.

ericonvera · January 24, 2014, 12:04am

@lolodomo that is great news


SM2k · January 25, 2014, 8:47pm

[quote=“lolodomo, post:31, topic:178940”]@ericonvera: I have committed to trunk. It is a first step; improvements are possible.
It only covers point 1 that I mentioned earlier in the day.

New variables:

DefaultEngineTTS: use either GOOGLE or OSX_TTS_SERVER
OSXTTSServerURL: set the URL of your personal TTS server, something like http://myserver.org:12345

It is working with the two engines.[/quote]

I’ve glanced at trunk, and there’s a lot more files than there used to be.

If I wanted to smoke-jump and install from trunk, I assume I’ll need everything in the services directory in addition to roughly what the wiki says for beta 2. It looks like S_SonosAVTransport1.xml was renamed to S_AVTransport1.xml and likewise for S_SonosGroupRenderingControl1.xml → S_RenderingControl1.xml, correct?

Could I safely remove the beta 2 files after uploading everything from trunk?

SM2k · January 25, 2014, 9:13pm

Nevermind! I realized the comments on the files in the services directory were along the lines of “hide stuff that isn’t ready/doesn’t need to be dealt with”. I went ahead and installed the files and have text to speech coming from my mac now.

allmoney.ws · January 31, 2014, 11:37pm

[quote=“lolodomo, post:4, topic:178940”][quote=“flyveleder, post:2, topic:178940”]One of the best engines I have come across is this one : http://demo.acapela-group.com/

Sounds perfect in my language (Danish)[/quote]

Oh yes, not too bad in French too 8)[/quote]
The Russian voice is better than Google TTS.

flyveleder · February 12, 2014, 11:24am

Is anyone working on how to get Acapela TTS working with the Sonos Plugin ?

lolodomo · February 12, 2014, 1:36pm

If you can provide a Lua function that takes the text and language as input parameters and returns the URL of the produced (local) audio file and its duration (in seconds), I will add it with pleasure to the plugin and more generally to the TTS library.

First you need to know what kind of API is available. If an HTTP API is available, it should be doable relatively easily.
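
To illustrate the contract described above (text and language in, audio URL and duration out), here is a rough sketch. It is in Python only to show the shape; the plugin itself needs a Lua function. `TTSResult`, `cbr_mp3_duration` and `dummy_engine` are all made-up names, and the duration estimate assumes a constant-bitrate MP3 such as the 64 kbps files produced by `lame -b 64` earlier in this thread.

```python
import os
import tempfile
from typing import Callable, NamedTuple

class TTSResult(NamedTuple):
    url: str           # where the player can fetch the audio
    duration_s: float  # playback length in seconds

# An engine takes (text, language) and returns a TTSResult.
TTSEngine = Callable[[str, str], TTSResult]

def cbr_mp3_duration(path, bitrate_kbps=64):
    """Estimate the duration of a constant-bitrate MP3 from its file size."""
    size_bits = os.path.getsize(path) * 8
    return size_bits / (bitrate_kbps * 1000)

def dummy_engine(text, language):
    """Placeholder engine: a real one would synthesise `text` here."""
    fd, path = tempfile.mkstemp(suffix=".mp3")
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * 16000)  # stand-in for about 2 s of 64 kbps audio
    return TTSResult(url="file://" + path, duration_s=cbr_mp3_duration(path))

result = dummy_engine("hello", "en")
print(result.duration_s)
```

The size-based estimate ignores headers and tags, so treat it as approximate; it is just one way an engine could supply the duration without decoding the file.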

flyveleder · February 12, 2014, 2:00pm

I can provide you with exactly nothing: I don’t have a clue about Lua programming, and I don’t know whether Acapela provides a public API.

My question was merely whether someone was looking into it; otherwise I will lower my expectations.

(Google TTS is doable - but far from perfect).

Thanks,
Martin