Frequently Asked Questions

What does it cost? Does free mean I am the product?

You pay as you go. During the trial period you can generate 100 000 characters of text for free. After that, you pay 15 USD for each additional 100 000 characters, which is about 33 standard pages of text or 10 minutes of audio.
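
The pricing above can be turned into a rough estimate. The following Python sketch is only an illustration: the function name is hypothetical, and it assumes that usage beyond the free trial is billed in proportion to the number of characters, which the text above does not state explicitly.

```python
from decimal import Decimal

# Figures taken from the answer above.
FREE_TRIAL_CHARS = 100_000            # characters included in the trial
PRICE_PER_BLOCK_USD = Decimal("15")   # price of each additional block
BLOCK_SIZE_CHARS = 100_000            # size of one paid block


def estimate_cost_usd(total_chars: int) -> Decimal:
    """Estimate the cost in USD for a given number of generated characters.

    Assumption (not confirmed by the FAQ): characters beyond the free
    trial are billed proportionally rather than in whole blocks.
    """
    if total_chars < 0:
        raise ValueError("total_chars must be non-negative")
    billable = max(0, total_chars - FREE_TRIAL_CHARS)
    cost = PRICE_PER_BLOCK_USD * billable / BLOCK_SIZE_CHARS
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    for chars in (50_000, 200_000, 250_000):
        print(chars, estimate_cost_usd(chars))
```

If billing turns out to be per started block instead, the proportional division would need to be replaced by rounding up to whole blocks.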

I don't want to do my own SSML/TTS, can you do it for me?

Voxabot provides the following production services:

  1. Multilingual post-production: a native speaker of the target language listens to the generated audio files and corrects any errors.
  2. Custom SSML scripts: we tailor SSML scripts to produce audio files for different purposes.
  3. Just about anything with TTS: we are curious and open-minded about pushing the boundaries of this technology.

Please contact us for more information about our production and post-production TTS services.

If traditional voice-over suits your project better, we also offer professional human voice-over at competitive rates.

Do you have a tutorial?

Sure, you can find it here:

What about things that aren't standard SSML? Can I add custom tags?

Yes, you can!

Although the SSML standard covers virtually all aspects of Text-To-Speech, it leaves some areas unspecified, and each TTS engine and voice can have its own custom SSML tags. This is particularly true of the newest neural voices, which include additional SSML elements for controlling the voice output. For example, some Microsoft Azure voices support tags that express emotions such as cheerfulness, empathy, and calm, or that optimize the voice for scenarios such as customer service and voice assistants.

To add a custom tag, do the following:

  1. Click on Tools > Custom

Depending on the structure of the tag, you may not be able to insert the custom code. Also note that some TTS engines limit where a tag can be applied (voices, regions, etc.).
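
As an illustration of the engine-specific tags described above, the following Python sketch builds an SSML string that uses Microsoft Azure's mstts:express-as element. The helper name is hypothetical and not part of Voxabot, and the voice en-US-AriaNeural and the style "cheerful" are assumptions: which styles are available depends on the voice and region you choose.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # standard SSML namespace
MSTTS_NS = "https://www.w3.org/2001/mstts"       # namespace used by Azure's extensions


def express_as(text: str, style: str,
               voice: str = "en-US-AriaNeural", lang: str = "en-US") -> str:
    """Wrap text in Azure's custom express-as tag (hypothetical helper).

    The default voice and the style values are assumptions; check the
    provider's documentation for what a given voice supports.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        f'<mstts:express-as style={quoteattr(style)}>'
        f'{escape(text)}'          # escape &, < and > in the spoken text
        f'</mstts:express-as>'
        f'</voice></speak>'
    )


if __name__ == "__main__":
    ssml = express_as("Thanks for calling & welcome!", "cheerful")
    ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
    print(ssml)
```

Parsing the result only checks that the markup is well-formed XML. It does not tell you whether a particular engine or voice will accept the tag.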

Can I export or import my SSML?

Yes, you can!

Import SSML

To import SSML code into the Voxabot editor, follow these steps:

  1. Click on the Tools tab to open the dropdown menu
  2. In the dropdown menu click on Import SSML. This will open the Import SSML code window.
  3. Paste the SSML code that you want to import and click OK.

Export SSML

To export SSML code from the Voxabot editor, follow these steps:

  1. Click on the Tools tab to open the dropdown menu
  2. In the dropdown menu click on Export SSML. This will create and download a UTF-8 encoded plain text file with the SSML file extension.
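
Once exported, the file can be checked outside the editor. The Python sketch below reads such a file as UTF-8, verifies that it is well-formed XML, and extracts the plain spoken text. The function names and the file name script.ssml are hypothetical, and the sketch assumes the exported file contains a single SSML document.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def load_exported_ssml(path: Path) -> ET.Element:
    """Read an exported SSML file and return its root element.

    "utf-8-sig" reads plain UTF-8 and also tolerates an optional
    byte order mark at the start of the file.
    """
    text = path.read_text(encoding="utf-8-sig")
    return ET.fromstring(text)  # raises ParseError on malformed markup


def plain_text(root: ET.Element) -> str:
    """Return the text content without tags, with whitespace collapsed."""
    return " ".join("".join(root.itertext()).split())


if __name__ == "__main__":
    root = load_exported_ssml(Path("script.ssml"))  # hypothetical file name
    print(root.tag)
    print(plain_text(root))
```

A file that passes this check can be pasted back into the Import SSML code window described above.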

What is SSML?

SSML stands for Speech Synthesis Markup Language. It is the standard markup language for generating synthetic speech. TTS engines apply their own default interpretation to written text, and you can use SSML elements to control aspects of speech such as pronunciation, volume, pitch, and rate.
The Voxabot SSML Editor helps you insert SSML codes into the text script automatically and hear the generated audio immediately.
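
To give a feel for what SSML looks like, the Python sketch below builds a small document with a pause (break) and a passage with adjusted rate, pitch, and volume (prosody). The function name and the sample values are illustrative assumptions, and the markup the Voxabot editor generates may differ.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # standard SSML namespace


def build_ssml(intro: str, slow_part: str, pause: str = "500ms") -> str:
    """Build a minimal SSML document (illustrative helper, not Voxabot's API)."""
    speak = ET.Element(
        "speak",
        {"version": "1.0", "xmlns": SSML_NS, "xml:lang": "en-US"},
    )
    speak.text = intro
    # <break> inserts a pause of the given duration.
    ET.SubElement(speak, "break", {"time": pause})
    # <prosody> changes rate, pitch and volume for the enclosed text.
    prosody = ET.SubElement(
        speak, "prosody", {"rate": "slow", "pitch": "+2st", "volume": "loud"}
    )
    prosody.text = slow_part
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Welcome.", "This sentence is spoken slowly."))
```

Without the prosody and break elements, the engine would read the same text with its default settings.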

For more information read the Wikipedia article:

How can I get help?

Contact us here:

What TTS engine do you use?

We connect to the TTS engines of Google, Amazon (AWS Polly), and Microsoft Azure. Click on the links below for information from each cloud provider. Note that these providers are constantly evolving their offerings, and we keep adding their new features to our service.

Do you support AWS Polly Neural TTS?

Unfortunately, Amazon Polly limits the use of NTTS to specific voices, regions, and languages, and also limits the allowed tags. This prevents us from implementing it faithfully. Azure and Google’s Neural TTS are available instead.

What are the differences between standard voices (TTS) and neural voices (NTTS)?

Generally speaking, standard voices were created using the speech synthesis technology available before 2016, which involved multiple stages such as a text analysis front end, an acoustic model, and an audio synthesis module. Neural voices follow a similar approach but rely on neural networks and deep learning technologies (WaveNet, Tacotron, VoiceLoop), which are faster to produce and deliver a more human-like sound.

Benefits of Text-to-Speech

  • Fast: you can record your message immediately, without any special recording equipment.
  • Consistent: the same voice is always available, so new recordings match previously recorded materials.
  • Available 24/7: you don’t have to rely on the availability of a voice artist.
  • Inexpensive: recording 100 000 characters (roughly 33 pages of typed text) costs only 15 USD.

How do I create an audio file from written text?

Follow these steps to create an MP3 audio file from written text using the TTS engine's default values:

  1. Sign in to Voxabot with your Google account.
  2. Paste the text that you want to convert to audio in the Editor pane.
  3. In the Language tab, select the language in which the text is written
  4. In the Voices tab, select one of the voices available
  5. Press the Convert to Audio button to generate the audio file.
  6. Press the Play button to hear the audio. If you don’t like the voice you can repeat the steps above to generate another audio clip using a different connector, language, and/or voice.
  7. Press the Download button to download the MP3 audio file.

You can see how it works here: App.Voxabot Introduction - YouTube

What is TTS or Text-To-Speech?

Text-To-Speech, abbreviated TTS, is the artificial production of human speech from written text. Currently, the Voxabot Editor generates human-like voices using the TTS engines from Amazon, Microsoft, and Google. This gives you a wide choice of standard voices and state-of-the-art neural voices in many languages.