
Control how your bot pronounces content in voice conversations

Bots are generally good at guessing how to pronounce content, but sometimes you'll want to override the default behaviour so the content sounds better. To do this, you can use Speech Synthesis Markup Language (SSML). For more in-depth information, see Speech Synthesis Markup Language (SSML) in Google Cloud's documentation; this topic covers the most common use cases.

Important things to know about SSML

There are some important things you should know about how to use SSML in Ada:

  • You can only use SSML with voices provided by Google; voices provided by OpenAI don't support it. If you need the degree of control SSML provides, make sure the voice your bot uses is listed as a Google voice. For more information, see Choose a speaking voice for your bot.

    OpenAI voices are generally better at guessing how to pronounce content, but they can sometimes struggle to pronounce content like long or complicated numbers. If you have terms that these voices don't pronounce as expected, consider spelling those terms out phonetically.

  • SSML uses <speak> tags. If you use SSML in a block, the block needs to start with <speak> and end with </speak>, so the tag contains all of the block's content - not just the content you want to target with the SSML.

  • You can nest tags in each other. For example, if you want your bot to read out an order number slowly, you can use the code <speak>Your order number is <prosody rate="slow"><say-as interpret-as="characters">12569</say-as></prosody>.</speak>. Note that there is still only one <speak> tag that contains all of the block's content.
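To illustrate the wrapping rule, here is a minimal sketch of a block with several sentences where only one term is targeted. The wording and the code XK42 are made up for illustration; the point is that the single <speak> tag still wraps the sentences that have no other SSML in them.

```xml
<speak>Thanks for calling. Your confirmation code is <say-as interpret-as="characters">XK42</say-as>. Is there anything else I can help you with?</speak>
```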

Anytime a block contains content to read out, the top of the block has a Play icon you can click to test your content, so you can fine-tune the adjustments you make to the SSML.

Pronounce characters individually

By default, bots usually try to pronounce content together as full words or numbers as opposed to reading out the individual characters. To make the bot pronounce them separately, you can either space out the characters (e.g., "A S A P"), or use <speak><say-as interpret-as="characters">CONTENT_HERE</say-as></speak>.

You can also use this technique with numbers. For example, if a caller places an order, it's much easier for the caller to follow the order number, 12569, when the bot reads it out as separate digits ("one, two, five, six, nine") rather than as "twelve thousand, five hundred sixty-nine."
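As a sketch, the order number case could be written as follows. The surrounding sentence is illustrative; only the say-as tag and its interpret-as="characters" attribute come from the guidance above.

```xml
<speak>Your order number is <say-as interpret-as="characters">12569</say-as>.</speak>
```

Use the Play icon to confirm the digits are read out one at a time with the voice you selected.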

Add pauses

Sometimes you might want to add pauses to avoid overwhelming callers with information. In this case, you can use <speak><break time="Xs"/></speak>, where X is the number of seconds you want the break to last. This can be useful in cases like:

  • Separating steps or details in a list

  • Giving callers a chance to gather information (e.g., "Let me give you 10 seconds to find your order number")

  • Giving callers a chance to read and respond to an SMS you sent them
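For example, pauses between steps in a list might look like the following sketch. The steps themselves and the one-second duration are assumptions chosen for illustration; adjust the time value to suit your content.

```xml
<speak>First, open the app. <break time="1s"/> Next, tap Orders. <break time="1s"/> Finally, choose the order you want to track.</speak>
```

A longer pause works the same way, such as this sketch based on the order number example in the list above:

```xml
<speak>Let me give you 10 seconds to find your order number. <break time="10s"/> Okay, what is your order number?</speak>
```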

Change the bot's reading speed

You can get the bot to speed up or slow down; for example, it might be helpful to slow down the reading speed when reading out a long account number, or speed up through a disclaimer. To do this, use <speak><prosody rate="slow">CONTENT_HERE</prosody></speak>.

You can set the prosody rate attribute a few ways:

  • As slow, medium, or fast

  • Using numbers, where 1 is the normal speed

    • Slow down using .8 or .9; going much slower than this can be off-putting

    • Speed up using 1.1 or 1.2


Changing the speed for only certain words in a sentence might cause unwanted pauses. If you try this, make sure you test it. If it sounds odd, change the speed of the entire sentence rather than just a portion of it.
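As a sketch of slowing down an entire sentence rather than part of it, the prosody tag can wrap the full sentence. The account number here is made up, and the say-as tag is nested inside prosody so the digits are also read individually.

```xml
<speak><prosody rate="slow">Your account number is <say-as interpret-as="characters">40027719</say-as>.</prosody></speak>
```

A numeric rate follows the same pattern. This example assumes the .9 value suggested above and uses an illustrative sentence:

```xml
<speak><prosody rate=".9">Your appointment is on Tuesday at three thirty in the afternoon.</prosody></speak>
```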

Add emphasis

When you’ve got something important to say, your bot's voice can add emphasis. Use the code <speak><emphasis level="strong">EMPHASIZED-CONTENT</emphasis></speak>.
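For instance, emphasis on one phrase inside a longer sentence might look like this sketch. The sentence is illustrative; note that the <speak> tag still wraps the whole block, not just the emphasized phrase.

```xml
<speak>Please note that refunds can take <emphasis level="strong">up to ten business days</emphasis> to appear on your statement.</speak>
```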

Play a sound clip

If you want to add a sound file to an Answer, use this code: <speak><audio src="URL_PATH_TO_SOUND_FILE">FALLBACK_CONTENT</audio></speak>.

This code requires both a direct path to a sound file and a fallback phrase for the bot to read out in case the sound file doesn't load.

Additionally, you can add attributes to play only a portion of the audio. To do this, add the clipBegin and/or clipEnd attributes to the audio tag, each with the timestamp where you want to begin or end the sound clip, like this: <speak><audio src="URL_PATH_TO_SOUND_FILE" clipBegin="Xs" clipEnd="Ys">FALLBACK_CONTENT</audio></speak>.

Lastly, if the sound file fails to load, the bot reads out your fallback content instead of playing the sound.
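Putting these pieces together, a clipped sound followed by spoken content might look like the sketch below. The URL is a hypothetical placeholder, not a real file, and the timestamps and wording are assumptions for illustration. If the file can't be loaded, the bot would read "Welcome!" in place of the sound.

```xml
<speak><audio src="https://example.com/sounds/welcome-chime.mp3" clipBegin="1s" clipEnd="4s">Welcome!</audio> Thanks for calling. How can I help you today?</speak>
```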

Have any questions? Contact your Ada team.