Create and adapt content for voice conversations

When you're creating structured flows like greetings, or independently reasoned flows like handoffs, it can be hard to ensure that content is properly paced and configured so it works seamlessly over the phone. Follow the best practices below so your AI Agent delivers digestible amounts of information to callers and makes them feel understood and empowered.

Use rules to make specialized content available to callers

When you're managing content in your AI Agent, like your knowledge articles, Actions, or Guidance, you can set availability rules to make that content available to certain customers based on variables associated with them.

We highly recommend creating availability rules based on the channel variable set to Voice, so you can automatically route your callers to the best possible experiences. This is especially important when you're creating handoffs, to ensure your AI Agent only serves voice-compatible blocks to customers.

Understand basic voice conversation design principles

This section will take you through key concepts that you should understand when writing structured flows for your AI Agent.

Manage the caller's expectations

If you have a long piece of structured content in your AI Agent - one where you have to ask the caller for information three or more times - it can be a good idea to give them a sense of what they're getting into. Here are some examples:

  • Ask for required information in advance

    If your structured content requires information that the caller might not know offhand, like an order or account number, tell them in advance that they'll need it, so they have some more time to get it. For example, you can say, "I'll need to collect your account number and ask you a few more questions."

  • Indicate how long the flow will take

    It's also a good idea to indicate to the caller how long the process will take. This can be as simple as saying, "For most people, this takes two minutes."

Focus on the caller, not the AI Agent

It can be jarring to hear an AI Agent speak as if it's a human agent, referring to itself with the pronouns "I" and "me." To avoid this, keep the wording focused on the caller. For example:

Instead of: "I changed your plan and it should take effect in the next two hours. Would you like me to add a month of free data roaming?"

Try: "You're all set! Your plan has been changed and you should see the changes in your account within two hours. Would you like a month of free data roaming?"

Keep it brief

To keep the conversation moving, remove excess wording so you can convey information to the caller in less time.

Instead of: "You're all set! Your plan has now been changed and you should see the changes updated in your account within the next two hours. How would you like a month of absolutely free data roaming?"

Try: "You're all set! Your plan will change within two hours. Would you like a month of free data roaming?"

Test your structured content for latency

Some of your structured content might contain blocks that perform backend actions that take time to complete, such as the HTTP Request, Set Variable, and Answer Utilities blocks, as well as any Action blocks. Test those flows for latency: are there awkward periods of silence while your AI Agent is processing information?

If it feels like it takes too long for your AI Agent to perform those actions, see the best practices at Minimize pauses while your AI Agent performs backend actions.

Build and adapt content using voice blocks

Are you ready to build voice conversations in your AI Agent? While there are some differences between how messaging and voice content work in Ada, you'll notice that there are a lot of similarities between the two.

Write a greeting for voice

In addition to greeting callers with a friendly and helpful AI Agent persona, make two things clear in your greeting:

  • An AI Agent is speaking, not a human
  • The call is being recorded
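For example, here's some sample greeting wording that covers both points; the company name is a placeholder you'd swap for your own:

"Hi, you've reached Example Telecom. You're speaking with an automated AI Agent, not a human, and this call is being recorded. How can I help you today?"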

Understand how messaging and voice content are organized in the dashboard

When navigating the dashboard, there are a few ways you can look for structured content that has either messaging or voice content that customers can currently interact with.

View messaging and voice versions of structured content

In AI Agents that have Voice enabled, each piece of structured content is divided into two tabs: Messaging and Voice. Under each tab, you can tweak the content for that specific channel.

Splitting related content into messaging and voice versions gives you the flexibility to quickly adapt your existing messaging workflows to Voice, while also tailoring the customer experience where you need to. You can customize your structured content to have modality-specific wording (e.g., in messaging you can use the phrase "please type in your name," and in Voice you can use "please tell me your name" instead).

Learn about how blocks work in voice

Once you're ready to build and adapt your structured content for voice, you should understand that some blocks work differently between messaging and voice.

When you're adapting structured content from one modality to another, you can temporarily copy over an unsupported block so you can more easily adapt its content into a supported block. However, you can't save the structured content while either modality contains unsupported blocks.

Different name only:

  • Text Message (messaging) → Speak (voice)
  • Capture (messaging) → Ask for (voice)

No differences:

  • Shuffle Message
  • List Option
  • Conditional
  • Scheduled
  • HTTP Request
  • Zendesk Ticketing
  • Simple Apps

Messaging only (not supported in voice):

  • Zendesk Chat
  • Salesforce Chat

Voice only (not supported in messaging):

  • Transfer Phone Call
  • End Phone Call

Use List Option blocks for voice conversations

In voice conversations, time is of the essence. In messaging, you might be able to send customers long lists of options to choose from, but in voice, having a long list of options read out can make callers feel overwhelmed. How can you adapt your existing structured content to make it more voice-friendly?

With List Option blocks, you can allow callers to choose from a list of options.

Adapt List Option blocks for voice conversations

When you're using List Option blocks to let callers find their path through your structured content, there are two additional options you can use:

  • Only read out options if capture is unsuccessful

    When you enable this option, your AI Agent only asks the question (e.g., "Which state are you in?") without then reading out all the options ("Alabama, Alaska, Arizona..."). That way, not only does your AI Agent save the caller some time, but the experience mimics a human conversation. If the caller responds with something unexpected (e.g., "Ontario"), then your AI Agent reads out the possible options.

  • Options contain a date or time

    Dates and times can be tricky for voice models to parse. This option enables a specialized model for dates and times, which captures those values much better than our default model.

Additionally, if you need your AI Agent to read out a long list of options, you can encourage callers to interrupt by saying something like "Call out your account type when you hear it."

Remember, list options lack flexibility; they can only direct callers to the options you give them. Generative AI is a lot more flexible than list option phrases are! To provide an open-ended main menu experience, consider relying on Ada's ability to match caller input to content from your knowledge base wherever possible.

Capture caller information in voice conversations

It can be tricky getting callers' information over the phone. Think about how much easier it would be for a friend to text you a long order number instead of reading it out to you! To set your callers up for success, there are a few different ways you can ask them to give you information in voice.

Learn about how metavariables work in voice

Before we start talking about gathering information from your callers over the phone, it might help to talk about the information Ada collects automatically. In messaging conversations, Ada can collect customer information from their browser. But with voice conversations, there's no browser to collect information with.

As a result, voice conversation metavariables are limited to events that occurred during the conversation, and information Ada can gather about the caller's phone number. This isn't a perfect system; for example, it assumes the caller is in the same location where their phone number is registered, so a caller with a Montreal phone number who's calling from Toronto will appear to be calling from Montreal.

Here's a comparison of the metavariables between messaging and voice:

Messaging (collected from the browser):

  • initialurl
  • introshown
  • browser
  • device
  • browser_version
  • language
  • last_question_asked
  • last_answer_id
  • user_agent
  • test_user

Voice:

  • phone_number
  • call_made_to
  • country
  • city
  • country_name
  • state

Capture additional caller information

When you're capturing information using the Ask block (the voice equivalent of the Capture block in messaging), it's important to think about the most accurate way of getting that information from the caller, and to communicate clearly what you want them to do.

There are four ways you can validate caller information. For all of them, make sure you mention the input you're expecting so the caller knows what to do, along with any other guidelines they might need.

  • Text (input method: Speak). Example prompt: "Using a complete date, such as January 1, 2001, please say your date of birth."

  • Number (input method: Dialpad). Example prompt: "Using your dialpad, please enter the 9-digit number after S M, followed by the pound key."

  • Phone number (input method: Dialpad). Example prompt: "Please use your dialpad to enter the 10-digit phone number associated with your account, followed by the pound key."

  • Yes or No (input method: Speak). Example prompt: "The phone number you entered was 636-555-3226. Is that correct? Please say yes or no."

When callers are entering numbers on the dialpad, they can press the # key to indicate that they're done. Otherwise, your AI Agent assumes they're done five seconds after they enter their last digit.
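If your AI Agent uses a Google voice that supports SSML (see Control how your AI Agent pronounces content in voice conversations), here's a minimal sketch of the yes-or-no confirmation prompt above, with the phone number read back digit by digit:

```xml
<speak>
  The phone number you entered was
  <say-as interpret-as="telephone">636-555-3226</say-as>.
  Is that correct? Please say yes or no.
</speak>
```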

Minimize pauses while your AI Agent performs backend actions

Some blocks, such as the HTTP Request and Set Variable blocks, as well as any Action blocks, perform backend actions that take some time to complete. In some cases, this can cause pauses while Ada is performing the action, during which a caller might wonder whether your AI Agent heard the last thing they said, or become frustrated.

There are some things your AI Agent does to minimize silence. Approximately two seconds after the caller speaks, if your AI Agent needs more time to process, it plays one of a variety of messages like "okay" or "one moment" to fill the silence. If required, it plays additional similar messages every five seconds, to reassure the caller that the call is still active, until it has a response ready. However, there are some best practices you can follow to further minimize processing interruptions:

  • Immediately before blocks that perform backend actions, place one or more Speak blocks. That way, the backend actions can start processing while your AI Agent reads the content in your Speak blocks, so the caller doesn't have to wait as long to get a response.

  • If you have a Capture block followed by multiple blocks that perform backend actions, put a Speak block directly after the Capture block to give your AI Agent some more time to process.

  • If you can't avoid a long pause (for example, if you know the API call in your HTTP Request block normally takes a long time, or if you have multiple blocks with backend actions in a row), create more of a buffer with longer spoken messages like "I understand that you want to find out the balance for your Platinum Reward credit card. I can help with that."
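As a sketch, if your AI Agent uses a Google voice that supports SSML, a buffer Speak block placed before a slow backend action might combine the longer message above with a brief pause to buy extra processing time:

```xml
<speak>
  I understand that you want to find out the balance for your Platinum Reward
  credit card. I can help with that.
  <break time="1s"/>
  One moment while I look that up.
</speak>
```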

With these strategies, you can fine-tune your AI Agent to make conversations that contain even complicated technical actions feel smooth and effortless for your callers.

Transfer phone calls

With the Transfer Call block, you can transfer the caller to a different phone number or SIP address. Most of the time, this means handing off the call to a human agent.

Note that the call transfer takes place when the handoff gets to the Transfer Phone Call block, so if you have any blocks below it, your AI Agent won't serve them to callers.

  1. On the Ada dashboard, go to AI Agent profile > Handoff.

  2. Open the handoff situation you want to put a call transfer in, then click the Voice tab.

  3. Drag and drop a Transfer Call block into the bottom of the content editor.

  4. Under Phone Number or SIP address, enter the phone number or SIP address, or insert the variable containing the phone number or full SIP address that you want to transfer the call to.

  5. If you entered a SIP address, the Include a User-to-User header when transferring to the SIP address checkbox appears. Select it to send additional information with the SIP transfer, either to give the human agent receiving the call more context, or to help route the call to the appropriate department or group.

    • In the Key fields that appear, enter names for the data fields you create.
    • In the Value fields, insert variables to dynamically insert information about the caller.
    • Click Add another row to add more key-value pairs to the transfer, or hover over a pair and click Delete to remove it.
    note

    The user-to-user header that contains these key-value pairs shouldn't exceed 400 characters.

End phone calls

The End Phone Call block is a simple one - if you put it into an Answer, it disconnects the call. Generally, it's a good idea to let the caller be the one to end the call, but this block can be helpful at the end of your content if you confirm with the caller that they don't need any further assistance.

Note that your AI Agent ends the call when the Answer gets to the End Phone Call block, so if you have any blocks below it in an Answer, the caller won't hear them.

  1. On the Ada dashboard, go to AI Agent profile > Handoff.

  2. Open the handoff situation you want to end the call in, then click the Voice tab.

  3. Drag and drop an End Phone Call block into the bottom of the content editor.

Test your AI Agent's voice content

While building out voice content, it's a good idea to test how your flows will sound to callers. You can test individual pieces of structured content, or call your AI Agent so you can test the entire caller experience, starting at your greeting.

Test how an individual voice block sounds

Anytime a block contains content to read out, the top of the block has a Play icon you can click to hear how its content will sound to callers. You can use this button to double-check that the flow of a block's content sounds good, and if you're using SSML tags to adjust your AI Agent's default sound, you can use it to fine-tune your adjustments. For more information, see Control how your AI Agent pronounces content in voice conversations.

Test the voice content in a piece of structured content

Just like when you're creating text content, you can test a piece of structured content to make sure that your voice content flows well and delivers information in a timely way, regardless of whether it's live yet.

note

Location metavariables depend on the caller initiating the phone call, and don't work if Twilio calls the customer instead. As a result, if your structured content depends on those metavariables being populated, you may have to hardcode them using a Set Variable block so you can test it properly.

  1. At the bottom of the structured content you want to test, click Test answer. A test chat window appears.

  2. At the top of the test chat window, click the Ada Web Chat dropdown and click Voice.

  3. Enter your phone number and click Call. Your AI Agent calls you and serves you your structured content. From there, you can hang up or continue testing your AI Agent.

Test your caller experience

You can test your AI Agent by calling your Twilio number and navigating through your AI Agent starting from the greeting, like a caller would. This is a great way of testing whether your AI Agent is successfully recognizing spoken input and responding with relevant information.

note

Twilio doesn't have any way of knowing whether phone calls are coming from testers or customers. As a result, when you go through your AI Agent's conversations or metrics, you can't easily filter out or exclude test calls. By contrast, when you test individual pieces of structured content, your AI Agent always tags those conversations with Test User.

Additionally, this method only tests content that is currently live.

If you need to find your Twilio number, you can go into your Phone integration settings:

  1. On the Ada dashboard, go to Channels > Voice, then go to the Configuration tab.

  2. Your Twilio account's phone number is in the Main phone number field. Call that phone number to test your AI Agent.

Control how your AI Agent pronounces content in voice conversations

AI Agents are generally pretty good at guessing how to pronounce content, but sometimes you'll want to override the default behaviour to make the content sound better. To do this, you can use Speech Synthesis Markup Language, or SSML. You can find more in-depth documentation in Google Cloud's Speech Synthesis Markup Language (SSML) reference, but this topic covers the most common use cases.

Important things to know about SSML

There are some important things you should know about how to use SSML in Ada:

  • You can only use SSML with voices provided by Google; voices provided by ElevenLabs don't support it. If you need the degree of control SSML provides, make sure the voice your AI Agent uses is listed as a Google voice. For more information, see Choose a speaking voice for your AI Agent.

  • SSML uses <speak> tags. If you use SSML in a block, the block needs to start with <speak> and end with </speak>, so the tag contains all of the block's content - not just the content you want to target with the SSML.

  • You can nest tags in each other. For example, if you want your AI Agent to read out an order number slowly, you can use the code <speak>Your order number is <prosody rate="slow"><say-as interpret-as="characters">12569</say-as></prosody>.</speak>. Note that there is still only one <speak> tag that contains all of the block's content.

Anytime a block contains content to read out, the top of the block has a Play icon you can click to test out your content, so you can fine-tune the adjustments you make to the SSML.

Pronounce characters individually

By default, AI Agents usually try to pronounce content together as full words or numbers as opposed to reading out the individual characters. To make your AI Agent pronounce them separately, you can either space out the characters (e.g., "A S A P"), or use <speak><say-as interpret-as="characters">CONTENT_HERE</say-as></speak>.

You can also use this technique with numbers. For example, if a caller places an order, it's much easier for your AI Agent to read out the order number, 12569, as separate digits, as opposed to "twelve thousand, five hundred sixty-nine."
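Here's a minimal sketch that applies this technique to both examples above, with sample surrounding wording:

```xml
<speak>
  We'll get that to you <say-as interpret-as="characters">ASAP</say-as>.
  Your order number is <say-as interpret-as="characters">12569</say-as>.
</speak>
```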

Add pauses

Sometimes you might want to add pauses to avoid overwhelming callers with information. In this case, you can use <speak><break time="Xs"/></speak>, where X is the number of seconds you want the break to last. This can be useful in cases like:

  • Separating steps or details in a list

  • Giving callers a chance to gather information (e.g., "Let me give you 10 seconds to find your order number"; see the sketch after this list)

  • Giving callers a chance to read and respond to an SMS you sent them
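For example, here's a sketch of the order number scenario from the list above, with sample follow-up wording:

```xml
<speak>
  Let me give you 10 seconds to find your order number.
  <break time="10s"/>
  Okay! Please say your order number when you're ready.
</speak>
```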

Change the AI Agent's reading speed

You can get your AI Agent to speed up or slow down; for example, it might be helpful to slow down the reading speed when reading out a long account number, or speed up through a disclaimer. To do this, use <speak><prosody rate="slow">CONTENT_HERE</prosody></speak>.

You can set the prosody rate attribute a few ways:

  • As slow, medium, or fast

  • Using numbers, where 1 is the normal speed

    • Slow down using .8 or .9; much slower than this can be off-putting

    • Speed up using 1.1 or 1.2
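For example, here's a sketch that uses a numeric rate to slow down a read-out of the order number from the earlier nesting example:

```xml
<speak>
  Your order number is
  <prosody rate="0.9"><say-as interpret-as="characters">12569</say-as></prosody>.
</speak>
```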

note

Changing the speed for only certain words in a sentence might cause unwanted pauses. If you try this, make sure you test it. If it sounds odd, change the speed of the entire sentence rather than just a portion of it.

Add emphasis

When you’ve got something important to say, your AI Agent's voice can add emphasis. Use the code <speak><emphasis level="strong">EMPHASIZED-CONTENT</emphasis></speak>.
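For example, here's a sketch with sample wording:

```xml
<speak>
  Please <emphasis level="strong">do not</emphasis> hang up while your payment
  is processing.
</speak>
```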

Play a sound clip

If you want to add a sound file to an Answer, use this code: <speak><audio src="URL_PATH_TO_SOUND_FILE">FALLBACK_CONTENT</audio></speak>.

The above code requires both a direct path to a sound file, and a fallback phrase to read out in case the sound file doesn't load.

Additionally, you can add attributes to play only a portion of the audio. To do this, add the clipBegin and/or clipEnd attributes to the audio tag, each with the timestamp where you want to begin or end the sound clip, like this: <speak><audio src="URL_PATH_TO_SOUND_FILE" clipBegin="Xs" clipEnd="Ys">FALLBACK_CONTENT</audio></speak>.
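Putting that together, here's a sketch that plays only the first 30 seconds of a sound file; the URL is a placeholder for a direct path to your own file:

```xml
<speak>
  <audio src="https://example.com/hold-music.mp3" clipBegin="0s" clipEnd="30s">
    Please hold while I look that up.
  </audio>
</speak>
```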

Lastly, if the sound file fails to load, your AI Agent doesn't play the sound; it reads out your fallback content instead.