Create and adapt content for voice conversations
When you're creating structured flows like greetings, or independently reasoned flows like handoffs, it can be hard to ensure that content is properly paced and configured so it works seamlessly over the phone. Follow the best practices below so your AI Agent delivers digestible amounts of information to callers and makes them feel understood and empowered.
Use rules to make specialized content available to callers
When you're managing content in your AI Agent like your knowledge articles, Actions or Guidance, you can set availability rules to make that content available to certain customers based on variables associated with them.
We highly recommend setting availability rules that use the channel variable set to Voice, so you can automatically route your callers to the best experiences possible. This is especially important when you're creating handoffs, to ensure your AI Agent only serves voice-compatible blocks to customers.
Understand basic voice conversation design principles
This section will take you through key concepts that you should understand when writing structured flows for your AI Agent.
Manage the caller's expectations
If you have a long piece of structured content in your AI Agent - one where you have to ask the caller for information three or more times - it can be a good idea to give them an idea of what they're getting into. Here are some examples:
- Ask for required information in advance: If your structured content requires information that the caller might not know offhand, like an order or account number, tell them in advance that they'll need it, so they have some more time to get it. For example, you can say, "I'll need to collect your account number and ask you a few more questions."
- Indicate how long the flow will take: It's also a good idea to indicate to the caller how long the process will take. This can be as simple as saying, "For most people, this takes two minutes."
Focus on the caller, not the AI Agent
It can be jarring to hear an AI Agent speak as if it's a human agent, referring to itself with the pronouns "I" and "me." To avoid this, keep the wording focused on the caller. For example:
Instead of: "I changed your plan and it should take effect in the next two hours. Would you like me to add a month of free data roaming?"

Try: "You're all set! Your plan has been changed and you should see the changes in your account within two hours. Would you like a month of free data roaming?"
Keep it brief
To keep the conversation moving, remove excess wording so you can convey information to the caller in less time.
Instead of: "You’re all set! Your plan has now been changed and you should see the changes updated in your account within the next two hours. How would you like a month of absolutely free data roaming?"

Try: "You’re all set! Your plan has been changed and you should see the changes in your account within two hours. Would you like a month of free data roaming?"
Test your structured content for latency
Some of your structured content might contain blocks that perform backend actions that take some time to complete, such as the HTTP Request, Set Variable, and Answer Utilities blocks, as well as any Action blocks. Take some time to test those flows for latency. Are there awkward periods of silence while your AI Agent is processing information?
If you find that it takes too long for your AI Agent to perform those actions, read the best practices at Minimize pauses while your AI Agent performs backend actions.
Build and adapt content using voice blocks
Are you ready to build voice conversations in your AI Agent? While there are some differences between how messaging and voice content work in Ada, you'll notice that there are a lot of similarities between the two.
Write a greeting for voice
In addition to greeting callers with a friendly and helpful AI Agent persona, make two things clear in your greeting:
- An AI Agent is speaking, not a human
- The call is being recorded
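For example, a greeting that covers both points might sound something like this (sample wording you'd adapt to your own brand): "Hi, thanks for calling! You're speaking with an automated AI agent, and this call may be recorded. How can I help you today?"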
Understand how messaging and voice content are organized in the dashboard
When you're navigating the dashboard, there are a few ways you can look for structured content that has messaging or voice content customers can currently interact with.
View messaging and voice versions of structured content
In AI Agents that have Voice enabled, each piece of structured content is divided into two tabs: Messaging and Voice. Under each tab, you can tweak the content for that specific channel.
Splitting related content into messaging and voice versions gives you the flexibility to quickly adapt your existing messaging workflows to voice, while also tailoring the customer experience where you need to. You can customize your structured content to have modality-specific wording (e.g., in messaging you can use the phrase "please type in your name," while in voice you can use "please tell me your name" instead).
Learn about how blocks work in voice
Once you're ready to build and adapt your structured content for voice, you should understand that some blocks work differently between messaging and voice.
When you're adapting structured content from one modality to another, it is possible to copy an unsupported block temporarily so you can adapt the content more easily into a supported block. It's important to note that you can't save the structured content if either modality contains unsupported blocks.
 | Messaging block | Voice block |
---|---|---|
Different name only | Text message | Speak |
 | Capture | Ask |
No differences | Shuffle Message, List Option, Conditional, Scheduled, HTTP Request, Zendesk Ticketing, Simple Apps | Same as messaging |
Messaging only | Zendesk Chat, Salesforce Chat | Not supported |
Voice only | Not supported | Transfer Phone Call, End Phone Call |
Use List Option blocks for voice conversations
In voice conversations, time is of the essence. In messaging, you might be able to send customers long lists of options to choose from, but in voice, having a long list of options read out can make callers feel overwhelmed. How can you adapt your existing structured content to make it more voice-friendly?
With List Option blocks, you can allow callers to choose from a list of options.
Adapt List Option blocks for voice conversations
When you're using List Option blocks to allow callers to find their paths through your structured content, there are two additional options you can use:
- Only read out options if capture is unsuccessful: When you enable this option, your AI Agent only asks the question (e.g., "Which state are you in?") without then reading out all the options ("Alabama, Alaska, Arizona..."). That way, not only does your AI Agent save the caller some time, but the experience mimics a human conversation. If the caller responds with something unexpected (e.g., "Ontario"), then your AI Agent reads out the possible options.
- Options contain a date or time: Dates and times can be tricky for voice models to parse. This option enables a specialized model for dates and times, which captures those values much better than our default model.
Additionally, if you need your AI Agent to read out a long list of options, you can encourage callers to interrupt by saying something like "Call out your account type when you hear it."
Remember, list options lack flexibility; they can only direct callers to the options you give them. Generative AI is a lot more flexible than list options are! To provide an open-ended main menu experience, consider relying on Ada's ability to match caller input to content from your knowledge base wherever possible.
Capture caller information in voice conversations
It can be tricky to get callers' information over the phone. Think about how much easier it would be for a friend to text you a long order number instead of reading it out to you! To set your callers up for success, there are a few different ways you can ask them to give you information in voice.
Learn about how metavariables work in voice
Before we start talking about gathering information from your callers over the phone, it might help to talk about the information Ada collects automatically. In messaging conversations, Ada can collect customer information from their browser. But in voice conversations, there's no browser to collect information from.
As a result, voice conversation metavariables are limited to events that occurred during the conversation, plus information Ada can gather about the caller's phone number. This isn't a perfect system; for example, it assumes callers are in the same location where their phone numbers are registered, so a caller with a Montreal phone number who dials in from Toronto appears to be calling from Montreal.
Here's a comparison of the metavariables between messaging and voice:
Metavariable | Messaging | Voice |
---|---|---|
initialurl | ✔ | ✘ |
introshown | ✔ | ✘ |
browser | ✔ | ✘ |
device | ✔ | ✘ |
browser_version | ✔ | ✘ |
language | ✔ | ✔ |
last_question_asked | ✔ | ✔ |
last_answer_id | ✔ | ✔ |
user_agent | ✔ | ✔ |
test_user | ✔ | ✔ |
phone_number | ✘ | ✔ |
call_made_to | ✘ | ✔ |
country | ✘ | ✔ |
city | ✘ | ✔ |
country_name | ✘ | ✔ |
state | ✘ | ✔ |
Capture additional caller information
When you're capturing information using the Ask block (the voice equivalent of the Capture block in messaging), it's important to think about the most accurate way of getting that information from the caller, and to communicate clearly what you want them to do.
There are four ways you can validate caller information. For all of them, make sure you mention the input you're expecting so the caller knows what to do, along with any other guidelines they might need.
Data type | Input methods | Example prompt |
---|---|---|
Text | Speak | Using a complete date, such as January 1, 2001, please say your date of birth. |
Number | Dialpad | Using your dialpad, please enter the 9-digit number after S M, followed by the pound key. |
Phone number | Dialpad | Please use your dialpad to enter the 10-digit phone number associated with your account, followed by the pound key. |
Yes or No | Speak | The phone number you entered was 636-555-3226. Is that correct? Please say yes or no. |
When callers are entering numbers into the dialpad, they can use the # key to indicate that they're done. Otherwise, your AI Agent assumes they're done five seconds after the last digit the caller entered.
Minimize pauses while your AI Agent performs backend actions
Some blocks, such as the HTTP Request and Set Variable blocks, as well as any Action blocks, perform backend actions that take some time to complete. In some cases, this can cause pauses while Ada is performing that action, during which a caller might wonder whether your AI Agent heard the last thing they said, or become frustrated.
There are some things your AI Agent does to minimize silence. Approximately two seconds after the caller speaks, if your AI Agent needs more time to process, it plays one of a variety of messages like "okay" or "one moment" to fill the silence. If required, it plays additional similar messages every five seconds, to reassure the caller that the call is still active, until it has a response ready. However, there are some best practices you can follow to further minimize processing interruptions:
- Immediately before blocks that perform backend actions, place one or more Speak blocks. That way, the backend actions can start processing while your AI Agent reads the content in your Speak blocks, so the caller doesn't have to wait as long to get a response.
- If you have a Capture block followed by multiple blocks that perform backend actions, put a Speak block directly after the Capture block to give your AI Agent some more time to process.
- If you can't avoid a long pause (for example, if you know the API call in your HTTP Request block normally takes a long time, or if you have multiple blocks with backend actions in a row), create more of a buffer with longer spoken messages like "I understand that you want to find out the balance for your Platinum Reward credit card. I can help with that."
With these strategies, you can fine-tune your AI Agent to make conversations that contain even complicated technical actions feel smooth and effortless for your callers.
Transfer phone calls
With the Transfer Call block, you can transfer the caller to a different phone number or SIP address. Most of the time, this means handing off the call to a human agent.
Note that the call transfer takes place when the handoff gets to the Transfer Phone Call block, so if you have any blocks below it, your AI Agent won't serve them to callers.
- On the Ada dashboard, go to AI Agent profile > Handoff.
- Open the handoff situation you want to put a call transfer in, then click the Voice tab.
- Drag and drop a Transfer Call block into the bottom of the content editor.
- Under Phone Number or SIP address, enter the phone number or SIP address, or insert the variable containing the phone number or full SIP address that you want to transfer the call to.
- If you entered a SIP address, the Include a User-to-User header when transferring to the SIP address checkbox appears. Select it to include additional information with the SIP transfer, so the human agent receiving the call has more context, or so you can route the call to the appropriate department or group.
  - In the Key fields that appear, enter names for the data fields you create.
  - In the Value fields, insert variables to dynamically insert information about the caller.
  - Click Add another row to add more key-value pairs to the transfer, or hover over a pair and click Delete to remove it.
Note: The user-to-user header that contains these key-value pairs shouldn't exceed 400 characters.
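For example (the field names here are hypothetical, not built-in), you could add a row with the Key caller_phone and the phone_number variable as its Value, and another row with the Key caller_city and the city variable, so the system receiving the transfer knows who's calling and from where.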
End phone calls
The End Phone Call block is a simple one - if you put it into an Answer, it disconnects the call. Generally, it's a good idea to let the caller be the one to end the call, but this block can be helpful at the end of your content if you confirm with the caller that they don't need any further assistance.
Note that your AI Agent ends the call when the Answer gets to the End Phone Call block, so if you have any blocks below it in an Answer, the caller won't hear them.
- On the Ada dashboard, go to AI Agent profile > Handoff.
- Open the handoff situation you want to end the call in, then click the Voice tab.
- Drag and drop an End Phone Call block into the bottom of the content editor.
Test your AI Agent's voice content
While building out voice content, it's a good idea to test how your flows will sound to callers. You can test individual pieces of structured content, or call your AI Agent so you can test the entire caller experience, starting at your greeting.
Test how an individual voice block sounds
Anytime a block contains content to read out, the top of the block has a Play icon you can click to hear how its content will sound to callers. You can use this button to double-check that the flow of a block's content sounds good, and if you're using SSML tags to adjust your AI Agent's default sound, you can use it to fine-tune your adjustments. For more information, see Control how your AI Agent pronounces content in voice conversations.
Test the voice content in a piece of structured content
Just like when you're creating text content, you can test a piece of structured content to make sure that your voice content flows well and delivers information in a timely way, regardless of whether it's live yet.
Location metavariables depend on the caller initiating the phone call, and aren't populated when Twilio calls you instead. As a result, if your structured content depends on those metavariables being populated, you may have to hardcode them using a Set Variable block so you can test it properly.
- At the bottom of the structured content you want to test, click Test answer. A test chat window appears.
- At the top of the test chat window, click the Ada Web Chat dropdown and click Voice.
- Enter your phone number and click Call. Your AI Agent calls you and serves you your structured content. From there, you can hang up, or continue testing your AI Agent.
Test your caller experience
You can test your AI Agent by calling your Twilio number and navigating through your AI Agent starting from the greeting, like a caller would. This is a great way of testing whether your AI Agent is successfully recognizing spoken input and responding with relevant information.
Twilio doesn't have any way of knowing whether phone calls are coming from testers or customers. As a result, when you go through your AI Agent's conversations or metrics, you can't easily filter out or exclude test calls. By contrast, when you test individual pieces of structured content, your AI Agent always tags those conversations with Test User.
Additionally, this method only tests content that is currently live.
If you need to find your Twilio number, you can go into your Phone integration settings:
- On the Ada dashboard, go to Channels > Voice, then go to the Configuration tab.
- Your Twilio account's phone number is in the Main phone number field. Call that phone number to test your AI Agent.
Control how your AI Agent pronounces content in voice conversations
AI Agents are generally pretty good at guessing how to pronounce content, but sometimes you'll want to override the default behaviour to make the content sound better. To do this, you can use Speech Synthesis Markup Language, or SSML. You can find more in-depth documentation in Google Cloud's Speech Synthesis Markup Language (SSML) reference, but this topic covers the most common use cases.
Important things to know about SSML
There are some important things you should know about how to use SSML in Ada:
- You can only use SSML with voices provided by Google; voices provided by ElevenLabs don't support it. If you need the degree of control SSML provides, make sure the voice your AI Agent uses is listed as a Google voice. For more information, see Choose a speaking voice for your AI Agent.
- SSML uses `<speak>` tags. If you use SSML in a block, the block needs to start with `<speak>` and end with `</speak>`, so the tag contains all of the block's content - not just the content you want to target with the SSML.
- You can nest tags in each other. For example, if you want your AI Agent to read out an order number slowly, you can use the code `<speak>Your order number is <prosody rate="slow"><say-as interpret-as="characters">12569</say-as></prosody>.</speak>`. Note that there is still only one `<speak>` tag that contains all of the block's content.
Anytime a block contains content to read out, the top of the block has a Play icon you can click to test out your content, so you can fine-tune the adjustments you make to the SSML.
Pronounce characters individually
By default, AI Agents usually try to pronounce content together as full words
or numbers as opposed to reading out the individual characters. To make
your AI Agent pronounce them separately, you can either space out the
characters (e.g., "A S A P"), or use
<speak><say-as interpret-as="characters">CONTENT_HERE</say-as></speak>
.
You can also use this technique with numbers. For example, if a caller places an order, it's much easier for your AI Agent to read out the order number, 12569, as separate digits, as opposed to "twelve thousand, five hundred sixty-nine."
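For instance, a minimal Speak block for that order number might look like this (the surrounding wording is just a sketch):

```xml
<speak>Your order number is <say-as interpret-as="characters">12569</say-as>.</speak>
```

Your AI Agent would read this as "one, two, five, six, nine."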
Add pauses
Sometimes you might want to add pauses to avoid overwhelming callers
with information. In this case, you can use
<speak><break time="Xs"/></speak>
, where X
is the number of seconds
you want the break to last. This can be useful in cases like:
- Separating steps or details in a list
- Giving callers a chance to gather information (e.g., "Let me give you 10 seconds to find your order number")
- Giving callers a chance to read and respond to an SMS you sent them
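For example, a Speak block that gives the caller ten seconds to find their order number might look like this (sample wording):

```xml
<speak>Let me give you 10 seconds to find your order number.<break time="10s"/>Do you have it handy?</speak>
```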
Change the AI Agent's reading speed
You can get your AI Agent to speed up or slow down; for example, it might be
helpful to slow down the reading speed when reading out a long account
number, or speed up through a disclaimer. To do this, use
<speak><prosody rate="slow">CONTENT_HERE</prosody></speak>
.
You can set the prosody rate
attribute a few ways:
- As `slow`, `medium`, or `fast`
- Using numbers, where `1` is the normal speed:
  - Slow down using `.8` or `.9`; much slower than this can be off-putting
  - Speed up using `1.1` or `1.2`
- Changing the speed for only certain words in a sentence might cause unwanted pauses. If you try this, make sure you test it. If it sounds odd, change the speed of the entire sentence rather than just a portion of it.
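Putting that together, here's a sketch of a block that slows down for an account number and speeds up through a disclaimer (the numbers and wording are samples, and each prosody tag wraps a full sentence to avoid unwanted pauses):

```xml
<speak><prosody rate=".9">Your account number is <say-as interpret-as="characters">514230</say-as>.</prosody> <prosody rate="1.2">Standard message and data rates may apply.</prosody></speak>
```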
Add emphasis
When you’ve got something important to say, your AI Agent's voice can add
emphasis. Use the code
<speak><emphasis level="strong">EMPHASIZED-CONTENT</emphasis></speak>
.
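For example, to stress the most important detail in a sentence (sample wording):

```xml
<speak>Please have your <emphasis level="strong">order number</emphasis> ready before we continue.</speak>
```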
Play a sound clip
If you want to add a sound file to an Answer, use this code:
<speak><audio src="URL_PATH_TO_SOUND_FILE">FALLBACK_CONTENT</audio></speak>
.
The above code requires both a direct path to a sound file, and a fallback phrase to read out in case the sound file doesn't load.
Additionally, you can add attributes to play only a portion of the
audio. To do this, add the clipBegin
and/or clipEnd
attributes to
the audio tag, each with the timestamp where you want to begin or end
the sound clip, like this:
<speak><audio src="URL_PATH_TO_SOUND_FILE" clipBegin="Xs" clipEnd="Ys">FALLBACK_CONTENT</audio></speak>
.
Lastly, if the sound file fails to load, your AI Agent doesn't play the sound; instead, it reads out your fallback content.
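Here's a complete sketch that plays the first 15 seconds of a sound file, with a spoken fallback (the URL is a placeholder; substitute a direct link to your own hosted file):

```xml
<speak><audio src="https://example.com/hold-music.mp3" clipBegin="0s" clipEnd="15s">Please hold while we look that up.</audio></speak>
```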