- Introduction
- Step 1: Create an Amazon Web Services (AWS) Account
- Step 2: Create an Identity and Access Management (IAM) User
- Step 3: Create the Project
- Step 4: Add a “Prompt Playback” Component
- Step 5: Build and Deploy to 3CX Phone System
- Final Notes
- See Also
Introduction
We often need to play audio that cannot be pre-recorded: a name, a place, or a task description retrieved from a database, to name a few. In these cases Text to Speech (TTS) comes to the rescue, letting us create WAV files on the fly, so the CFD app can play them back to the caller.
The 3CX Call Flow Designer includes the Text to Speech Audio Prompt, which can be used wherever prompts are configured, such as in the Prompt Playback component, the Menu component, the User Input component, and so on.
The CFD app converts text to speech in real time, just before playing the message to the caller. It invokes a web service to get the audio stream, and saves it to a local WAV file. Finally, when the call ends, the WAV files are automatically removed, always keeping the installation clean.
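The lifecycle described above (fetch audio, save it to a temporary WAV file, play it, delete it) can be sketched in Python. This is only an illustration of the idea, not the code the CFD app actually runs: `fake_tts_pcm` is a hypothetical stand-in for the web service call and just produces a test tone, and the helper names are invented for this example. The sketch assumes the service returns raw 16-bit mono PCM at 8,000 Hz, which then only needs a WAV header.

```python
import contextlib
import io
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE_HZ = 8000      # telephony sample rate assumed for this sketch
SAMPLE_WIDTH_BYTES = 2     # 16 bits per sample
CHANNELS = 1               # mono


def fake_tts_pcm(duration_s: float) -> bytes:
    """Hypothetical stand-in for the TTS web service: returns a 440 Hz tone
    as raw little-endian 16-bit PCM instead of synthesized speech."""
    frame_count = int(SAMPLE_RATE_HZ * duration_s)
    samples = [
        int(10000 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE_HZ))
        for i in range(frame_count)
    ]
    return struct.pack("<%dh" % frame_count, *samples)


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


@contextlib.contextmanager
def temporary_prompt(pcm: bytes):
    """Save the audio to a temporary WAV file and remove it afterwards,
    mirroring the cleanup that happens when the call ends."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as wav_out:
            wav_out.write(pcm_to_wav_bytes(pcm))
        yield path
    finally:
        os.remove(path)


if __name__ == "__main__":
    with temporary_prompt(fake_tts_pcm(0.5)) as prompt_path:
        print("would play", prompt_path)
    print("file removed:", not os.path.exists(prompt_path))
```

Using a context manager keeps the cleanup tied to the end of the block, so the temporary file is removed even if playback raises an error.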
To use TTS, you can use the engine provided by Amazon Web Services or Google Cloud. In this guide we explain how to set up a CFD app using Amazon Web Services. To use TTS with Google Cloud, please refer to this guide.
Amazon Polly offers high-quality voices, wide language coverage, and affordable pricing, including a free tier during the first year of use. You need 3CX Phone System version 16 Update 6 or later to use this feature.
This guide describes how to create the Amazon Web Services account, how to enable Amazon Polly, and how to use it with the Text to Speech Audio Prompt to play a dynamically generated audio stream.
💡 Tip: The project for this example application is available via the CFD Demos GitHub page, and is installed along with the 3CX Call Flow Designer in your Windows user documents folder, i.e. “C:\Users\YourUsername\Documents\3CX Call Flow Designer Demos”.
Step 1: Create an Amazon Web Services (AWS) Account
Before we start working on our CFD project, we need an Amazon Web Services account. To create it, please follow this guide from Amazon.
Step 2: Create an Identity and Access Management (IAM) User
Once we have our AWS account, we need to create an IAM user. The CFD application uses this user’s credentials to access Amazon Web Services. Please follow this guide from Amazon to do this. When asked, set the access type to “Programmatic access”. When configuring permissions, select “Attach existing policies to user directly”, search for “AmazonPollyFullAccess” and check it.
After creating the IAM user, go to the user’s settings, click on Security credentials, and then click on “Create access key”. Take note of the “Access key ID” and the “Secret access key”. This information is required when configuring your CFD project to use TTS.
⚠ Important: Please be aware of the Amazon Polly limits. These limitations normally do not cause any issues with CFD projects.
Step 3: Create the Project
Now that we have our Amazon Web Services account ready to work with Amazon Polly, we can create our Call Flow Designer project. Open the CFD and go to “File” > “New” > “Project”, select the folder where you want to save it, and enter a name for the project, e.g. “TextToSpeechDemo”.
Now go to the “Tools” > “Online Services” menu, and then select “Text to Speech” to show the settings we need to configure for TTS to work:
- “Online Service”: select Amazon Polly
- “Client ID”: this is the “Access key ID” that we generated in Step 2.
- “Client Secret”: this is the “Secret access key” that we generated in Step 2.
- “Region”: select the closest region to your location, to reduce latency. Available regions for Amazon Polly are listed here. Some features, like neural voices, are only available in specific regions. Check the limits for more information.
- “Lexicons”: if you need to use lexicons, specify the name of the lexicon that you have already uploaded to the AWS Console.
📄 Note: It is important to select the region where the lexicon has been deployed. If specifying more than one lexicon, enter each one on a new line.
The settings entered here are used for every Text To Speech Audio Prompt in this project.
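As an illustration of the one-lexicon-per-line format of the “Lexicons” setting, the following Python sketch splits such a field into names and checks each one. The function name is hypothetical and this is not how the CFD parses the field. The naming rule (alphanumeric, up to 20 characters) and the limit of five lexicons per request reflect our understanding of the Amazon Polly API, so treat both as assumptions and verify them against the Amazon Polly documentation.

```python
import re
from typing import List

# Assumed Amazon Polly constraints; verify against Amazon's documentation.
LEXICON_NAME_PATTERN = re.compile(r"[0-9A-Za-z]{1,20}")
MAX_LEXICONS_PER_REQUEST = 5


def parse_lexicons(field: str) -> List[str]:
    """Split a multi-line "Lexicons" value into names, one per line,
    ignoring blank lines and surrounding whitespace."""
    names = [line.strip() for line in field.splitlines() if line.strip()]
    for name in names:
        if not LEXICON_NAME_PATTERN.fullmatch(name):
            raise ValueError("invalid lexicon name: %r" % name)
    if len(names) > MAX_LEXICONS_PER_REQUEST:
        raise ValueError("too many lexicons: %d" % len(names))
    return names


if __name__ == "__main__":
    # "Acronyms" and "ProductNames" are made-up lexicon names.
    print(parse_lexicons("Acronyms\nProductNames\n"))
```

Checking the names before deploying can catch typos early, since a misspelled lexicon name would otherwise only surface when a call reaches the prompt.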
Step 4: Add a “Prompt Playback” Component
Usually we use TTS to dynamically generate audio from data retrieved from data sources, like a database or a web service. In this case, for the sake of simplicity, we create the text to convert to speech by concatenating static text and a callflow variable. We define a callflow variable named “AccountBalance” and set its value to 100, so we can play a message like: “Your account balance is $100”.
To add the “Prompt Playback” component:
- Drag a “Prompt Playback” component from the toolbox, and drop it into the design view of the “Main” callflow. Then select the added component, go to the “Properties” and rename it to “playPrompt”.
- From the “Properties”, open the “Prompt Collection Editor”, clicking the button on the right of the “Prompts” property.
- Click “Add” to add a new prompt to the collection, and change the type to “Text to Speech Audio Prompt”.
- Select the Voice to use. The drop-down list of voices is ordered by language, so you can easily find the options available for the language you need. The voices available for Amazon Polly are listed here. If Amazon releases a new standard voice that is not yet included in this drop-down list, you can enter the value from the “Name/ID” column to use it. Note that manually entering new neural voices is not supported. If you want a specific voice to be pre-filled, you can set it from “Tools” > “Options” > “Component Templates” > “Text To Speech”. For this demo we use “Joanna (English - US, Female)”.
- Select the Type of text. The options are “Text” and “SSML” (Speech Synthesis Markup Language). We use “Text” in this example, which is the typical choice. When you select the “Text” type, the value of the Text property that follows is treated as plain text, and the TTS engine converts it to speech as is. If you select the “SSML” type, the value of the Text property is treated as XML according to the SSML specification. With SSML you can control various aspects of speech such as pronunciation, volume, pitch, and speech rate. For more information, see Using SSML.
- Enter an expression for the Text. Depending on the type selected in the previous step, the expression must return plain text to convert to speech, or XML according to the SSML specification. For this demo we use the expression:
CONCATENATE("Your account balance is $",callflow$.AccountBalance)
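To make the difference between the “Text” and “SSML” types concrete, here is a small Python sketch of the two strings such an expression could return. It is Python rather than CFD expression syntax, and the function names are invented for this example. The SSML variant is only one possible markup, using a pause and a slower speech rate around the amount; dynamic values are XML-escaped so that characters such as `&` do not break the document.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def balance_text(account_balance) -> str:
    """Plain-text prompt, equivalent to the CONCATENATE expression above."""
    return "Your account balance is $" + str(account_balance)


def balance_ssml(account_balance) -> str:
    """SSML prompt: a short pause, then the amount at a slower rate."""
    amount = escape("$" + str(account_balance))
    return (
        "<speak>Your account balance is "
        '<break time="300ms"/>'
        '<prosody rate="slow">' + amount + "</prosody>"
        "</speak>"
    )


if __name__ == "__main__":
    print(balance_text(100))   # Your account balance is $100
    ssml = balance_ssml(100)
    ET.fromstring(ssml)        # raises ParseError if the XML is malformed
    print(ssml)
```

Parsing the generated SSML locally is a cheap way to catch malformed markup before it is sent to the TTS engine.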
Step 5: Build and Deploy to 3CX Phone System
The project is ready to build and upload to our 3CX Phone System server, with these steps:
- Select “Build” > “Build All” and the CFD generates the file “TextToSpeechDemo.zip”.
- Go to the “3CX Management Console” > “Advanced” > “Call Flow Apps” > “Add/Update”, and upload the file created by the CFD in the previous step.
- The Call Flow app is ready to use. Make a call to it to test this app. Please note that the very first time you call this application, the text to speech conversion might be delayed by a few seconds. This is related to the authentication procedure, and only happens the first time you call the app.
Final Notes
Usually a project requires some static prompts, for example to welcome your users or offer an options menu, and some variable prompts, like playing the caller’s account balance. You probably want to use the TTS service for variable prompts only, to avoid paying to convert the same text to speech on every call. At the same time, you may want the same voice for all your prompts. We therefore recommend generating the static prompts once with the Amazon Polly console, adding them to your project as WAV files, and using these files as standard Audio File Prompts, instead of converting them from text to speech for every call.
To do this, from the Amazon Polly console, select your language and region, then the voice to use, enter the text of your prompt, and press “Download MP3”. Please note that 3CX requires files to be in WAV format, mono, 8 kHz (8,000 Hz), 16 bits per sample. After downloading the MP3 files, use this guide to convert the files to the proper format.
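After converting a prompt, it can be useful to confirm that the resulting file really matches the required format. The following Python sketch does that with the standard `wave` module. The function name is hypothetical, the check is not part of the CFD or 3CX tooling, and it only covers uncompressed PCM WAV files, which is what `wave` can read.

```python
import wave
from typing import List

REQUIRED_CHANNELS = 1        # mono
REQUIRED_RATE_HZ = 8000      # 8 kHz
REQUIRED_SAMPLE_BYTES = 2    # 16 bits per sample


def check_prompt_format(path: str) -> List[str]:
    """Return a list of format problems; an empty list means the file
    matches the WAV parameters stated above."""
    problems = []
    with wave.open(path, "rb") as wav_file:
        if wav_file.getnchannels() != REQUIRED_CHANNELS:
            problems.append("expected mono, found %d channels"
                            % wav_file.getnchannels())
        if wav_file.getframerate() != REQUIRED_RATE_HZ:
            problems.append("expected 8000 Hz, found %d Hz"
                            % wav_file.getframerate())
        if wav_file.getsampwidth() != REQUIRED_SAMPLE_BYTES:
            problems.append("expected 16-bit samples, found %d-bit"
                            % (8 * wav_file.getsampwidth()))
    return problems
```

For example, `check_prompt_format("welcome.wav")` (with a hypothetical file name) returns an empty list for a correctly converted prompt, and a list of human-readable problems otherwise.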
See Also
- Learn more about CFD components.
- Automated Telephone Ordering Voice app with CRM integration via the 3CX API.
- Sending emails from a CFD voice app.
- Routing Calls Based on the Time of Day.
- Using the Authentication Component to Validate Customers.
- Using the Credit Card Component.
- Using the Loop component to navigate upwards
- Registering and making callbacks
- Using the survey component
- Using the CRM Lookup component
- See how to integrate your PBX with a CRM via the 3CX API.