Build a Serverless Text-to-Speech Application with Amazon Polly
SPL-201 - Version 1.0
© 2018 Amazon Web Services, Inc. and its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited.
Errors or corrections? Email us at email@example.com.
Other questions? Contact us at https://aws.amazon.com/contact-us/aws-training/
In general, speech synthesis is not easy. You cannot assume that when an application reads each letter of a sentence, the output will make sense. A few common challenges for text-to-speech applications include:
- Words that are written the same way but pronounced differently: "I live in Las Vegas" compared to "This presentation broadcasts live from Las Vegas."
- Text normalization, that is, disambiguating abbreviations, acronyms, and units: for example, "St." can be expanded as "Street" or "Saint".
- Converting text to phonemes in languages with complex letter-to-sound mapping, such as English: in "tough", "through", and "though", similar parts of different words are pronounced differently depending on the word and context.
- Foreign words (déjà vu), proper names (François Hollande) and slang (ASAP, LOL).
Amazon Polly provides speech synthesis functionality that overcomes these challenges, allowing you to focus on building applications that use text-to-speech instead of addressing interpretation challenges.
Amazon Polly turns text into lifelike speech. It lets you create applications that talk naturally, enabling you to build entirely new categories of speech-enabled products. Amazon Polly is an Amazon AI service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice. It currently includes dozens of lifelike voices in over 20 languages, so you can select the ideal voice and build speech-enabled applications that work in many different countries.
In addition, Amazon Polly delivers the consistently fast response times required to support real-time, interactive dialog. You can cache and save Amazon Polly's audio files for offline replay or redistribution. (In other words, what you convert and save is yours. There are no additional text-to-speech charges for using the speech.) Amazon Polly is also easy to use. You send the text you want to convert into speech to the Amazon Polly API. Amazon Polly immediately returns the audio stream to your application so that your application can play it directly or store it in a standard audio file format such as MP3.
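The request-and-response exchange described above can be sketched in Python. This is a minimal sketch rather than the lab's own code: it assumes the caller passes in a client created with boto3.client("polly"), and the function names and the default voice "Joanna" are illustrative choices.

```python
def synthesize_to_mp3(polly_client, text, voice_id="Joanna"):
    """Send text to Amazon Polly and return the MP3 audio as bytes.

    polly_client is assumed to be a boto3 Polly client; voice_id is an
    illustrative default, not a value prescribed by the lab.
    """
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    # AudioStream is a streaming body; read() returns the raw audio bytes.
    return response["AudioStream"].read()


def save_audio(audio_bytes, path):
    """Write the audio bytes to a local file for offline replay."""
    with open(path, "wb") as audio_file:
        audio_file.write(audio_bytes)
    return path
```

With AWS credentials configured, a call might look like save_audio(synthesize_to_mp3(boto3.client("polly"), "Hello"), "hello.mp3"). Passing the client in as a parameter keeps the functions easy to exercise with a stand-in object.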
In this lab you will create a basic, serverless application that uses Amazon Polly to convert text to speech. The application has a simple user interface that accepts text in many different languages and then converts it into audio files that you can play from a web browser. This lab will use blog posts, but you can use any type of text. For example, you can use the application to read recipes while you are preparing a meal, or news articles or books while you are driving or riding a bike.
You will build a serverless application, which means that you will not need to work with servers — no provisioning, no patching, no scaling. The AWS Cloud automatically takes care of this, allowing you to focus on your application.
The application provides two methods – one for sending information about a new post, which should be converted into an MP3 file, and one for retrieving information about the post (including a link to the MP3 file stored in an Amazon S3 bucket). Both methods are exposed as RESTful web services through Amazon API Gateway.
When the application sends information about new posts:
The information is received by the RESTful web service exposed by Amazon API Gateway. This web service is invoked by a static webpage hosted on Amazon Simple Storage Service (Amazon S3).
Amazon API Gateway triggers an AWS Lambda function, New Post, which is responsible for initializing the process of generating MP3 files.
The Lambda function inserts information about the post into an Amazon DynamoDB table, where information about all posts is stored.
To run the whole process asynchronously, you will use Amazon Simple Notification Service (Amazon SNS) to decouple the process of receiving information about new posts and starting their audio conversion.
Another Lambda function, Convert to Audio, is subscribed to your SNS topic and is triggered whenever a new message appears (which means that a new post should be converted into an audio file).
The Convert to Audio Lambda function uses Amazon Polly to convert the text into an audio file in the specified language (the same as the language of the text).
The new MP3 file is saved in a dedicated S3 bucket.
Information about the post is updated in the DynamoDB table. The URL to the audio file stored in the S3 bucket is saved with the previously stored data.
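The steps above can be sketched as two Python functions, one for each Lambda function. This is a hedged illustration, not the code the lab provides: the attribute names (id, text, voice, status, url), the status values, the object key, and the URL pattern are all assumptions. The table argument is assumed to be a boto3 DynamoDB Table resource, and the other arguments are assumed to be boto3 clients for Amazon SNS, Amazon Polly, and Amazon S3.

```python
import uuid


def new_post(event, table, sns_client, topic_arn):
    """New Post: store the post, then announce it on the SNS topic."""
    post_id = str(uuid.uuid4())
    table.put_item(Item={
        "id": post_id,
        "text": event["text"],
        "voice": event["voice"],
        "status": "PROCESSING",  # hypothetical status value
    })
    # Only the ID is published; the subscriber loads the post from DynamoDB.
    sns_client.publish(TopicArn=topic_arn, Message=post_id)
    return post_id


def convert_to_audio(event, table, polly_client, s3_client, bucket):
    """Convert to Audio: synthesize the post, save the MP3, update the record."""
    # An SNS-triggered Lambda event carries the message under Records[0].Sns.
    post_id = event["Records"][0]["Sns"]["Message"]
    post = table.get_item(Key={"id": post_id})["Item"]

    audio = polly_client.synthesize_speech(
        Text=post["text"], OutputFormat="mp3", VoiceId=post["voice"]
    )["AudioStream"].read()

    key = post_id + ".mp3"
    s3_client.put_object(
        Bucket=bucket, Key=key, Body=audio, ContentType="audio/mpeg"
    )
    # Assumed URL pattern; the real URL depends on the bucket's Region and access settings.
    url = "https://{}.s3.amazonaws.com/{}".format(bucket, key)

    # "status" and "url" are DynamoDB reserved words, so they are aliased.
    table.update_item(
        Key={"id": post_id},
        UpdateExpression="SET #s = :s, #u = :u",
        ExpressionAttributeNames={"#s": "status", "#u": "url"},
        ExpressionAttributeValues={":s": "UPDATED", ":u": url},
    )
    return url
```

Because the two functions communicate only through the SNS message and the DynamoDB record, the first can return to the caller immediately while the conversion runs on its own.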
When the application retrieves information about posts:
The RESTful web service is deployed using Amazon API Gateway. Amazon API Gateway exposes the method for retrieving information about posts. The returned information contains the text of the post and the link to the MP3 file stored in the S3 bucket. The web service is invoked by a static webpage hosted on Amazon S3.
Amazon API Gateway invokes the Get Post Lambda function, which contains the logic for retrieving the post data.
The Get Post Lambda function retrieves information about the post (including the reference to Amazon S3) from the DynamoDB table and returns the information.
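The retrieval path can be sketched in the same way. This is a minimal sketch under stated assumptions: the event field postId and the key attribute id are hypothetical names, and table is assumed to be a boto3 DynamoDB Table resource.

```python
def get_post(event, table):
    """Get Post: return the stored record for one post, or None if absent."""
    response = table.get_item(Key={"id": event["postId"]})
    # get_item omits the "Item" key when no record matches the given key.
    return response.get("Item")
```

The returned record would include whatever was saved earlier, such as the text of the post and the reference to the MP3 file in Amazon S3.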
By the end of this lab, you will be able to:
- Create an Amazon DynamoDB table to store data
- Create an Amazon API Gateway RESTful API
- Create AWS Lambda functions triggered by API Gateway
- Connect AWS Lambda functions with Amazon Simple Notification Service (Amazon SNS)
- Use Amazon Polly to synthesize speech in a variety of languages and voices
Notice the lab properties below the lab title:
- setup - The estimated time to set up the lab environment.
- access - The time the lab will run before automatically shutting down.
- completion - The estimated time the lab should take to complete.
Click START LAB to launch your lab. If you are prompted for a token, use the one distributed to you (or credits you've purchased).
A status bar shows the progress of the lab environment creation process. The AWS Management Console is accessible during lab resource creation, but your AWS resources may not be fully available until the process is complete.
Click OPEN CONSOLE, which will automatically log you in to the AWS Management Console.
Please do not change the Region unless instructed.
Common login errors
Error: Federated login credentials
If you see this message:
- Close the browser tab to return to your initial lab window
- Wait a few seconds
- Click OPEN CONSOLE again
You should now be able to access the AWS Management Console.
Error: You must first log out
If you see this message:
- Click To logout, click here
- Close the browser tab to return to your initial Qwiklabs window
- Click OPEN CONSOLE again