Optimize slow databases with Amazon Aurora

Optimize slow databases with Amazon Aurora

2 hours 5 Credits

SPL-252 - Version 1.0.0

The lab dataset is being provided to you by permission of IMDb (Internet Movie Database) and is subject to the terms of the AWS Digital Training Agreement (available at You are expressly prohibited from copying, modifying, selling, exporting, or using this dataset in any way other than for the purpose of completing this lab.

© 2019 Amazon Web Services, Inc. and its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited.

Errors or corrections? Email us at

Other questions? Contact us at

Lab overview

You started a small crowdsourcing platform and built a portfolio of mobile and web applications that enables consumers to produce content based on current events. Initially, you used a MySQL database running on an Amazon EC2 instance to underpin your applications. As business grew, you started looking for a more scalable solution to manage your database requirements and handle a few challenges you experienced while hosting the database. After analyzing the market offerings, you decided to use Amazon Aurora for your growing workloads.

During the development of the new application, you noticed poor performance when retrieving data from the Aurora database. You suspect the issue may be related to poor query design. You will use different tools in Amazon Aurora to find the problematic queries and optimize the queries.

Amazon Aurora is a MySQL- and PostgreSQL-compatible relational database engine built for the cloud. Aurora is fully managed by Amazon Relational Database Service (Amazon RDS), which automates time-consuming administration tasks like hardware provisioning, database setup, patching, and backups. Aurora is built on a modern, purpose-built distributed storage system. All data is distributed in three different AWS Availability Zones, across hundreds of storage nodes, with two copies per zone. The Aurora MySQL- and PostgreSQL-compatible database engines are customized to take advantage of the fast distributed storage.

Aurora data model

In this lab, you will use the IMDb dataset to test the Amazon RDS Aurora cluster. You will use slow query logs and Amazon RDS Performance Insights to evaluate your queries. Then you will fine-tune the query and evalute performance gains. You will also learn to configure auto scaling for read replicas in the Amazon Aurora cluster to mitigate peak load performance impacts. During this activity, you will review different parameters that are available to optimize the database performance for your workload.


After completing this lab, you will be able to:

  • Enable slow query logs in Amazon RDS to investigate underperforming MySQL queries
  • Investigate performance with Amazon RDS Performance Insights
  • Use best practice to optimize MySQL queries
  • Use Aurora read replica auto-scaling feature to address the sudden increase in load


This lab requires:

  • Access to a notebook computer with Wi-Fi and Microsoft Windows, macOS X, or Linux (Ubuntu, SuSE, or Red Hat)

    Note The lab environment is not accessible using an iPad or tablet device, but you can use these devices to access the student guide

  • For Microsoft Windows users: Administrator access to the computer

  • An internet browser such as Chrome, Firefox, or Internet Explorer 9 (previous versions of Internet Explorer are not supported)

  • An SSH client such as PuTTY

  • Familiarity with MySQL database syntax and operation


This lab will require 90 minutes to complete.

Start Lab

  1. At the top of your screen, launch your lab by clicking Start Lab

This will start the process of provisioning your lab resources. An estimated amount of time to provision your lab resources will be displayed. You must wait for your resources to be provisioned before continuing.

If you are prompted for a token, use the one distributed to you (or credits you have purchased).

  1. Open your lab by clicking Open Console

This will automatically log you into the AWS Management Console.

Please do not change the Region unless instructed.

Common login errors

Error : Federated login credentials

If you see this message:

  • Close the browser tab to return to your initial lab window
  • Wait a few seconds
  • Click Open Console again

You should now be able to access the AWS Management Console.

Error: You must first log out

If you see the message, You must first log out before logging into a different AWS account:

  • Click click here
  • Close your browser tab to return to your initial Qwiklabs window
  • Click Open Console again

AWS services not used in this lab

AWS services that are not used in this lab are disabled in the lab environment. In addition, the capabilities of the services used in this lab are limited to what the lab requires. Expect errors when accessing other services or performing actions beyond those provided in this lab guide.

Lab environment

The lab environment already includes necessary components, such as Amazon EC2 instances, AWS Identity and Access Management (IAM) roles, and an Amazon Aurora database cluster. The lab contains instructions to review the code, configure the necessary variables, and run the scripts, so that you can set up components when necessary.

The two database instances, reader and writer, are placed in different Availability Zones inside private subnets. Two Amazon EC2 instances are provisioned in a public subnet with access to the internet. One acts as CommandHost and other as a traffic generator. To access the database, you must first connect to the CommandHost, then connect to the database instances via port 3306. The environment also includes an Amazon CloudWatch dashboard with preconfigured widgets.

The following image is a network diagram of the environment:

Network diagram

IMDb Data Overview

The following image displays the schema for the IMDb data you will work with throughout this lab.

IMDb Schema

Table explanation:

  • title_display: Records data for each title, assigning each a titleId
  • name_facts: Records facts around a person. Assigns a unique nameId to each person
  • name_display: Records how a person’s information is displayed. Shares the nameId with name_facts
  • name_filmography: Relates a person (nameId) to a title (titleId)
  • title_genres: Relates a title (titleId) to genres
  • title_crew: Relates a title (titleId) to a person (nameId) and the job held on the crew
  • title_cast: Relates the title (titleId) to a person (nameId) and the casting they held

Join Qwiklabs to read the rest of this lab...and more!

  • Get temporary access to the Amazon Web Services Console.
  • Over 200 labs from beginner to advanced levels.
  • Bite-sized so you can learn at your own pace.
Join to Start This Lab