Exploring Google Ngrams with Amazon EMR

1 hour 30 minutes, 15 credits

SPL-31 Version 4.0.6

© 2019 Amazon Web Services, Inc. and its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited.

Errors or corrections? Email us at aws-course-feedback@amazon.com.

Other questions? Contact us at https://aws.amazon.com/contact-us/aws-training/

Overview

In this lab, you will use Amazon EMR to analyze n-grams from Google Books. An n-gram is a contiguous sequence of n items from a given sequence of text or speech. For example, consider a sentence such as:

The sun rises in the east and sets in the west.

Some 2-grams from this sentence are "the sun", "in the" and "sets in". A sample 3-gram is "sets in the" and a sample 4-gram is "rises in the east".
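
The definition above can be illustrated with a minimal Python sketch. The helper name `ngrams` and the example sentence are assumptions made for illustration (the sentence is chosen to be consistent with the n-grams listed above); splitting on whitespace is a simplification of real tokenization.

```python
def ngrams(text, n):
    """Return every contiguous run of n words in text, in order of appearance."""
    words = text.lower().split()
    # A window of size n slides over the word list one position at a time.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


# Assumed example sentence, chosen to match the n-grams quoted in the text.
sentence = "The sun rises in the east and sets in the west"

bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)
fourgrams = ngrams(sentence, 4)

print(bigrams)
print("sets in the" in trigrams)
print("rises in the east" in fourgrams)
```

A sentence of 11 words yields 10 2-grams, 9 3-grams and 8 4-grams, because each extra word in the window removes one possible starting position.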

N-grams are used to predict the probability of certain words appearing in a sequence. This can be useful for providing typing suggestions on web pages and mobile phones.
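
As a rough sketch of how 2-gram counts can drive a typing suggestion, the following Python example estimates the probability of each word that follows a given word. The tiny corpus and the function names are hypothetical and exist only for illustration; production systems use far larger corpora and smoothing techniques.

```python
from collections import Counter, defaultdict


def build_bigram_model(corpus):
    """Count, for each word, how often each other word follows it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def next_word_probabilities(model, word):
    """Return relative frequencies of the words seen after `word`."""
    followers = model.get(word.lower(), Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in followers.items()}


# Hypothetical toy corpus, invented for this sketch.
corpus = [
    "the sun rises in the east",
    "the sun sets in the west",
    "the moon rises in the evening",
]

model = build_bigram_model(corpus)
probs = next_word_probabilities(model, "the")
suggestion = max(probs, key=probs.get)
print(suggestion)  # sun
```

Here "sun" follows "the" in 2 of the 6 observed cases, so it is the most likely suggestion for this toy corpus.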

The steps in this lab are similar to those a data scientist performs when analyzing a new data set: loading the data, examining its attributes, and writing SQL to analyze it. You will run SQL against publicly available n-gram data stored in Amazon S3.
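
The load, examine and analyze workflow described above can be sketched locally in Python. This is only a stand-in: it uses the standard-library `sqlite3` module instead of Hive on Amazon EMR, and the table name, column names and row values are all hypothetical, invented for illustration rather than taken from the lab or from the real data set.

```python
import sqlite3

# Hypothetical toy rows (ngram, year, occurrences); the values are invented.
rows = [
    ("the sun", 1990, 120),
    ("the sun", 2000, 150),
    ("in the", 1990, 900),
    ("in the", 2000, 950),
    ("sets in", 2000, 12),
]

# Step 1: load the data into a table (an in-memory database for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ngrams (ngram TEXT, year INTEGER, occurrences INTEGER)")
conn.executemany("INSERT INTO ngrams VALUES (?, ?, ?)", rows)

# Step 2: examine the data attributes (column names).
columns = [info[1] for info in conn.execute("PRAGMA table_info(ngrams)")]
print(columns)

# Step 3: analyze with SQL, here the total occurrences per n-gram.
totals = conn.execute(
    "SELECT ngram, SUM(occurrences) AS total "
    "FROM ngrams GROUP BY ngram ORDER BY total DESC"
).fetchall()
print(totals)

conn.close()
```

In the lab itself, the same three steps are carried out with Hive statements against data in Amazon S3 rather than against a local database.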

Objectives

After completing this lab, you will be able to:

  • Create an Amazon EMR cluster running Hive and Ganglia
  • Use Hive statements to create tables from Google Ngram input data stored in Amazon S3
  • Run Hive queries to drill down and analyze data
  • Use Ganglia to monitor an Amazon EMR cluster

Start Lab

Notice the lab properties below the lab title:

  • setup - The estimated time to set up the lab environment
  • access - The time the lab will run before automatically shutting down
  • completion - The estimated time the lab should take to complete

  1. At the top of your screen, launch your lab by clicking Start Lab.

If you are prompted for a token, use the one distributed to you (or credits you have purchased).

A status bar shows the progress of the lab environment creation process. The AWS Management Console is accessible during lab resource creation, but your AWS resources may not be fully available until the process is complete.

  2. Open your lab by clicking Open Console.

This will automatically log you into the AWS Management Console.

Please do not change the Region unless instructed.

Common login errors

Error: Federated login credentials

If you see this message:

  • Close the browser tab to return to your initial lab window
  • Wait a few seconds
  • Click Open Console again

You should now be able to access the AWS Management Console.

Error: You must first log out

If you see the message, You must first log out before logging into a different AWS account:

  • Click the "click here" link
  • Close your browser tab to return to your initial Qwiklabs window
  • Click Open Console again
