Distributed Computation of NDVI from Landsat Images Using Cloud Dataflow
In this lab you process Landsat data in a distributed manner using Apache Beam and Cloud Dataflow. This lab is part of a series of labs on processing scientific data.
Before starting this lab, we highly recommend reading the accompanying blog post, which describes what this pipeline does, which methods it uses, and what the results look like.
What you learn
In this lab, you:
- Examine Apache Beam code to carry out Landsat processing
- Submit the Beam pipeline to the Dataflow runner
- View job details
Consider using Apache Beam on Cloud Dataflow to scale out compute-intensive jobs that meet these conditions:
- Your data is not tabular and you cannot use SQL to do the analysis. (If it is tabular, use BigQuery.)
- Large portions of the job are embarrassingly parallel; in other words, you can process different subsets of the data on different machines.
- Your logic involves custom functions, iterations, and so on.
- The distribution of the work varies across your data subsets.
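As a rough illustration of the "embarrassingly parallel" criterion above, the sketch below computes NDVI, (NIR - Red) / (NIR + Red), for several independent scenes using only the Python standard library. The scene data and function names here are hypothetical stand-ins, not the lab's actual Beam code; the point is that each scene is processed independently, which is the same property Dataflow exploits at cluster scale.

```python
from concurrent.futures import ThreadPoolExecutor

def ndvi(scene):
    """Compute per-pixel NDVI from a scene's red and near-infrared bands."""
    return [(nir - red) / (nir + red) if (nir + red) else 0.0
            for red, nir in zip(scene["red"], scene["nir"])]

# Hypothetical tiny scenes; real Landsat scenes hold millions of pixels.
scenes = [
    {"red": [0.1, 0.2], "nir": [0.5, 0.6]},
    {"red": [0.3, 0.4], "nir": [0.7, 0.8]},
]

# Each scene is independent, so the work maps cleanly onto parallel workers.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(ndvi, scenes))
```

In the actual lab, the same per-scene computation is expressed as a Beam transform, and the Dataflow runner handles distributing scenes across worker machines.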

Create a Cloud Storage bucket
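The bucket typically serves as the staging and output location for the pipeline. One way to create it from Cloud Shell is shown below; the bucket name and region are placeholders, so substitute your own.

```shell
# Create a regional Cloud Storage bucket for staging and output files.
# BUCKET_NAME and us-central1 are placeholders -- substitute your own.
gsutil mb -l us-central1 gs://BUCKET_NAME
```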
Launch the Dataflow Job
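A Beam pipeline is submitted to Dataflow by running the pipeline program with the standard Google Cloud pipeline options. The sketch below assumes a hypothetical script name and placeholder project, region, and bucket values; the flags themselves (`--runner`, `--project`, `--region`, `--temp_location`) are standard Beam options.

```shell
# Submit the pipeline to the Dataflow runner.
# dataflow_ndvi.py, PROJECT_ID, and BUCKET_NAME are placeholders.
python dataflow_ndvi.py \
  --runner DataflowRunner \
  --project PROJECT_ID \
  --region us-central1 \
  --temp_location gs://BUCKET_NAME/tmp/
```

Once submitted, the job appears in the Dataflow section of the Cloud Console, where you can view job details such as the pipeline graph, worker autoscaling, and per-step throughput.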