ETL Processing on GCP Using Dataflow and BigQuery
In this lab you will build several data pipelines that ingest data from a publicly available dataset into BigQuery, using these GCP services:
- Google Cloud Storage (GCS) - staging the source files
- Cloud Dataflow - running the processing pipelines
- BigQuery - the destination tables
You will create your own data pipeline, covering both the design considerations and the implementation details, to ensure that your prototype meets the requirements. Be sure to open the Python files and read the comments when instructed to.
Create a Cloud Storage Bucket
Copy Files to Your Bucket
Create the BigQuery Dataset (name: lake)
Build a Data Ingestion Dataflow Pipeline
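The heart of an ingestion pipeline is the per-record step that turns one CSV line into a dictionary whose keys match the BigQuery table schema; in Beam this function would typically be applied with `beam.Map` or inside a `DoFn`. A minimal sketch, with column names that are illustrative assumptions rather than the lab's actual schema:

```python
def parse_line(line, fields=("state", "gender", "year", "name", "number", "created_date")):
    """Split a comma-separated line and pair values with field names.

    The field names are hypothetical; a real pipeline would match the
    target BigQuery schema exactly.
    """
    values = line.strip().split(",")
    return dict(zip(fields, values))

print(parse_line("TX,F,1910,Mary,619,11/28/2016"))
```

Note that everything stays a string at this stage; type casting is a natural job for the transformation pipeline in the next step.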
Build a Data Transformation Dataflow Pipeline
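A transformation pipeline applies per-record cleanup before the BigQuery write, such as casting numeric strings to integers and normalizing date formats. A sketch of such a step, with field names and formats that are assumptions for illustration:

```python
from datetime import datetime

def transform_row(row):
    """Cast numeric fields to int and rewrite the date as YYYY-MM-DD."""
    out = dict(row)  # copy; Beam transforms should not mutate their input
    out["year"] = int(out["year"])
    out["number"] = int(out["number"])
    out["created_date"] = (
        datetime.strptime(out["created_date"], "%m/%d/%Y").strftime("%Y-%m-%d")
    )
    return out

print(transform_row({"year": "1910", "number": "619", "created_date": "11/28/2016"}))
```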
Build a Data Enrichment Dataflow Pipeline
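Enrichment looks each record up against a small reference mapping; in Beam this mapping is typically passed to a `DoFn` as a side input. A pure-Python sketch of the lookup step, using a hypothetical state-name mapping:

```python
# Hypothetical side input: short state codes to full names.
STATE_NAMES = {"TX": "Texas", "KY": "Kentucky"}

def enrich_row(row, state_names=STATE_NAMES):
    """Attach a human-readable state_name looked up from the side input."""
    out = dict(row)
    out["state_name"] = state_names.get(out["state"], "unknown")
    return out

print(enrich_row({"state": "TX", "name": "Mary"}))
```

A side-input join like this works well when the reference data is small enough to hold in memory on every worker.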
Build a Data Lake to Mart Dataflow Pipeline
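A lake-to-mart pipeline denormalizes normalized lake tables into one wide mart table. One way to express the join is to index the smaller table by the join column and merge a matching row into each row of the larger table, as in this sketch (table and column names are illustrative):

```python
def join_to_mart(orders, accounts):
    """Yield one denormalized mart row per order, merged with its account."""
    accounts_by_id = {a["acct_id"]: a for a in accounts}
    for order in orders:
        account = accounts_by_id.get(order["acct_id"], {})
        # Order fields win on any column-name collision.
        yield {**account, **order}
```

Alternatively, the join can be pushed down into the source query when reading from BigQuery, trading pipeline code for SQL.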
Build a Data Lake to Mart CoGroupByKey Dataflow Pipeline
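Beam's `CoGroupByKey` takes two collections of `(key, value)` pairs and emits, per key, the values collected from each side; joined mart rows are then produced from those grouped values. This pure-Python sketch emulates that grouping behavior, not Beam's actual implementation:

```python
from collections import defaultdict

def cogroup_by_key(left, right):
    """Group two (key, value) iterables into {key: ([left...], [right...])}."""
    grouped = defaultdict(lambda: ([], []))
    for key, value in left:
        grouped[key][0].append(value)
    for key, value in right:
        grouped[key][1].append(value)
    return dict(grouped)
```

Unlike the in-memory side-input join, `CoGroupByKey` shuffles both inputs by key, so it scales to joins where neither side fits in a worker's memory.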