Building Batch Data Pipelines on GCP

  • 4.5 rating
Approx. 13 hours to complete

Course Summary

Learn how to build scalable and efficient batch data pipelines using Google Cloud Platform with this comprehensive course. With hands-on labs and real-world examples, you'll gain the skills needed to design and deploy data processing systems on GCP.

Key Learning Points

  • Understand the basics of batch data processing and how to use GCP tools to build pipelines.
  • Learn how to design and implement data processing systems for different use cases.
  • Explore advanced concepts such as fault tolerance, scalability, and monitoring.

Learning Outcomes

  • Design and deploy batch data processing systems on GCP.
  • Implement fault-tolerant and scalable data pipelines.
  • Monitor and troubleshoot batch data processing systems.

Prerequisites or helpful knowledge before taking this course

  • Familiarity with programming concepts and SQL.
  • Basic knowledge of cloud computing.

Course Difficulty Level

Intermediate

Course Format

  • Online self-paced course
  • Hands-on labs and real-world examples

Similar Courses

  • Data Engineering on Google Cloud Platform
  • Building Batch Data Pipelines on AWS
  • Apache Beam on Google Cloud Dataflow

Description

Data pipelines typically follow one of three paradigms: Extract-Load (EL), Extract-Load-Transform (ELT), or Extract-Transform-Load (ETL). This course explains which paradigm to use for batch data, and when. It also covers several Google Cloud Platform technologies for data transformation, including BigQuery, Spark on Cloud Dataproc, pipeline graphs in Cloud Data Fusion, and serverless data processing with Cloud Dataflow. Learners get hands-on experience building data pipeline components on Google Cloud Platform using Qwiklabs.
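To make the difference between the three paradigms concrete, here is a minimal sketch. It assumes Python's built-in `sqlite3` as a local stand-in for a data warehouse such as BigQuery; the source records, table names (`el_sales`, `etl_sales`, `raw_sales`, `elt_sales`), and the functions `extract`, `el`, `etl`, and `elt` are all hypothetical illustrations, not course material. EL loads the data unchanged, ETL cleans it in the pipeline before loading, and ELT loads raw data first and then transforms it with SQL inside the warehouse.

```python
import sqlite3

# Hypothetical raw source records (day, amount as text). One row has a
# data-quality problem: its amount is not a number.
RAW_ROWS = [
    ("2024-01-01", "  42.50 "),
    ("2024-01-02", "17"),
    ("2024-01-03", "n/a"),
]


def extract():
    """Extract: read records from the (hypothetical) source."""
    return list(RAW_ROWS)


def el(conn):
    """EL: load the raw records exactly as extracted, with no transformation."""
    conn.execute("CREATE TABLE el_sales (day TEXT, amount TEXT)")
    conn.executemany("INSERT INTO el_sales VALUES (?, ?)", extract())


def etl(conn):
    """ETL: clean and validate in the pipeline, then load only the good rows."""
    cleaned = []
    for day, amount in extract():
        try:
            cleaned.append((day, float(amount.strip())))
        except ValueError:
            continue  # drop rows that fail validation before they reach the warehouse
    conn.execute("CREATE TABLE etl_sales (day TEXT, amount REAL)")
    conn.executemany("INSERT INTO etl_sales VALUES (?, ?)", cleaned)


def elt(conn):
    """ELT: load raw records first, then transform them with SQL in the warehouse."""
    conn.execute("CREATE TABLE raw_sales (day TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", extract())
    # Simplified numeric check (value starts with a digit); a real pipeline
    # would need stricter validation.
    conn.execute(
        """
        CREATE TABLE elt_sales AS
        SELECT day, CAST(TRIM(amount) AS REAL) AS amount
        FROM raw_sales
        WHERE TRIM(amount) GLOB '[0-9]*'
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    el(conn)
    etl(conn)
    elt(conn)
    for table in ("el_sales", "etl_sales", "elt_sales"):
        print(table, conn.execute(f"SELECT * FROM {table}").fetchall())
```

In this sketch the EL table keeps the unparseable `"n/a"` row, while the ETL and ELT tables both end up with the same two cleaned rows. The difference is where the cleaning runs: in pipeline code before loading (ETL), or as SQL after loading (ELT), which keeps the raw data available in the warehouse for later reprocessing.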

Outline

  • Introduction
      • Course Introduction
      • Getting Started with Google Cloud and Qwiklabs
  • Introduction to Batch Data Pipelines
      • EL, ELT, ETL
      • Quality considerations
      • How to carry out operations in BigQuery
      • Shortcomings
      • ETL to solve data quality issues
      • EL, ELT, ETL
  • Executing Spark on Cloud Dataproc
      • The Hadoop ecosystem
      • Running Hadoop on Cloud Dataproc
      • GCS instead of HDFS
      • Optimizing Dataproc
      • Optimizing Dataproc Storage
      • Optimizing Dataproc Templates and Autoscaling
      • Optimizing Dataproc Monitoring
      • Lab Intro: Running Apache Spark jobs on Cloud Dataproc
      • Summary
      • Executing Spark on Cloud Dataproc
  • Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
      • Introduction
      • Components of Data Fusion
      • Building a Pipeline
      • Exploring Data using Wrangler
      • Lab: Building and executing a pipeline graph in Cloud Data Fusion
      • Orchestrating work between GCP services with Cloud Composer
      • Apache Airflow Environment
      • DAGs and Operators
      • Workflow scheduling
      • Monitoring and Logging
      • Lab: An Introduction to Cloud Composer
      • Cloud Data Fusion and Cloud Composer
  • Serverless Data Processing with Cloud Dataflow
      • Cloud Dataflow
      • Why customers value Dataflow
      • Building Cloud Dataflow Pipelines in code
      • Key considerations with designing pipelines
      • Transforming data with PTransforms
      • Lab: Building a Simple Dataflow Pipeline
      • Aggregating with GroupByKey and Combine
      • Lab: MapReduce in Cloud Dataflow
      • Side Inputs and Windows of data
      • Lab: Practicing Pipeline Side Inputs
      • Creating and re-using Pipeline Templates
      • Cloud Dataflow SQL pipelines
      • Data Processing with Cloud Dataflow
  • Summary
      • Course Summary

Summary of User Reviews

Discover how to build and operate effective batch data pipelines on Google Cloud Platform with the Building Batch Data Pipelines on GCP course on Coursera. Users found the course comprehensive and well structured, with clear explanations and practical examples.

Key Aspect Users Liked About This Course

A comprehensive, well-structured course with clear explanations and practical examples.

Pros from User Reviews

  • Hands-on practice with GCP tools
  • Great instructor with a deep understanding of the topic
  • Easy to follow and understand
  • Real-world examples provided
  • Exercises and quizzes reinforce understanding

Cons from User Reviews

  • Some lectures are too basic
  • Not enough emphasis on best practices and optimization
  • Course may not be suitable for advanced users
  • Lack of depth on some topics
  • Could benefit from more hands-on projects

Language: English
Availability: Available now
Offered by: Google Cloud
Platform: Coursera

Instructor

Google Cloud Training

  • 4.5 Rating