Apache Spark (TM) SQL for Data Analysts

  • Rating: 4.6
Approx. 14 hours to complete

Course Summary

In this course, you will learn how to use Apache Spark SQL to analyze data and gain insights from it. You will also learn how to use Spark SQL to manipulate data and prepare it for analysis.

Key Learning Points

  • Learn how to use Spark SQL to analyze data
  • Gain insights from data using Spark SQL
  • Manipulate data and prepare it for analysis

Job Positions & Salaries that people who have taken this course might have

    • USA: $75,000 - $130,000
    • India: ₹4,00,000 - ₹10,00,000
    • Spain: €25,000 - €50,000
    • USA: $75,000 - $130,000
    • India: ₹4,00,000 - ₹10,00,000
    • Spain: €25,000 - €50,000

    • USA: $100,000 - $150,000
    • India: ₹7,00,000 - ₹20,00,000
    • Spain: €30,000 - €60,000
    • USA: $75,000 - $130,000
    • India: ₹4,00,000 - ₹10,00,000
    • Spain: €25,000 - €50,000

    • USA: $100,000 - $150,000
    • India: ₹7,00,000 - ₹20,00,000
    • Spain: €30,000 - €60,000

    • USA: $90,000 - $140,000
    • India: ₹6,00,000 - ₹15,00,000
    • Spain: €25,000 - €55,000

Learning Outcomes

  • Understand the basics of Apache Spark SQL
  • Learn how to manipulate and prepare data using Spark SQL
  • Gain insights from data analysis using Spark SQL

Prerequisites or good to have knowledge before taking this course

  • Basic knowledge of SQL
  • Familiarity with programming concepts

Course Difficulty Level

Intermediate

Course Format

  • Online
  • Self-paced

Similar Courses

  • Big Data Essentials: HDFS, MapReduce and Spark RDD
  • Data Analysis with Python

Notable People in This Field

  • Creator of Apache Spark
  • Creator of Alluxio

Description

Apache Spark is one of the most widely used technologies in big data analytics. In this course, you will learn how to leverage your existing SQL skills to start working with Spark immediately. You will also learn how to work with Delta Lake, a highly performant, open-source storage layer that brings reliability to data lakes. By the end of this course, you will be able to use Spark SQL and Delta Lake to ingest, transform, and query data to extract valuable insights that can be shared with your team.

Knowledge

  • Ingest, transform, and query data to extract valuable insights.
  • Leverage existing SQL skills to start working with Apache Spark.

Outline

  • Welcome to Apache Spark SQL for Data Analysts
  • Course goals
  • Before you begin
  • End of module knowledge check
  • Spark makes big data easy
  • Introduction to module 2
  • What is big data?
  • Common struggles with big data
  • Big Data Needs
  • Apache Spark Intro
  • Spark SQL
  • Module 2 Concept Review
  • Using Spark SQL on Databricks
  • Introduction to Module 3
  • Signing up for Databricks Community Edition
  • Preparing your workspace
  • Working with notebooks
  • Using course materials
  • Basic queries with Spark SQL reading introduction
  • Data Visualization on Databricks reading introduction
  • Data visualization tools
  • Exploratory Data Analysis lab introduction
  • Course Materials
  • Basic Queries reading activity
  • Data Visualization reading activity
  • Your turn! Exploratory Data Analysis lab
  • Module 3 Concept Review
  • 3.3 Exploratory Data Analysis Quiz
  • Spark Under the Hood
  • Introduction to module 4
  • Understanding optimizations
  • The physical cluster
  • The SparkUI and SQL tab
  • Optimizing query logic
  • Impact of Caching
  • Optimizing with selective data loading
  • Module 4 Concept Review
  • Complex Queries
  • Introduction to module 5
  • What is nested data?
  • Introduction to managing nested data
  • Introduction to Manipulating Data
  • Introduction to Data Munging
  • Managing Nested Data reading activity
  • Manipulating Data reading activity
  • 5.3 Data Munging Lab
  • Module 5 Concept Review
  • Lab 5.3 Quiz
  • Applied Spark SQL
  • Introduction to module 6
  • Complex data - common strategies
  • About higher-order functions
  • Higher-order functions introduction
  • Introducing Aggregating and Summarizing Data
  • Partitioning Tables Introduction
  • Sharing Insights Lab Introduction
  • Higher Order Functions reading activity
  • Aggregating and Summarizing Data reading activity
  • Partitioning Tables
  • Sharing Insights
  • Module 6 concept review
  • Lab 6.4 Quiz
  • Data Storage and Optimization
  • Introduction to module 7
  • A quick refresher
  • Introducing a new data management paradigm
  • Introduction to the lesson
  • What is Delta Lake
  • Data Warehouses
  • Data Lakes
  • Data Lakes vs Data Warehouses
  • The Lakehouse
  • Delta Lake with Spark SQL
  • Introduction to the module
  • Intro to Using Delta reading
  • Managing Records in a Delta table
  • Delta Engine Optimization Introduction
  • Delta Lake Lab Introduction
  • 8.1 Using Delta
  • 8.2 Managing records
  • 8.3 Optimizing Delta
  • Delta Lab
  • 8.4 Delta Lab
  • SQL Coding Challenges
  • SQL coding challenges
  • Final Exam

Summary of User Reviews

Apache Spark SQL for Data Analysts, offered on Coursera, has received high praise from users for its comprehensive and practical approach to teaching data analysis. Many users appreciate the well-structured course material, which is easy to follow and understand.

Key Aspect Users Liked About This Course

The course material is well-structured and easy to follow.

Pros from User Reviews

  • The course content is comprehensive and practical.
  • The instructors are knowledgeable and engaging.
  • The course provides hands-on experience with real-world datasets.
  • The course is well-structured and easy to follow.
  • The course provides a solid foundation for further study in data analysis.

Cons from User Reviews

  • Some users felt that the course moved too quickly through certain topics.
  • Some users found the course material to be too basic.
  • Some users experienced technical difficulties with the course platform.
  • Some users felt that the assessments were too difficult or not relevant to the course material.
  • Some users would have liked more opportunities for interaction with instructors and other students.

Language: English
Availability: Available now
Offered by: Databricks
Platform: Coursera

Instructor

Kate Sullivan

  • 4.6 Rating