Analyzing Big Data with SQL

  • Rating: 4.9
Approx. 18 hours to complete

Course Summary

This course teaches SQL and how to use it for big data analysis, covering Cloudera, Hadoop, Apache Hive, and Apache Impala.

Key Learning Points

  • Learn SQL for Big Data Analysis
  • Understand Cloudera, Hadoop, Hive, and Impala
  • Gain hands-on experience with real-world datasets

Learning Outcomes

  • Use SQL for Big Data Analysis
  • Understand Cloudera, Hadoop, Hive, and Impala
  • Gain hands-on experience with real-world datasets

Prerequisites and Recommended Background Knowledge

  • Basic knowledge of SQL
  • Familiarity with Big Data concepts

Course Difficulty Level

Intermediate

Course Format

  • Online
  • Self-paced

Similar Courses

  • Big Data Essentials: HDFS, MapReduce and Spark RDD
  • Data Science Essentials

Description

In this course, you'll get an in-depth look at the SQL SELECT statement and its main clauses. The course focuses on the big data SQL engines Apache Hive and Apache Impala, but most of the information also applies to SQL in traditional RDBMSs; the instructor explicitly addresses differences for MySQL and PostgreSQL.

Knowledge

  • Understand the basics of SELECT statements
  • Understand how and why to filter results
  • Explore grouping and aggregation to answer analytic questions
  • Work with sorting and limiting results
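As a rough illustration of how the clauses listed above fit together, here is a minimal sketch of a single SELECT statement that filters rows, groups and aggregates them, filters on an aggregate, then sorts and limits the result. It is not course material: it runs on SQLite through Python's built-in sqlite3 module rather than on Hive or Impala, and the `orders` table and its values are hypothetical.

```python
import sqlite3

# Hypothetical toy table; SQLite stands in for Hive/Impala in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("east", 10.0),
        ("east", 25.0),
        ("west", 5.0),
        ("west", 40.0),
        ("south", 3.0),
        ("north", None),  # a missing value, stored as NULL
    ],
)

query = """
SELECT region,
       COUNT(*)    AS n,       -- aggregate: rows per group
       SUM(amount) AS total    -- aggregate: total per group
FROM orders
WHERE amount IS NOT NULL       -- filter individual rows
GROUP BY region                -- one output row per region
HAVING SUM(amount) > 10        -- filter on an aggregate
ORDER BY total DESC            -- sort by the aliased aggregate
LIMIT 2                        -- keep only the first two rows
"""

rows = conn.execute(query).fetchall()
print(rows)  # [('west', 2, 45.0), ('east', 2, 35.0)]
conn.close()
```

The query uses only standard clauses, though Hive and Impala may differ from SQLite in details such as type handling and NULL ordering.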

Outline

  • Week 1: Orientation to SQL on Big Data
      ◦ Welcome to the Course
      ◦ Review and Preparation
      ◦ Using the Hue Query Editors
      ◦ Running SQL Utility Statements
      ◦ Running SQL SELECT Statements
      ◦ Understanding Different SQL Interfaces
      ◦ Overview of Beeline and Impala Shell
      ◦ Using Beeline
      ◦ Using Impala Shell
      ◦ Instructions for Downloading and Installing the Exercise Environment
      ◦ Troubleshooting the VM
      ◦ (Optional) What about Spark SQL?
      ◦ Expectations for Learners
      ◦ (Optional) Using Other SQL Engines
      ◦ Week 1 Core Quiz
      ◦ Week 1 Honors Quiz
  • Week 2: SQL SELECT Essentials
      ◦ Introduction
      ◦ SQL SELECT Building Blocks
      ◦ Introduction to the SELECT List
      ◦ Expressions and Operators
      ◦ Data Types
      ◦ Column Aliases
      ◦ Built-In Functions
      ◦ Data Type Conversion
      ◦ The DISTINCT Keyword
      ◦ Introduction to the FROM Clause
      ◦ Identifiers
      ◦ Formatting SELECT Statements
      ◦ Using Beeline in Non-Interactive Mode
      ◦ Using Impala Shell in Non-Interactive Mode
      ◦ Formatting the Output of Beeline and Impala Shell
      ◦ Saving Hive and Impala Query Results to a File
      ◦ Order of Operations
      ◦ Division and Modulo Operators
      ◦ Common String Functions
      ◦ Case (In)Sensitivity in SQL
      ◦ Week 2 Core Quiz
      ◦ Week 2 Honors Quiz
  • Week 3: Filtering Data
      ◦ Introduction
      ◦ About the Datasets
      ◦ Introduction to the WHERE Clause
      ◦ Using Expressions in the WHERE Clause
      ◦ Comparison Operators
      ◦ Data Types and Precision
      ◦ Logical Operators
      ◦ Other Relational Operators
      ◦ Understanding Missing Values
      ◦ Handling Missing Values
      ◦ Conditional Functions
      ◦ Using Variables with Beeline and Impala Shell
      ◦ Calling Beeline and Impala Shell from Scripts
      ◦ Querying Hive and Impala in Scripts and Applications
      ◦ Data Reference
      ◦ (Optional) Unicode Characters
      ◦ Working with Literal Strings
      ◦ Missing Values with Logical Operators
      ◦ Missing Values in String Columns
      ◦ (Optional Exercise) Change VM Desktop Color
      ◦ Week 3 Core Quiz
      ◦ Week 3 Honors Quiz
  • Week 4: Grouping and Aggregating Data
      ◦ Introduction
      ◦ Introduction to Aggregation
      ◦ Common Aggregate Functions
      ◦ Using Aggregate Functions in the SELECT Statement
      ◦ Introduction to the GROUP BY Clause
      ◦ Choosing an Aggregate Function and Grouping Column
      ◦ Grouping Expressions
      ◦ Grouping and Aggregation, Together and Separately
      ◦ NULL Values in Grouping and Aggregation
      ◦ The COUNT Function
      ◦ Tips for Applying Grouping and Aggregation
      ◦ Filtering on Aggregates
      ◦ The HAVING Clause
      ◦ Understanding Hive and Impala Version Differences
      ◦ Understanding Hue Version Differences
      ◦ COUNT(*) and SUM(1)
      ◦ Interpreting Aggregates: Populations and Samples
      ◦ The least and greatest Functions
      ◦ Why Aggregate Expressions Ignore NULL Values
      ◦ (Optional) Shortcuts for Grouping
      ◦ How Grouping and Aggregation Can Mislead
      ◦ Week 4 Core Quiz
      ◦ Week 4 Honors Quiz
  • Week 5: Sorting and Limiting Data
      ◦ Introduction
      ◦ Introduction to the ORDER BY Clause
      ◦ Controlling Sort Order
      ◦ Ordering Expressions
      ◦ Missing Values in Ordered Results
      ◦ Using ORDER BY with Hive and Impala
      ◦ Introduction to the LIMIT Clause
      ◦ When to Use the LIMIT Clause
      ◦ Using LIMIT with ORDER BY
      ◦ Using LIMIT for Pagination
      ◦ Review
      ◦ How to Effectively Use the Hive and Impala Documentation
      ◦ Tips for Using the Hive Documentation
      ◦ Tips for Using the Impala Documentation
      ◦ Ordering by String Columns
      ◦ Week 5 Core Quiz
      ◦ Week 5 Honors Quiz
  • Week 6: Combining Data
      ◦ Introduction
      ◦ Combining Query Results with the UNION Operator
      ◦ Using ORDER BY and LIMIT with UNION
      ◦ Introduction to Joins
      ◦ Join Syntax
      ◦ Inner Joins
      ◦ Outer Joins
      ◦ Conclusion
      ◦ Handling NULL Values in Join Key Columns
      ◦ Non-Equijoins
      ◦ Cross Joins
      ◦ Left Semi-Joins
      ◦ Missing or Truncated Values from Type Conversion
      ◦ Using UNION to Combine Three or More Results
      ◦ Alternative Join Syntax
      ◦ Joining Three or More Tables
      ◦ Specifying Two or More Join Conditions
      ◦ Week 6 Core Quiz
      ◦ Week 6 Honors Quiz

Summary of User Reviews

Learn big data analysis and SQL queries with this Cloudera course on Coursera. Students rate the course highly for its comprehensive content and practical applications, and many found the instructor engaging and knowledgeable.

Key Aspect Users Liked About This Course

Instructor engagement and knowledge

Pros from User Reviews

  • Comprehensive content
  • Practical applications
  • Interactive exercises
  • Clear and concise explanations
  • Real-world examples

Cons from User Reviews

  • Some technical difficulties with the platform
  • Difficulty accessing course materials
  • Limited interaction with other students
  • Lack of personalized feedback
  • Not suitable for beginners

Language: English
Availability: Available now
Instructor: Ian Cook
Offered by: Cloudera
Platform: Coursera

Instructor

Ian Cook

  • Rating: 4.9