Managing Big Data in Clusters and Cloud Storage

  • Rating: 4.7
Approx. 21 hours to complete

Course Summary

Learn how to store and analyze big data with cloud storage and SQL in this course. Gain valuable skills to help you tackle big data analysis in various industries.

Key Learning Points

  • Understand the basics of cloud storage and big data analysis
  • Learn how to use SQL to manipulate and analyze data
  • Gain practical skills through hands-on projects

Learning Outcomes

  • Ability to store and analyze big data with cloud storage and SQL
  • Practical skills through hands-on projects
  • Understanding of data manipulation techniques

Prerequisites and Recommended Knowledge

  • Basic understanding of programming concepts
  • Familiarity with SQL

Course Difficulty Level

Intermediate

Course Format

  • Online
  • Self-paced

Similar Courses

  • Data Warehousing for Business Intelligence
  • Big Data Essentials: HDFS, MapReduce, and Spark RDD
  • Introduction to Data Analysis Using Excel

Notable People in This Field

  • Data Scientist

Description

In this course, you'll learn how to manage big datasets, how to load them into clusters and cloud storage, and how to apply structure to the data so that you can run queries on it using distributed SQL engines like Apache Hive and Apache Impala. You’ll learn how to choose the right data types, storage systems, and file formats based on which tools you’ll use and what performance you need.

Knowledge

  • Use different tools to browse existing databases and tables in big data systems
  • Use different tools to explore files in distributed big data filesystems and cloud storage
  • Create and manage big data databases and tables using Apache Hive and Apache Impala
  • Describe and choose among different data types and file formats for big data systems

Outline

  • Orientation to Data in Clusters and Cloud Storage
      • Welcome to the Course
      • Browsing Tables with Hue
      • Browsing Tables with SQL Utility Statements
      • Browsing HDFS with the Hue File Browser
      • Browsing HDFS from the Command Line
      • Understanding S3 and Other Cloud Storage Platforms
      • Browsing S3 Buckets from the Command Line
      • Review and Preparation
      • Instructions for Downloading and Installing the Exercise Environment
      • Troubleshooting the VM
      • Week 1 Graded Quiz
  • Defining Databases, Tables, and Columns
      • Week 2 Introduction
      • Introduction to the CREATE TABLE Statement
      • Using Different Schemas on the Same Data
      • Specifying TBLPROPERTIES
      • Examining, Modifying, and Removing Tables
      • Hive and Impala Interoperability
      • Impala Metadata Refresh
      • Creating Databases and Tables with Hue
      • Creating Databases and Tables with SQL
      • Permissions to Create Databases and Tables
      • The ROW FORMAT Clause
      • The STORED AS Clause
      • The LOCATION Clause
      • CREATE TABLE Shortcuts
      • Using Hive SerDes
      • Working with Unstructured and Semi-Structured Data
      • Examining Table Structure
      • Dropping Databases and Tables
      • Modifying Existing Tables
      • Week 2 Practice Quiz
      • Week 2 Graded Quiz
  • Data Types and File Types
      • Week 3 Introduction
      • Overview of Data Types
      • Choosing the Right Data Types
      • Overview of File Types
      • Choosing the Right File Types
      • Integer Data Types
      • Decimal Data Types
      • Character String Data Types
      • Other Data Types
      • Examining Data Types
      • Out-of-Range Values
      • Text Files
      • Avro Files
      • Parquet Files
      • ORC Files
      • Other File Types
      • Creating Tables with Avro and Parquet Files
      • Week 3 Practice Quiz
      • Week 3 Graded Quiz
  • Managing Datasets in Clusters and Cloud Storage
      • Week 4 Introduction
      • Refresh Impala's Metadata Cache after Loading Data
      • Loading Files into HDFS with Hue's Table Browser
      • Loading Files into HDFS with Hue's File Browser
      • Loading Files into HDFS from the Command Line
      • Loading Files into S3 from the Command Line
      • Using Hive and Impala to Load Data into Tables
      • Conclusion
      • More about HDFS Shell Commands
      • Chaining and Scripting with HDFS Commands
      • HDFS Permissions
      • Other Ways to Load Files into S3
      • S3 Permissions
      • Missing Values
      • Character Sets
      • Using Sqoop to Import Data
      • More Sqoop Import Options
      • Using Sqoop to Export Data
      • SQL LOAD DATA Statements
      • SQL INSERT Statements
      • SQL INSERT ... SELECT and CTAS Statements
      • Week 4 Practice Quiz
      • Week 4 Graded Quiz
  • Optimizing Hive and Impala (Honors)
      • Week 5 Introduction
      • What to Do When Queries Are Too Complex
      • What to Do When Queries Take Too Long
      • When to Use Table Partitioning
      • When to Use Complex Columns
      • File Systems versus Storage Engines
      • Creating and Querying Views
      • Modifying and Removing Views
      • Materialized and Non-Materialized Views
      • The ORDER BY Clause in Views
      • Choosing Which Query Engine to Use
      • Understanding Map Tasks and Reduce Tasks
      • Hive Query Performance Patterns
      • Understanding Execution Plans
      • Table and Column Statistics
      • Other Strategies for Query Optimization
      • Creating Partitioned Tables
      • Loading Data with Dynamic Partition
      • Loading Data with Static Partitioning
      • Risks of Using Partitioning
      • Complex Data Types
      • Creating Tables with Complex Data
      • Querying Complex Data with Hive
      • Querying Complex Data with Impala
      • Complex Data in Practice
      • Overview of Apache Kudu
      • Week 5 Practice Quiz
      • Week 5 Graded Quiz

Summary of User Reviews

User reviews of this course on Coursera are positive overall. Many reviewers appreciated the practical examples provided throughout the course.

Key Aspect Users Liked About This Course

Practical examples provided throughout the course.

Pros from User Reviews

  • Well-structured course content
  • Easy to follow explanations
  • Great introduction to cloud storage and big data analysis
  • Good for beginners

Cons from User Reviews

  • Lack of in-depth analysis
  • Not suitable for advanced learners
  • Some technical issues reported
  • Course materials not always up-to-date

Language: English
Availability: Available now
Instructors: Ian Cook, Glynn Durham
Offered by: Cloudera
Platform: Coursera

Instructor

Ian Cook

  • 4.7 Rating