Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames

  • Rating: 4
Approx. 37 hours to complete

Course Summary

Learn how to analyze big data and make data-driven decisions with this course. Gain hands-on experience with Hadoop, Spark, and other big data tools.

Key Learning Points

  • Learn to work with Hadoop and Spark
  • Gain hands-on experience with real-world big data projects
  • Learn to make data-driven decisions

Learning Outcomes

  • Learn to work with Hadoop and Spark to analyze big data
  • Gain hands-on experience with real-world big data projects
  • Develop skills to make data-driven decisions

Prerequisites or good to have knowledge before taking this course

  • Basic knowledge of programming concepts
  • Familiarity with SQL

Course Difficulty Level

Intermediate

Course Format

  • Online
  • Self-paced
  • Video lectures
  • Hands-on projects

Similar Courses

  • Big Data Essentials: HDFS, MapReduce and Spark RDD
  • Big Data and Hadoop Essentials

Notable People in This Field

  • Doug Cutting, creator of Hadoop
  • Matei Zaharia, creator of Apache Spark

Description

No doubt working with huge data volumes is hard, but to move a mountain you have to deal with a lot of small stones. But why strain yourself? MapReduce and Spark tackle the problem only partially, which leaves room for high-level tools. Stop struggling to make your big data workflow productive and efficient, and make use of the tools we are offering you.

Outline

  • Welcome to the Second Course: Big Data Analysis
  • Computations Optimization
  • What is BigData Analysis?
  • Tools For BigData Analysis
  • Graph Data Analysis
  • Meet Alexey Dral
  • Meet Pavel Mezentsev
  • Meet Natalia Pritykovskaya
  • Meet Pavel Klemenkov
  • Slack Channel is the quickest way to get answers to your questions
  • Big Data SQL: Hive
  • Analytics: Business Use Cases
  • HTTP Web Service: Access Log Format
  • Business Use Cases: Solution with Hive
  • (optional) SQL: likbez
  • Hive Data Definition Language (DDL)
  • Hive Data Manipulation Language (DML)
  • Hive Analytics: RegexSerDe, Views
  • (optional) Regular Expressions, Likbez
  • Hive Analytics: UDF, UDAF, UDTF
  • Hive Streaming
  • Hive PTF (Window Functions)
  • Hive Optimization: Partitioning, Bucketing and Sampling
  • Hive Map-Side Joins: Plain, Bucket, Sort-Merge
  • Hive Optimization: Data Skew
  • Hive Optimization: Row-Columnar File Formats, Compression
  • Hive: SQL over Hadoop MapReduce
  • Hive Analytics with UDF and Streaming
  • Hive final
  • Big Data SQL: Hive (practice week)
  • How to submit your first assignment
  • How to Install Docker on Windows 7, 8, 10
  • How to submit your first Hive assignment
  • Grading System: Instructions and Common Problems
  • Docker Installation Guide
  • Assignments. General requirements
  • Hive assignment. Intro and instructions
  • Spark SQL and Spark Dataframe
  • Advantages of Spark SQL
  • What is Pandas DataFrame and how to create it
  • How to process a DataFrame as SQL
  • Working with Hive
  • Reading and Writing Files
  • RDD vs. DF vs. SQL
  • Projection and Filtering
  • Functions
  • Aggregates
  • Join
  • User Defined Functions
  • Time Processing
  • Window Functions
  • Two-Dimensional Distributions
  • Introducing DataFrame and SQL
  • Spark SQL and Spark Dataframe
  • Graph Analysis from Big Data Perspective
  • Graph examples
  • Graph representation
  • Counting common friends. Part I
  • Counting common friends. Part II
  • Counting common friends. Part III
  • GraphFrames: Introduction
  • Motif Finding: DSL
  • Motif Finding: Counting Mutual Friends
  • Motif Finding: Under The Hood. Part 1
  • Motif Finding: Under The Hood. Part 2
  • Triangles Count: Introduction
  • Triangles Count: Edge Lists
  • Triangles Count: GraphFrame
  • Graph Representations
  • Motif Finding
  • Triangles Count
  • Graph Analysis from Big Data Perspective
  • PageRank and Recent Advances
  • Introduction
  • Algorithm
  • GraphFrames
  • Random Walk
  • Page Rank Algorithm
  • RDD Implementation
  • GraphFrames API
  • Taste Graph. Part I
  • Taste Graph. Part II
  • Taste Graph. Part III
  • Graph based Music Recommender
  • Connected Components
  • PageRank
  • Label Propagation Algorithm (LPA)
  • PageRank and Recent Advances
  • Spark Internals and Optimization
  • Welcome
  • Spark Execution Model
  • Shuffle. Where to send data?
  • Shuffle. How to send data?
  • Optimizing Functions
  • PageRank Optimization
  • Spark SQL. Motivation
  • Catalyst
  • Catalyst Optimization Example
  • Joins
  • Optimizing Joins
  • UDF Optimization
  • Persistence and Checkpointing
  • Memory Management
  • Resource Allocation
  • Dynamic Allocation
  • Speculative Execution
  • Deployment of the environment
  • Spark Execution Model & RDD Internals
  • Spark SQL and Catalyst
  • Memory management and resource allocation
  • Final Quiz

Summary of User Reviews

Learn Big Data Analysis online with Coursera. This course has received positive reviews from users who found it to be very informative and helpful. Many users appreciated the practical examples and real-world applications of the course material.

Key Aspect Users Liked About This Course

The practical examples and real-world applications of the course material.

Pros from User Reviews

  • Informative and helpful
  • Practical examples and real-world applications
  • Great for beginners
  • Clear and concise instruction
  • Engaging and interactive coursework

Cons from User Reviews

  • Some concepts may be too basic for advanced learners
  • Limited hands-on experience
  • Lack of depth in certain topics
  • Course material can be dry at times
  • No opportunity for one-on-one interaction with instructors

Language: English
Availability: Available now
Instructors: Alexey A. Dral, Pavel Klemenkov, Natalia Pritykovskaya, Pavel Mezentsev
Offered by: Yandex
Platform: Coursera

Instructor

Alexey A. Dral

  • Rating: 4