Big Data Essentials: HDFS, MapReduce and Spark RDD

Approx. 41 hours to complete

Course Summary

Learn the essentials of big data and its impact on businesses and society. Get hands-on experience with Hadoop, MapReduce, Spark, and NoSQL databases.

Key Learning Points

  • Understand the basics of big data and its impact on various industries
  • Learn popular big data tools such as Hadoop, MapReduce, Spark, and NoSQL databases
  • Apply big data tools to real-world scenarios and gain hands-on experience

Learning Outcomes

  • Understand the basics of big data and its impact on various industries
  • Gain hands-on experience with popular big data tools
  • Apply big data tools to real-world scenarios and solve business problems

Prerequisites or recommended knowledge before taking this course

  • Basic programming knowledge
  • Familiarity with databases

Course Difficulty Level

Intermediate

Course Format

  • Self-paced
  • Online
  • Hands-on

Similar Courses

  • Big Data Modeling and Management Systems
  • Big Data Integration and Processing

Notable People in This Field

  • Doug Cutting
  • Michael Stonebraker

Description

Have you heard about technologies such as HDFS, MapReduce, and Spark? Have you always wanted to learn these tools but lacked concise starting material? Then don't miss this course!

Outline

  • Welcome
  • Why BigData?
  • Issues BigData can solve
  • BigData Applications
  • What is BigData Essentials?
  • Course Structure
  • Meet Emeli
  • Meet Alexey
  • Meet Ivan
  • Slack Channel is the quickest way to get answers to your questions
  • What are BigData and distributed file systems (e.g. HDFS)?
  • File system exploration
  • File system managing
  • File content exploration 1
  • File content exploration 2
  • Processes
  • Scaling Distributed File System
  • Block and Replica States, Recovery Process 1
  • Block and Replica States, Recovery Process 2
  • HDFS Client
  • Web UI, REST API
  • Namenode Architecture
  • Introduction
  • Text formats
  • Binary formats 1
  • Binary formats 2
  • Compression
  • How to submit your first assignment
  • How to Install Docker on Windows 7, 8, 10
  • Basic Bash Commands
  • HDFS Lesson Introduction
  • Gentle Introduction into "curl"
  • File formats extra (optional)
  • Grading System: Instructions and Common Problems
  • Docker Installation Guide
  • HDFS CLI Playground
  • Programming Assignment: Instructions and Common Problems
  • FAQ How to show your code to teaching staff
  • Slack channel "Bigdata-coursera": the quickest way to solve technical problems
  • Distributed File Systems
  • Big Data and Distributed File Systems
  • Solving Problems with MapReduce
  • Unreliable Components 1
  • Unreliable Components 2
  • MapReduce
  • Distributed Shell
  • Fault Tolerance
  • Fault Tolerance. Live Demo
  • Streaming
  • Streaming in Python
  • WordCount in Python
  • Distributed Cache
  • Environment, Counters
  • Testing
  • Combiner
  • Partitioner
  • Comparator
  • Speculative Execution / Backup Tasks
  • Compression
  • Hadoop Streaming Assignments: Intro and Code Samples
  • Hadoop MapReduce Intro
  • MapReduce Streaming
  • Hadoop Streaming Final
  • Solving Problems with MapReduce (practice week)
  • How to submit your first Hadoop assignment
  • Hadoop Streaming Assignments: Intro and Code Samples
  • Hints to Debug Hadoop Streaming Applications
  • Grading System and Grading System Sandbox User Guide
  • Programming Assignment: Instructions and Common Problems
  • Hadoop Streaming Assignments: Instructions
  • Hint to the "Stop words" programming assignment
  • Introduction to Apache Spark
  • Welcome
  • RDDs
  • Transformations 1
  • Transformations 2
  • Actions
  • Resiliency
  • Execution & Scheduling
  • Caching & Persistence
  • Broadcast variables
  • Accumulator variables
  • Getting started with Spark & Python
  • Working with text files
  • Joins
  • Broadcast & Accumulator variables
  • Spark UI
  • Cluster mode
  • Spark Assignments Intro
  • Instructions for Spark programming assignment
  • Lesson 1 Quiz
  • Lesson 2 Quiz
  • Introduction to Apache Spark (practice week)
  • Spark assignments Intro
  • Building an intuition behind the PMI definition
  • Real-World Applications
  • Sampling
  • Estimating proportions
  • Means
  • Medians
  • Map and Reduce Side Joins
  • Tabular Data, KeyFieldSelection
  • Data Skew, Salting
  • Twitter graph case study
  • Shortest path
  • Data and code
  • Starter for "Reconstructing the path" assignment
  • Sample estimates
  • Advanced MapReduce Techniques
  • Real-World Applications

Summary of User Reviews

Discover the essentials of big data in this comprehensive course on Coursera. Students have given high praise for the course, citing its informative and engaging content. One key aspect that many users appreciated was the practical examples and exercises that helped reinforce their understanding of the material.

Pros from User Reviews

  • Informative and engaging content
  • Practical examples and exercises
  • Great course for beginners
  • Excellent instructors
  • Good pace and structure

Cons from User Reviews

  • Some lectures can be a bit dry
  • Not enough hands-on coding
  • Some content can be repetitive
  • Could benefit from more advanced topics
  • Limited interaction with other students
English
Available now
Ivan Puzyrevskiy, Emeli Dral, Evgeniy Riabenko, Alexey A. Dral, Pavel Mezentsev
Yandex
Coursera

Instructor

Ivan Puzyrevskiy

  • Rating: 4