Hadoop Platform and Application Framework

Approx. 26 hours to complete

Course Summary

Learn the basics of Hadoop and how to use it for big data processing and analysis. This course covers HDFS, MapReduce, and other key components of the Hadoop ecosystem.

Key Learning Points

  • Gain practical experience with Hadoop using real-world examples
  • Learn how to scale and optimize Hadoop clusters for better performance
  • Understand how Hadoop fits into the larger big data ecosystem

Potential Job Positions & Salaries for People Who Have Taken This Course

    • USA: $110,000
    • India: ₹1,000,000
    • Spain: €35,000

    • USA: $110,000
    • India: ₹1,000,000
    • Spain: €35,000

    • USA: $70,000
    • India: ₹600,000
    • Spain: €25,000

    • USA: $110,000
    • India: ₹1,000,000
    • Spain: €35,000

    • USA: $70,000
    • India: ₹600,000
    • Spain: €25,000

    • USA: $95,000
    • India: ₹900,000
    • Spain: €30,000

Learning Outcomes

  • Understand the fundamentals of Hadoop and how it fits into the big data landscape
  • Gain practical experience with Hadoop and learn how to use it for real-world scenarios
  • Learn how to optimize and scale Hadoop clusters for better performance

Prerequisites or Good-to-Have Knowledge Before Taking This Course

  • Basic programming knowledge in Java or Python
  • Familiarity with the Linux command line

Course Difficulty Level

Intermediate

Course Format

  • Online
  • Self-paced

Similar Courses

  • Apache Spark and Scala
  • Big Data Essentials: HDFS, MapReduce and Spark RDD
  • Introduction to Big Data

Notable People in This Field

  • Creator of Hadoop
  • Author of 'Hadoop: The Definitive Guide'

Description

This course is for novice programmers or business people who would like to understand the core tools used to wrangle and analyze big data. With no prior experience, you will have the opportunity to walk through hands-on examples with the Hadoop and Spark frameworks, two of the most common in the industry. You will be comfortable explaining the specific components and basic processes of the Hadoop architecture, software stack, and execution environment. In the assignments, you will be guided through how data scientists apply important concepts and techniques, such as MapReduce, to solve fundamental problems in big data. You'll feel empowered to have conversations about big data and the data analysis process.

Outline

  • Hadoop Basics
  • Hadoop Stack Basics
  • The Apache Framework: Basic Modules
  • Hadoop Distributed File System (HDFS)
  • The Hadoop "Zoo"
  • Hadoop Ecosystem Major Components
  • Exploring the Cloudera VM: Hands-On Part 1
  • Exploring the Cloudera VM: Hands-On Part 2
  • Apache Hadoop Ecosystem
  • Lesson 1 Slides (PDF)
  • Hardware & Software Requirements
  • Lesson 2 Slides - Cloudera VM Tour
  • Basic Hadoop Stack
  • Introduction to the Hadoop Stack
  • Overview of the Hadoop Stack
  • The Hadoop Distributed File System (HDFS) and HDFS2
  • MapReduce Framework and YARN
  • The Hadoop Execution Environment
  • YARN, Tez, and Spark
  • Hadoop Resource Scheduling
  • Hadoop-Based Applications
  • Introduction to Apache Pig
  • Introduction to Apache HIVE
  • Introduction to Apache HBASE
  • Hadoop Basics - Lesson 1 Slides
  • Lesson 2: Hadoop Execution Environment - Slides
  • Lesson 3: Hadoop-Based Applications Overview - All Slides
  • Command list for Applications Slides
  • Tips to handle service connection errors
  • References for Applications
  • Overview of Hadoop Stack
  • Hadoop Execution Environment
  • Hadoop Applications
  • Introduction to Hadoop Distributed File System (HDFS)
  • Overview of HDFS Architecture
  • The HDFS Performance Envelope
  • Read/Write Processes in HDFS
  • HDFS Tuning Parameters
  • HDFS Performance and Robustness
  • Overview of HDFS Access, APIs, and Applications
  • HDFS Commands
  • Native Java API for HDFS
  • REST API for HDFS
  • Lesson 1: Introduction to HDFS - Slides
  • HDFS references
  • Lesson 2: HDFS Performance and Tuning - Slides
  • HDFS Access, APIs
  • Lesson 3: HDFS Access, APIs, Applications - Slides
  • HDFS Architecture
  • HDFS Performance, Tuning, and Robustness
  • Accessing HDFS
  • Introduction to Map/Reduce
  • Introduction to Map/Reduce
  • The Map/Reduce Framework
  • A MapReduce Example: Wordcount in detail
  • MapReduce: Intro to Examples and Principles
  • MapReduce Example: Trending Wordcount
  • MapReduce Example: Joining Data
  • MapReduce Example: Vector Multiplication
  • Computational Costs of Vector Multiplication
  • MapReduce Summary
  • Lesson 1: Introduction to MapReduce - Slides
  • A note on debugging map/reduce programs.
  • Lesson 2: MapReduce Examples and Principles - Slides
  • Lesson 1 Review
  • Spark
  • Introduction to Apache Spark
  • Architecture of Spark
  • Resilient Distributed Datasets
  • Spark Transformations
  • Wide Transformations
  • Directed Acyclic Graph (DAG) Scheduler
  • Actions in Spark
  • Memory Caching in Spark
  • Broadcast Variables
  • Accumulators
  • Setup PySpark on the Cloudera VM
  • Lesson 1: Intro to Apache Spark - Slides
  • Lesson 2: RDD and Transformations - Slides
  • Lesson 3: Scheduling, Actions, Caching - Slides
  • Spark Lesson 1
  • Spark Lesson 2
  • Spark Lesson 3

Summary of User Reviews

Discover the world of Hadoop with this comprehensive course on Coursera. Learn the basics of Hadoop and its ecosystem, and gain hands-on experience with real-world applications. Students rate this course highly and appreciate its practical approach.

Key Aspect Users Liked About This Course

Many users appreciate the practical approach of this course, which provides hands-on experience with real-world applications.

Pros from User Reviews

  • The course provides a comprehensive overview of Hadoop and its ecosystem
  • The hands-on exercises are practical and applicable to real-world scenarios
  • The instructors are knowledgeable and provide clear explanations
  • The course is well-structured and easy to follow
  • The course materials are high-quality and informative

Cons from User Reviews

  • The course can be challenging for beginners without prior programming experience
  • The course may feel slow-paced for students with more advanced knowledge of Hadoop
  • Some students have reported technical issues with the course platform
  • The course does not cover more advanced topics in depth
  • The course may require additional resources or further study for students seeking more advanced knowledge
English
Available now
Natasha Balac, Ph.D., Paul Rodriguez, Andrea Zonca
University of California San Diego
Coursera
