Scalable Machine Learning on Big Data using Apache Spark

  • 3.8 rating
Approx. 7 hours to complete

Course Summary

This course teaches learners how to use Apache Spark and machine learning techniques to analyze big data. Students will learn how to implement various algorithms for classification, clustering, and regression using Spark's MLlib library.

Key Learning Points

  • Gain practical experience in analyzing big data using Apache Spark
  • Learn how to implement machine learning algorithms for classification, clustering and regression
  • Understand how to use Spark's MLlib library to build scalable machine learning models

Job Positions & Salaries that people who have taken this course might have

    • USA: $113,000
    • India: ₹1,103,000
    • Spain: €44,000

    • USA: $129,000
    • India: ₹1,200,000
    • Spain: €55,000

    • USA: $141,000
    • India: ₹1,500,000
    • Spain: €65,000

Learning Outcomes

  • Develop practical skills in big data analysis using Apache Spark
  • Learn how to implement various machine learning algorithms for classification, clustering and regression
  • Understand how to build scalable machine learning models using Spark's MLlib library

Prerequisites or good-to-have knowledge before taking this course

  • Basic understanding of programming concepts
  • Familiarity with data analysis and statistics

Course Difficulty Level

Intermediate

Course Format

  • Online
  • Self-paced

Similar Courses

  • Big Data Essentials: HDFS, MapReduce and Spark RDD
  • Advanced Machine Learning and Signal Processing

Notable People in This Field

  • Creator of Apache Spark
  • Co-founder of Coursera

Description

This course will empower you with the skills to scale data science and machine learning (ML) tasks on big data sets using Apache Spark. Most real-world machine learning work involves very large data sets that exceed the CPU, memory, and storage limits of a single computer.

Outline

  • Week 1: Introduction
    • Introduction to Apache Spark for Machine Learning on Big Data
    • What is Big Data?
    • Data storage solutions
    • Parallel data processing strategies of Apache Spark
    • Functional programming basics
    • Resilient Distributed Datasets and DataFrames - Apache Spark SQL
    • Course Syllabus
    • Setup of the grading and exercise environment
    • Exercise 1 - Working with RDDs
    • Exercise 2 - Functional programming basics with RDDs
    • Exercise 3 - Working with DataFrames
    • Programming Language Options for Apache Spark (optional)
    • Practice Quiz (Ungraded) - Apache Spark concepts
  • Week 2: Scaling Math for Statistics on Apache Spark
    • Averages
    • Standard deviation
    • Skewness
    • Kurtosis
    • Covariance, covariance matrices, correlation
    • Plotting with Apache Spark and Python's matplotlib
    • Dimensionality reduction
    • PCA
    • Exercise 1 - Statistics and transformations using DataFrames
    • Exercise on Plotting
    • Exercise on PCA
    • Practice Quiz (Ungraded) - Statistics and API usage on Spark
  • Week 3: Introduction to Apache SparkML
    • How ML Pipelines work
    • Introduction to SparkML
    • Extract - Transform - Load
    • Introduction to Clustering: k-Means
    • Using k-Means in Apache SparkML
    • Exercise 1 - Modifying an Apache SparkML Feature Engineering Pipeline
    • Exercise 2 - Working with Clustering and Apache SparkML
    • Practice Quiz (Ungraded) - ML Pipelines
  • Week 4: Supervised and Unsupervised Learning with SparkML
    • Linear Regression
    • LinearRegression with Apache SparkML
    • Logistic Regression
    • LogisticRegression with Apache SparkML
    • Exercise 1 - Improving Classification performance
    • Course Project
    • Practice Quiz (Ungraded) - SparkML Algorithms (2)

Summary of User Reviews

Many users praised this course on machine learning and big data with Apache Spark. Reviewers most often highlighted the instructor's practical approach, which made it easy to apply the concepts to real-world scenarios.

Pros from User Reviews

  • Practical approach to learning concepts
  • Easy to apply concepts to real-world scenarios
  • Instructor is knowledgeable and engaging
  • Course content is well-structured
  • Challenging assignments that push you to learn more

Cons from User Reviews

  • Course can be difficult for beginners
  • Some users found the pace of the course to be too fast
  • Course could benefit from more detailed explanations
  • Lack of interaction with other students
  • Not enough emphasis on certain topics
English
Available now
Romeo Kienzler
IBM
Coursera

Instructor

Romeo Kienzler

  • 3.8 Rating