Big Data Analysis with Scala and Spark

  • 4.7 rating
Approx. 28 hours to complete

Course Summary

This course teaches Scala programming and Spark architecture for big data processing, with hands-on experience in building and deploying Spark applications.

Key Learning Points

  • Learn the Scala programming language and become proficient in Spark architecture
  • Develop Spark applications and deploy them in a cluster
  • Gain hands-on experience in working with big data processing

Job Positions & Salaries that people who have taken this course might have

    • USA: $110,000
    • India: ₹13,00,000
    • Spain: €45,000

    • USA: $110,000
    • India: ₹13,00,000
    • Spain: €45,000

    • USA: $100,000
    • India: ₹10,00,000
    • Spain: €40,000

    • USA: $110,000
    • India: ₹13,00,000
    • Spain: €45,000

    • USA: $100,000
    • India: ₹10,00,000
    • Spain: €40,000

    • USA: $120,000
    • India: ₹15,00,000
    • Spain: €50,000

Learning Outcomes

  • Proficiency in the Scala programming language
  • Ability to develop and deploy Spark applications
  • Hands-on experience in working with big data processing

Prerequisites or good-to-have knowledge before taking this course

  • Basic programming knowledge
  • Familiarity with Linux command line

Course Difficulty Level

Intermediate

Course Format

  • Online
  • Self-paced

Similar Courses

  • Big Data Analytics with Apache Spark
  • Big Data and Hadoop Essentials
  • Scala Programming for Data Science

Notable People in This Field

  • Chief Architect at Cloudera
  • CEO at Confluent
  • Senior Fellow at Google

Description

Manipulating big data distributed over a cluster using functional concepts is rampant in industry, and is arguably one of the first widespread industrial uses of functional ideas. This is evidenced by the popularity of MapReduce and Hadoop, and most recently Apache Spark, a fast, in-memory distributed collections framework written in Scala. In this course, we'll see how the data parallel paradigm can be extended to the distributed case, using Spark throughout. We'll cover Spark's programming model in detail, being careful to understand how and when it differs from familiar programming models, like shared-memory parallel collections or sequential Scala collections. Through hands-on examples in Spark and Scala, we'll learn when important issues related to distribution like latency and network communication should be considered and how they can be addressed effectively for improved performance.
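The point that Spark's programming model differs from sequential Scala collections can be previewed without a cluster. The sketch below is only a local analogy, assuming Scala 2.13 or 3 and no Spark dependency; the object name `LazinessDemo` is hypothetical. A strict `List.map` runs its function on every element immediately, while `view.map` merely records the function and defers the work until a result such as `sum` is demanded, roughly the way RDD transformations wait for an action.

```scala
// Local analogy only (plain Scala, no Spark): strict collections vs lazy views.
object LazinessDemo {

  // Strict: List.map applies the function to every element right away.
  // Returns (calls observed right after map, final sum).
  def strictCalls(): (Int, Int) = {
    var calls = 0
    val doubled = List(1, 2, 3, 4).map { x => calls += 1; x * 2 }
    val before = calls // already 4: the work happened eagerly
    (before, doubled.sum)
  }

  // Lazy: view.map only records the function; sum forces the computation.
  // Returns (calls before sum, calls after sum, final sum).
  def lazyCalls(): (Int, Int, Int) = {
    var calls = 0
    val doubled = List(1, 2, 3, 4).view.map { x => calls += 1; x * 2 }
    val before = calls      // still 0: nothing has been computed yet
    val total = doubled.sum // plays the role of an action such as count()
    (before, calls, total)
  }

  def main(args: Array[String]): Unit = {
    val (strictBefore, strictSum) = strictCalls()
    println(s"strict map: $strictBefore calls before sum, sum = $strictSum")
    val (lazyBefore, lazyAfter, lazySum) = lazyCalls()
    println(s"view map: $lazyBefore calls before sum, $lazyAfter after, sum = $lazySum")
  }
}
```

In real Spark code the same split applies: transformations such as `map` and `filter` on an RDD only record lineage, and an action such as `count()` or `collect()` is what launches the job.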

Outline

  • Getting Started + Spark Basics
    • Introduction, Logistics, What You'll Learn
    • Data-Parallel to Distributed Data-Parallel
    • Latency
    • RDDs, Spark's Distributed Collection
    • RDDs: Transformations and Actions
    • Evaluation in Spark: Unlike Scala Collections!
    • Cluster Topology Matters!
  • Tools setup
    • sbt tutorial
    • IntelliJ IDEA tutorial
    • Eclipse tutorial
    • Submitting solutions
  • Reduction Operations & Distributed Key-Value Pairs
    • Reduction Operations
    • Pair RDDs
    • Transformations and Actions on Pair RDDs
    • Joins
  • Partitioning and Shuffling
    • Shuffling: What it is and why it's important
    • Partitioning
    • Optimizing with Partitioners
    • Wide vs Narrow Dependencies
  • Structured data: SQL, DataFrames, and Datasets
    • Structured vs Unstructured Data
    • Spark SQL
    • DataFrames (1)
    • DataFrames (2)
    • Datasets
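The Pair RDD and shuffling topics in this outline turn on one cost: how many records must cross the network. The sketch below models that in plain Scala, assuming each inner `Seq` stands for one partition on a separate node and that every record leaving a partition counts as one network transfer; `ShuffleDemo`, `groupThenSum`, and `combineThenSum` are hypothetical names, not Spark API. The first function ships every pair before summing (the pattern behind `groupByKey` followed by a sum), while the second combines within each partition first, as `reduceByKey` does, and ships only the partial sums.

```scala
// Local model (plain Scala, no Spark) of shuffle volume for pair data.
object ShuffleDemo {
  // Hypothetical model: each inner Seq is one partition living on its own node.
  type Partition = Seq[(String, Int)]

  val partitions: Seq[Partition] = Seq(
    Seq("a" -> 1, "b" -> 1, "a" -> 1),
    Seq("a" -> 1, "b" -> 1, "b" -> 1),
    Seq("a" -> 1, "a" -> 1, "c" -> 1)
  )

  // Sums the values for each key in a collection of pairs.
  private def sumByKey(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }

  // groupByKey-style: every record leaves its partition before any summing.
  // Returns (totals per key, number of records "sent over the network").
  def groupThenSum(parts: Seq[Partition]): (Map[String, Int], Int) = {
    val shuffled = parts.flatten
    (sumByKey(shuffled), shuffled.size)
  }

  // reduceByKey-style: combine inside each partition first, then send only
  // one partial sum per key per partition.
  def combineThenSum(parts: Seq[Partition]): (Map[String, Int], Int) = {
    val partials = parts.flatMap(p => sumByKey(p).toSeq)
    (sumByKey(partials), partials.size)
  }

  def main(args: Array[String]): Unit = {
    val (groupTotals, groupSent) = groupThenSum(partitions)
    val (combineTotals, combineSent) = combineThenSum(partitions)
    println(s"group-then-sum:   $groupTotals, records sent = $groupSent")
    println(s"combine-then-sum: $combineTotals, records sent = $combineSent")
  }
}
```

Both strategies produce the same totals (a: 5, b: 3, c: 1). For this toy data the combine-first version moves 6 records instead of 9, and the gap widens as a key repeats more often within a partition.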

Summary of User Reviews

Discover the world of big data with Scala and Spark. This course covers the fundamentals of big data, Spark, and the Scala programming language. Users love the hands-on approach to learning and the practical examples provided throughout the course.

Key Aspect Users Liked About This Course

Hands-on approach to learning

Pros from User Reviews

  • Practical examples provided throughout the course
  • Great introduction to Scala and Spark
  • Experienced instructors with real-world experience
  • Flexible deadlines allow learners to go at their own pace
  • Challenging assignments that prepare learners for real-world applications

Cons from User Reviews

  • Some learners may find the pace too fast
  • Requires prior knowledge of programming concepts
  • Not suitable for complete beginners
  • Some learners may prefer more theoretical explanations
  • Lacks in-depth coverage of certain topics

English
Available now
Prof. Heather Miller
École Polytechnique Fédérale de Lausanne
Coursera

Instructor

Prof. Heather Miller

  • 4.7 rating