Delta Lake with Apache Spark using Scala

2.4

2 hours on-demand video

$ 12.99


Brief Introduction

Delta Lake with Apache Spark using Scala on Databricks platform

Description

You will learn Delta Lake with Apache Spark using Scala on the Databricks platform.

Learn the latest Big Data Technology - Spark! And learn to use it with one of the most popular programming languages, Scala!

One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark! The top technology companies like Google, Facebook, Netflix, Airbnb, Amazon, NASA, and more are all using Spark to solve their big data problems!

Spark can run some workloads up to 100x faster than Hadoop MapReduce, which has driven strong demand for this skill! Because the Spark 3.0 DataFrame framework is still relatively new, learning it now can help you stand out in the job market!

Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

Topics Included in the Course

Introduction to Delta Lake
Introduction to Data Lake
Key Features of Delta Lake
Introduction to Spark
Free Account creation in Databricks
Provisioning a Spark Cluster
Basics about notebooks
DataFrames
Create a table
Write a table
Read a table
Schema validation
Update table schema
Table Metadata
Delete from a table
Update a table
Vacuum
History
Concurrency Control
Optimistic concurrency control
Migrate Workloads to Delta Lake
Optimize Performance with File Management
Auto Optimize
Optimize Performance with Caching
Delta and Apache Spark caching
Cache a subset of the data
Isolation Levels
Best Practices
Frequently Asked Interview Questions
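
To give a feel for the table operations in the topic list (write, read, delete, update, history, vacuum), here is a minimal Scala sketch. It is not taken from the course material. It assumes the Delta Lake library matching your Spark version is on the classpath. The object name `DeltaSketch`, the scratch path `/tmp/delta-sketch/numbers`, and the sample data are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import io.delta.tables.DeltaTable

object DeltaSketch {
  // Hypothetical scratch location for the example table (an assumption).
  val path = "/tmp/delta-sketch/numbers"

  // Local session with the Delta extensions enabled.
  // On Databricks a configured `spark` session already exists.
  def buildSession(): SparkSession =
    SparkSession.builder()
      .appName("delta-sketch")
      .master("local[*]")
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

  // Runs the operations and returns the final row count.
  def run(spark: SparkSession): Long = {
    // Write a table: five rows with a single `id` column (0 to 4).
    spark.range(0, 5).write.format("delta").mode("overwrite").save(path)

    // Read a table.
    spark.read.format("delta").load(path).show()

    val table = DeltaTable.forPath(spark, path)

    // Delete from a table: remove the row where id = 0.
    table.delete("id = 0")

    // Update a table: add 100 to the row where id = 1.
    table.updateExpr("id = 1", Map("id" -> "id + 100"))

    // History: each write, delete, and update is recorded as a table version.
    table.history().select("version", "operation").show()

    // Vacuum: remove files no longer referenced by the table and older
    // than the default retention period.
    table.vacuum()

    spark.read.format("delta").load(path).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = buildSession()
    println(s"Rows after delete and update: ${run(spark)}")
    spark.stop()
  }
}
```

In a Databricks notebook the `spark` session is provided for you, so only the statements inside `run` would be needed there.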

About Databricks:

Databricks lets you start writing Spark code instantly so you can focus on your data problems.

Requirements

Basic knowledge of Apache Spark, Scala, and SQL is required for this course.

