Machine Learning: Clustering & Retrieval

  • Rating: 4.6
Approx. 17 hours to complete

Course Summary

This course covers the fundamentals of clustering and retrieval, including nearest neighbor search, clustering methods, and dimensionality reduction techniques. You will also learn how to apply these techniques to real-world problems such as image and text retrieval.

Key Learning Points

  • Understand the basics of clustering and retrieval
  • Learn various clustering methods and dimensionality reduction techniques
  • Apply these techniques to real-world problems such as image and text retrieval

Job Positions & Typical Salaries of People Who Have Taken This Course

  • Machine Learning Engineer
    • USA: $112,000
    • India: ₹1,064,000
    • Spain: €45,000
  • Data Scientist
    • USA: $117,000
    • India: ₹1,114,000
    • Spain: €47,000
  • Computer Vision Engineer
    • USA: $127,000
    • India: ₹1,208,000
    • Spain: €51,000


Learning Outcomes

  • Understand the concepts and methods of clustering and retrieval
  • Apply clustering and retrieval techniques to real-world problems
  • Evaluate the effectiveness of clustering and retrieval methods

Prerequisites or good to have knowledge before taking this course

  • Basic knowledge of linear algebra and calculus
  • Familiarity with Python programming language

Course Difficulty Level

Intermediate

Course Format

  • Online
  • Self-paced

Similar Courses

  • Machine Learning: Clustering & Retrieval
  • Applied Machine Learning: Clustering & Retrieval


Notable People in This Field

  • Andrew Ng
  • Yann LeCun

Description

Case Studies: Finding Similar Documents

Outline

Welcome

  • Welcome and introduction to clustering and retrieval tasks
  • Course overview
  • Module-by-module topics covered
  • Assumed background
  • Important Update regarding the Machine Learning Specialization
  • Slides presented in this module
  • Software tools you'll need for this course
  • A big week ahead!

Nearest Neighbor Search

  • Retrieval as k-nearest neighbor search
  • 1-NN algorithm
  • k-NN algorithm
  • Document representation
  • Distance metrics: Euclidean and scaled Euclidean
  • Writing (scaled) Euclidean distance using (weighted) inner products
  • Distance metrics: Cosine similarity
  • To normalize or not and other distance considerations
  • Complexity of brute force search
  • KD-tree representation
  • NN search with KD-trees
  • Complexity of NN search with KD-trees
  • Visualizing scaling behavior of KD-trees
  • Approximate k-NN search using KD-trees
  • Limitations of KD-trees
  • LSH as an alternative to KD-trees
  • Using random lines to partition points
  • Defining more bins
  • Searching neighboring bins
  • LSH in higher dimensions
  • (OPTIONAL) Improving efficiency through multiple tables
  • A brief recap
  • Slides presented in this module
  • Choosing features and metrics for nearest neighbor search
  • (OPTIONAL) A worked-out example for KD-trees
  • Implementing Locality Sensitive Hashing from scratch
  • Representations and metrics
  • Choosing features and metrics for nearest neighbor search
  • KD-trees
  • Locality Sensitive Hashing
  • Implementing Locality Sensitive Hashing from scratch
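
To ground the module's two main ideas, here is a minimal sketch, assuming toy NumPy vectors, of brute-force k-nearest-neighbor retrieval with cosine distance and of the random-hyperplane binning behind locality sensitive hashing; the helper names and dimensions are illustrative, not course-provided code.

```python
import numpy as np

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity; smaller means more similar.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def knn_search(corpus, query, k):
    # Brute force: score every document against the query and keep the
    # k closest. This is O(n) per query, which motivates KD-trees and LSH.
    dists = [cosine_distance(doc, query) for doc in corpus]
    return np.argsort(dists)[:k]

def lsh_bits(vectors, hyperplanes):
    # LSH with random hyperplanes: the sign pattern of the projections
    # is the bin index, so nearby vectors tend to land in the same bin.
    return (vectors @ hyperplanes.T >= 0).astype(int)

rng = np.random.default_rng(0)
corpus = rng.random((100, 8))               # toy tf-idf-like document vectors
query = rng.random(8)

print(knn_search(corpus, query, k=3))       # indices of the 3 nearest docs
hyperplanes = rng.standard_normal((4, 8))   # 4 random hyperplanes, 16 bins
print(lsh_bits(corpus, hyperplanes)[:5])    # bit codes of the first 5 docs
```

Searching only the query's bin (and, as in the lectures, its neighboring bins) trades a little accuracy for a large speedup over the brute-force scan.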

Clustering with k-means

  • The goal of clustering
  • An unsupervised task
  • Hope for unsupervised learning, and some challenge cases
  • The k-means algorithm
  • k-means as coordinate descent
  • Smart initialization via k-means++
  • Assessing the quality and choosing the number of clusters
  • Motivating MapReduce
  • The general MapReduce abstraction
  • MapReduce execution overview and combiners
  • MapReduce for k-means
  • Other applications of clustering
  • A brief recap
  • Slides presented in this module
  • Clustering text data with k-means
  • k-means
  • Clustering text data with k-means
  • MapReduce for k-means
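
The k-means loop itself is short; below is a minimal sketch of the algorithm's two coordinate-descent steps on assumed toy data, with k-means++ seeding and empty-cluster handling omitted for brevity.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    # Lloyd's algorithm: alternate cluster assignments and centroid updates.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):   # converged
            break
        centroids = new_centroids
    return centroids, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centroids, labels = kmeans(X, k=2)
```

Both steps decrease the sum of squared distances to cluster centers, which is why the loop converges (possibly to a local optimum, hence the smart initialization lecture).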

Mixture Models

  • Motivating probabilistic clustering models
  • Aggregating over unknown classes in an image dataset
  • Univariate Gaussian distributions
  • Bivariate and multivariate Gaussians
  • Mixture of Gaussians
  • Interpreting the mixture of Gaussian terms
  • Scaling mixtures of Gaussians for document clustering
  • Computing soft assignments from known cluster parameters
  • (OPTIONAL) Responsibilities as Bayes' rule
  • Estimating cluster parameters from known cluster assignments
  • Estimating cluster parameters from soft assignments
  • EM iterates in equations and pictures
  • Convergence, initialization, and overfitting of EM
  • Relationship to k-means
  • A brief recap
  • Slides presented in this module
  • (OPTIONAL) A worked-out example for EM
  • Implementing EM for Gaussian mixtures
  • Clustering text data with Gaussian mixtures
  • EM for Gaussian mixtures
  • Implementing EM for Gaussian mixtures
  • Clustering text data with Gaussian mixtures
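
As a concrete companion to the EM lectures, here is a minimal sketch for a univariate mixture of Gaussians on assumed toy data; the course's document-clustering setting uses multivariate Gaussians, and the initialization shown is an arbitrary choice, not the course's.

```python
import numpy as np
from scipy.stats import norm

def em_gmm(x, k, n_iters=50, seed=0):
    # Fit a k-component univariate Gaussian mixture to data x via EM.
    rng = np.random.default_rng(seed)
    n = len(x)
    weights = np.full(k, 1.0 / k)                  # mixture weights
    means = rng.choice(x, size=k, replace=False)   # init means at data points
    stds = np.full(k, x.std())                     # init spreads to data std
    for _ in range(n_iters):
        # E-step: responsibilities r[i, j] = P(cluster j | x_i).
        r = weights * norm.pdf(x[:, None], means, stds)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments.
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r * x[:, None]).sum(axis=0) / nk
        stds = np.sqrt((r * (x[:, None] - means) ** 2).sum(axis=0) / nk)
    return weights, means, stds

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(4, 1, 200)])
print(em_gmm(x, k=2))
```

Replacing the soft responsibilities with hard 0/1 assignments and fixing equal spherical covariances recovers k-means, the relationship the module closes with.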

Mixed Membership Modeling via Latent Dirichlet Allocation

  • Mixed membership models for documents
  • An alternative document clustering model
  • Components of latent Dirichlet allocation model
  • Goal of LDA inference
  • The need for Bayesian inference
  • Gibbs sampling from 10,000 feet
  • A standard Gibbs sampler for LDA
  • What is collapsed Gibbs sampling?
  • A worked example for LDA: Initial setup
  • A worked example for LDA: Deriving the resampling distribution
  • Using the output of collapsed Gibbs sampling
  • A brief recap
  • Slides presented in this module
  • Modeling text topics with Latent Dirichlet Allocation
  • Latent Dirichlet Allocation
  • Learning LDA model via Gibbs sampling
  • Modeling text topics with Latent Dirichlet Allocation
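
A compact sketch of collapsed Gibbs sampling for LDA in the spirit of this module: each word's topic is removed from the count tables, resampled from the collapsed conditional, and added back. The toy corpus, vocabulary size, and hyperparameters alpha and beta below are illustrative assumptions only.

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=200):
    # docs: list of documents, each a list of integer word ids.
    rng = np.random.default_rng(0)
    ndk = np.zeros((len(docs), n_topics))   # per-document topic counts
    nkw = np.zeros((n_topics, vocab_size))  # per-topic word counts
    nk = np.zeros(n_topics)                 # total words per topic
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):          # tally random initial assignments
        for i, w in enumerate(doc):
            ndk[d, z[d][i]] += 1
            nkw[z[d][i], w] += 1
            nk[z[d][i]] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                 # remove this word's assignment
                ndk[d, t] -= 1
                nkw[t, w] -= 1
                nk[t] -= 1
                # Collapsed conditional:
                # P(z = k) proportional to (ndk + alpha)(nkw + beta)/(nk + V*beta)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                t = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = t                 # resample, then restore counts
                ndk[d, t] += 1
                nkw[t, w] += 1
                nk[t] += 1
    return ndk, nkw

docs = [[0, 1, 2, 1], [2, 3, 3, 0], [4, 5, 4, 4]]  # toy word-id documents
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=6)
```

The returned count tables are exactly the "output of collapsed Gibbs sampling" the lectures show how to turn into topic-word and document-topic estimates.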

Hierarchical Clustering & Closing Remarks

  • Module 1 recap
  • Module 2 recap
  • Module 3 recap
  • Module 4 recap
  • Why hierarchical clustering?
  • Divisive clustering
  • Agglomerative clustering
  • The dendrogram
  • Agglomerative clustering details
  • Hidden Markov models
  • What we didn't cover
  • Thank you!
  • Slides presented in this module
  • Modeling text data with a hierarchy of clusters
  • Modeling text data with a hierarchy of clusters
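
For the agglomerative clustering and dendrogram material in this module, a minimal sketch using SciPy's hierarchical-clustering utilities, with toy data and Ward linkage as assumed choices rather than the course's own setup:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])

Z = linkage(X, method="ward")                    # bottom-up merge tree
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
dendrogram(Z)                                    # visualize every merge
plt.show()
```

Cutting the dendrogram at different heights yields coarser or finer clusterings, which is the flexibility the lectures give for preferring a hierarchy over a single flat k.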

Summary of User Reviews

Discover Machine Learning techniques for clustering and retrieval with this comprehensive course on Coursera. Users have rated this course highly for its in-depth content and practical application. Learn how to cluster and retrieve data with ease.

Key Aspect Users Liked About This Course

The practical application of the course content is highly praised by users.

Pros from User Reviews

  • In-depth and comprehensive content
  • Practical application of concepts
  • Easy to follow
  • Great exercises to reinforce learning
  • Engaging and knowledgeable instructor

Cons from User Reviews

  • Some users found the course to be too technical
  • The pace of the course may be too fast for beginners
  • Not enough emphasis on real-world examples
  • Some users found the course to be too theoretical
  • Lack of interaction with other students

Language: English
Availability: Available now
Duration: Approx. 17 hours to complete
Instructors: Emily Fox, Carlos Guestrin
Institution: University of Washington
Platform: Coursera

Instructor

Emily Fox

  • Rating: 4.6