Machine Learning: Clustering & Retrieval

  • Rating: 4.6
Approx. 17 hours to complete

Course Summary

This course covers the fundamentals of clustering and retrieval, including nearest neighbor search, clustering methods, and dimensionality reduction techniques. You will also learn how to apply these techniques to real-world problems such as image and text retrieval.

Key Learning Points

  • Understand the basics of clustering and retrieval
  • Learn various clustering methods and dimensionality reduction techniques
  • Apply these techniques to real-world problems such as image and text retrieval

Job Positions & Typical Salaries of People Who Have Taken This Course

  • Machine Learning Engineer
    • USA: $112,000
    • India: ₹1,064,000
    • Spain: €45,000
  • Data Scientist
    • USA: $117,000
    • India: ₹1,114,000
    • Spain: €47,000
  • Computer Vision Engineer
    • USA: $127,000
    • India: ₹1,208,000
    • Spain: €51,000


Learning Outcomes

  • Understand the concepts and methods of clustering and retrieval
  • Apply clustering and retrieval techniques to real-world problems
  • Evaluate the effectiveness of clustering and retrieval methods

Prerequisites or good to have knowledge before taking this course

  • Basic knowledge of linear algebra and calculus
  • Familiarity with Python programming language

Course Difficulty Level

Intermediate

Course Format

  • Online
  • Self-paced

Similar Courses

  • Machine Learning: Clustering & Retrieval
  • Applied Machine Learning: Clustering & Retrieval


Notable People in This Field

  • Andrew Ng
  • Yann LeCun

Description

Case Studies: Finding Similar Documents

Outline

Welcome

  • Welcome and introduction to clustering and retrieval tasks
  • Course overview
  • Module-by-module topics covered
  • Assumed background
  • Important Update regarding the Machine Learning Specialization
  • Slides presented in this module
  • Software tools you'll need for this course
  • A big week ahead!

Nearest Neighbor Search

  • Retrieval as k-nearest neighbor search
  • 1-NN algorithm
  • k-NN algorithm
  • Document representation
  • Distance metrics: Euclidean and scaled Euclidean
  • Writing (scaled) Euclidean distance using (weighted) inner products
  • Distance metrics: Cosine similarity
  • To normalize or not and other distance considerations
  • Complexity of brute force search
  • KD-tree representation
  • NN search with KD-trees
  • Complexity of NN search with KD-trees
  • Visualizing scaling behavior of KD-trees
  • Approximate k-NN search using KD-trees
  • Limitations of KD-trees
  • LSH as an alternative to KD-trees
  • Using random lines to partition points
  • Defining more bins
  • Searching neighboring bins
  • LSH in higher dimensions
  • (OPTIONAL) Improving efficiency through multiple tables
  • A brief recap
  • Slides presented in this module
  • Choosing features and metrics for nearest neighbor search
  • (OPTIONAL) A worked-out example for KD-trees
  • Implementing Locality Sensitive Hashing from scratch
  • Representations and metrics
  • Choosing features and metrics for nearest neighbor search
  • KD-trees
  • Locality Sensitive Hashing
  • Implementing Locality Sensitive Hashing from scratch
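
To ground the module's two main ideas, here is a minimal sketch, assuming toy NumPy vectors, of brute-force k-nearest-neighbor retrieval with cosine distance and of the random-hyperplane binning behind locality sensitive hashing; the helper names and dimensions are illustrative, not course-provided code.

```python
import numpy as np

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity; smaller means more similar.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def knn_search(corpus, query, k):
    # Brute force: score every document against the query and keep the
    # k closest. This is O(n) per query, which motivates KD-trees and LSH.
    dists = [cosine_distance(doc, query) for doc in corpus]
    return np.argsort(dists)[:k]

def lsh_bits(vectors, hyperplanes):
    # LSH with random hyperplanes: the sign pattern of the projections
    # is the bin index, so nearby vectors tend to land in the same bin.
    return (vectors @ hyperplanes.T >= 0).astype(int)

rng = np.random.default_rng(0)
corpus = rng.random((100, 8))               # toy tf-idf-like document vectors
query = rng.random(8)

print(knn_search(corpus, query, k=3))       # indices of the 3 nearest docs
hyperplanes = rng.standard_normal((4, 8))   # 4 random hyperplanes, 16 bins
print(lsh_bits(corpus, hyperplanes)[:5])    # bit codes of the first 5 docs
```

Searching only the query's bin (and, as in the lectures, its neighboring bins) trades a little accuracy for a large speedup over the brute-force scan.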

Clustering with k-means

  • The goal of clustering
  • An unsupervised task
  • Hope for unsupervised learning, and some challenge cases
  • The k-means algorithm
  • k-means as coordinate descent
  • Smart initialization via k-means++
  • Assessing the quality and choosing the number of clusters
  • Motivating MapReduce
  • The general MapReduce abstraction
  • MapReduce execution overview and combiners
  • MapReduce for k-means
  • Other applications of clustering
  • A brief recap
  • Slides presented in this module
  • Clustering text data with k-means
  • k-means
  • Clustering text data with k-means
  • MapReduce for k-means
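
The k-means loop itself is short; below is a minimal sketch of the algorithm's two coordinate-descent steps on assumed toy data, with k-means++ seeding and empty-cluster handling omitted for brevity.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    # Lloyd's algorithm: alternate cluster assignments and centroid updates.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):   # converged
            break
        centroids = new_centroids
    return centroids, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centroids, labels = kmeans(X, k=2)
```

Both steps decrease the sum of squared distances to cluster centers, which is why the loop converges (possibly to a local optimum, hence the smart initialization lecture).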

Mixture Models

  • Motivating probabilistic clustering models
  • Aggregating over unknown classes in an image dataset
  • Univariate Gaussian distributions
  • Bivariate and multivariate Gaussians
  • Mixture of Gaussians
  • Interpreting the mixture of Gaussian terms
  • Scaling mixtures of Gaussians for document clustering
  • Computing soft assignments from known cluster parameters
  • (OPTIONAL) Responsibilities as Bayes' rule
  • Estimating cluster parameters from known cluster assignments
  • Estimating cluster parameters from soft assignments
  • EM iterates in equations and pictures
  • Convergence, initialization, and overfitting of EM
  • Relationship to k-means
  • A brief recap
  • Slides presented in this module
  • (OPTIONAL) A worked-out example for EM
  • Implementing EM for Gaussian mixtures
  • Clustering text data with Gaussian mixtures
  • EM for Gaussian mixtures
  • Implementing EM for Gaussian mixtures
  • Clustering text data with Gaussian mixtures
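
As a concrete companion to the EM lectures, here is a minimal sketch for a univariate mixture of Gaussians on assumed toy data; the course's document-clustering setting uses multivariate Gaussians, and the initialization shown is an arbitrary choice, not the course's.

```python
import numpy as np
from scipy.stats import norm

def em_gmm(x, k, n_iters=50, seed=0):
    # Fit a k-component univariate Gaussian mixture to data x via EM.
    rng = np.random.default_rng(seed)
    n = len(x)
    weights = np.full(k, 1.0 / k)                  # mixture weights
    means = rng.choice(x, size=k, replace=False)   # init means at data points
    stds = np.full(k, x.std())                     # init spreads to data std
    for _ in range(n_iters):
        # E-step: responsibilities r[i, j] = P(cluster j | x_i).
        r = weights * norm.pdf(x[:, None], means, stds)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments.
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r * x[:, None]).sum(axis=0) / nk
        stds = np.sqrt((r * (x[:, None] - means) ** 2).sum(axis=0) / nk)
    return weights, means, stds

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(4, 1, 200)])
print(em_gmm(x, k=2))
```

Replacing the soft responsibilities with hard 0/1 assignments and fixing equal spherical covariances recovers k-means, the relationship the module closes with.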

Mixed Membership Modeling via Latent Dirichlet Allocation

  • Mixed membership models for documents
  • An alternative document clustering model
  • Components of latent Dirichlet allocation model
  • Goal of LDA inference
  • The need for Bayesian inference
  • Gibbs sampling from 10,000 feet
  • A standard Gibbs sampler for LDA
  • What is collapsed Gibbs sampling?
  • A worked example for LDA: Initial setup
  • A worked example for LDA: Deriving the resampling distribution
  • Using the output of collapsed Gibbs sampling
  • A brief recap
  • Slides presented in this module
  • Modeling text topics with Latent Dirichlet Allocation
  • Latent Dirichlet Allocation
  • Learning LDA model via Gibbs sampling
  • Modeling text topics with Latent Dirichlet Allocation
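
A compact sketch of collapsed Gibbs sampling for LDA in the spirit of this module: each word's topic is removed from the count tables, resampled from the collapsed conditional, and added back. The toy corpus, vocabulary size, and hyperparameters alpha and beta below are illustrative assumptions only.

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=200):
    # docs: list of documents, each a list of integer word ids.
    rng = np.random.default_rng(0)
    ndk = np.zeros((len(docs), n_topics))   # per-document topic counts
    nkw = np.zeros((n_topics, vocab_size))  # per-topic word counts
    nk = np.zeros(n_topics)                 # total words per topic
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):          # tally random initial assignments
        for i, w in enumerate(doc):
            ndk[d, z[d][i]] += 1
            nkw[z[d][i], w] += 1
            nk[z[d][i]] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                 # remove this word's assignment
                ndk[d, t] -= 1
                nkw[t, w] -= 1
                nk[t] -= 1
                # Collapsed conditional:
                # P(z = k) proportional to (ndk + alpha)(nkw + beta)/(nk + V*beta)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                t = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = t                 # resample, then restore counts
                ndk[d, t] += 1
                nkw[t, w] += 1
                nk[t] += 1
    return ndk, nkw

docs = [[0, 1, 2, 1], [2, 3, 3, 0], [4, 5, 4, 4]]  # toy word-id documents
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=6)
```

The returned count tables are exactly the "output of collapsed Gibbs sampling" the lectures show how to turn into topic-word and document-topic estimates.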

Hierarchical Clustering & Closing Remarks

  • Module 1 recap
  • Module 2 recap
  • Module 3 recap
  • Module 4 recap
  • Why hierarchical clustering?
  • Divisive clustering
  • Agglomerative clustering
  • The dendrogram
  • Agglomerative clustering details
  • Hidden Markov models
  • What we didn't cover
  • Thank you!
  • Slides presented in this module
  • Modeling text data with a hierarchy of clusters
  • Modeling text data with a hierarchy of clusters
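
For the agglomerative clustering and dendrogram material in this module, a minimal sketch using SciPy's hierarchical-clustering utilities, with toy data and Ward linkage as assumed choices rather than the course's own setup:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])

Z = linkage(X, method="ward")                    # bottom-up merge tree
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
dendrogram(Z)                                    # visualize every merge
plt.show()
```

Cutting the dendrogram at different heights yields coarser or finer clusterings, which is the flexibility the lectures give for preferring a hierarchy over a single flat k.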

Summary of User Reviews

Discover Machine Learning techniques for clustering and retrieval with this comprehensive course on Coursera. Users have rated this course highly for its in-depth content and practical application. Learn how to cluster and retrieve data with ease.

Key Aspect Users Liked About This Course

The practical application of the course content is highly praised by users.

Pros from User Reviews

  • In-depth and comprehensive content
  • Practical application of concepts
  • Easy to follow
  • Great exercises to reinforce learning
  • Engaging and knowledgeable instructor

Cons from User Reviews

  • Some users found the course to be too technical
  • The pace of the course may be too fast for beginners
  • Not enough emphasis on real-world examples
  • Some users found the course to be too theoretical
  • Lack of interaction with other students

Language: English
Availability: Available now
Duration: Approx. 17 hours to complete
Instructors: Emily Fox, Carlos Guestrin
Institution: University of Washington
Platform: Coursera

Instructor

Emily Fox

  • Rating: 4.6