Foundations of Data Science: K-Means Clustering in Python

  • Rating: 4.6
Approx. 29 hours to complete

Course Summary

This course is designed to teach students how to use k-means clustering in Python for data science applications. Students will learn how to implement k-means clustering algorithms, evaluate their results, and apply them to real-world problems.

Key Learning Points

  • Learn how to use k-means clustering in Python for data science applications
  • Implement k-means clustering algorithms
  • Evaluate the results of k-means clustering and apply it to real-world problems

Learning Outcomes

  • Implement k-means clustering algorithms using Python
  • Evaluate the effectiveness of k-means clustering
  • Apply k-means clustering to real-world problems

Prerequisites and Recommended Background

  • Basic understanding of Python programming
  • Familiarity with data science concepts

Course Difficulty Level

Intermediate

Course Format

  • Self-paced
  • Online
  • Video lectures
  • Hands-on exercises

Similar Courses

  • Machine Learning with Python
  • Applied Data Science with Python
  • Data Mining

Description

Organisations all around the world are using data to predict behaviours and extract valuable real-world insights to inform decisions. Managing and analysing big data has become an essential part of modern finance, retail, marketing, social science, development and research, medicine and government.

Knowledge

  • Define and explain the key concepts of data clustering
  • Demonstrate understanding of the key constructs and features of the Python language
  • Implement in Python the principal steps of the K-means algorithm
  • Design and execute a whole data clustering workflow and interpret the outputs
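
As a rough illustration of the steps of the K-means algorithm named above, here is a minimal NumPy sketch. It is not taken from the course materials: the function name `kmeans`, its parameters, and the toy data are assumptions made for this example. The initial centroids are passed in explicitly so the run is reproducible; in practice they are usually chosen at random.

```python
import numpy as np

def kmeans(points, init_centroids, n_iter=100):
    """Plain k-means on an (n, d) array, starting from the given centroids."""
    centroids = np.array(init_centroids, dtype=float)
    k = len(centroids)
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid
        # (Euclidean distance).
        diffs = points[:, None, :] - centroids[None, :, :]
        distances = np.linalg.norm(diffs, axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points.
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy data (made up for illustration): two well-separated groups of 2D points.
data = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                 [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])

# Hypothetical starting centroids: one point taken from each group.
labels, centroids = kmeans(data, init_centroids=data[[0, 3]])
print(labels)
print(centroids)
```

The result depends on the starting centroids, which is why library implementations typically repeat the procedure from several random starts and keep the best run.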

Outline

  • Week 1: Foundations of Data Science: K-Means Clustering in Python
  • Welcome and Introduction
  • Introduction to Data Science
  • What is Data?
  • Types of Data
  • Machine Learning
  • Supervised vs Unsupervised Learning
  • K-Means Clustering
  • Preparing your Data
  • A Real World Dataset
  • Types of Data – Review Information
  • Supervised vs Unsupervised – Review Information
  • K-Means Clustering – Review Information
  • Week 1 Summative Assessment
  • Week 2: Means and Deviations in Mathematics and Python
  • 2.0: Week 2 Introduction
  • 2.1 – Introduction to Mathematical Concepts of Data Clustering
  • 2.2 – Mean of One Dimensional Lists
  • 2.3 – Variance and Standard Deviation
  • 2.4 Jupyter Notebooks
  • 2.5 Variables
  • 2.6 Lists
  • 2.7 Computing the Mean
  • 2.8 Better Lists: NumPy
  • 2.9 Computing the Standard Deviation
  • Week 2 Conclusion
  • Population vs Sample, Bias
  • Variability, Standard Deviation and Bias
  • Python Style Guide
  • Numpy and Array Creation
  • Population vs Sample – Review Information
  • Mean of One Dimensional Lists – Review Information
  • Variance and Standard Deviation – Review Information
  • Jupyter Notebooks – Review Information
  • Variables – Review Information
  • Lists – Review Information
  • Computing the Mean – Review Information
  • Better Lists – Review Information
  • Computing the Standard Deviation – Review Information
  • Week 2 Summative Assessment
  • Week 3: Moving from One to Two Dimensional Data
  • Week 3 Introduction
  • 3.1 Multidimensional Data Points and Features
  • 3.2 Multidimensional Mean
  • 3.3 Dispersion: Multidimensional Variables
  • 3.4 Distance Metrics
  • 3.5 Normalisation
  • 3.6 Outliers
  • 3.7 Basic Plotting
  • 3.7a Storing 2D Coordinates in a Single Data Structure
  • 3.8 Multidimensional Mean
  • 3.9 Adding Graphical Overlays
  • 3.10 Calculating the Distance to the Mean
  • 3.11 List Comprehension
  • 3.12 Normalisation in Python
  • 3.13 Outliers and Plotting Normalised Data
  • Week 3 Conclusion
  • Multidimensional Data Points and Features Recap
  • Multidimensional Mean Recap
  • Multidimensional Variables Recap
  • Distance Metrics Recap
  • Normalisation Recap
  • Note on Matplotlib
  • Matplotlib Scatter Plot Documentation
  • Matplotlib Patches Documentation
  • List Comprehension Documentation
  • 3.12 Errata
  • Multidimensional Data Points and Features – Review Information
  • Multidimensional Mean – Review Information
  • Dispersion: Multidimensional Variables – Review Information
  • Distance Metrics – Review Information
  • Normalisation – Review Information
  • Outliers – Review Information
  • Basic Plotting – Review Information
  • Storing 2D Coordinates – Review Information
  • Multidimensional Mean – Review Information
  • Adding Graphical Overlays – Review Information
  • Calculating Distance – Review Information
  • List Comprehension – Review Information
  • Normalisation in Python – Review Information
  • Outliers – Review Information
  • Week 3 Summative Assessment
  • Week 4: Introducing Pandas and Using K-Means to Analyse Data
  • Week 4 Introduction
  • 4.1: Using the Pandas Library to Read csv Files
  • 4.1a: Sorting and Filtering Data Using Pandas
  • 4.1b: Labelling Points on a Graph
  • 4.1c: Labelling all the Points on a Graph
  • 4.2: Eyeballing the Data
  • 4.3: Using K-Means to Interpret the Data
  • Week 4: Conclusion
  • Week 4 Code Resources
  • Pandas Read_CSV Function
  • More Pandas Library Documentation
  • The Pyplot Text Function
  • For Loops in Python
  • Documentation for sklearn.cluster.KMeans
  • Using the Pandas Library to Read csv Files – Review Information
  • Sorting and Filtering Data Using Pandas – Review Information
  • Labelling Points on a Graph – Review Information
  • Labelling all the Points on a Graph – Review Information
  • Eyeballing the Data – Review Information
  • Using K-Means to Interpret the Data – Review Information
  • Week 4 Summative Assessment
  • Week 5: A Data Clustering Project
  • Introduction to Week 5
  • 5.1 Can a Machine Detect Fake Notes?
  • 5.2 Working for a Client
  • 5.3 How to Organize Work on Your Project
  • 5.4 Dealing With Difficulties
  • 5.5 No Data no Data Science: Introduction of the Dataset
  • 5.6 Modelling
  • 5.7 Presenting the Project Results
  • 5.8 Concluding Remarks
  • Week 5 Code Resource – the Dataset for our Project
  • Saving plt.scatter Outputs as Figures
  • Additional Recommended Reading for Week 5
  • How Would You Help? – Review Information
  • Python – Review Information
  • Week 5 Summative Assessment
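
The outline moves from means and standard deviations (Week 2), through normalisation and distance to the mean (Week 3), to `sklearn.cluster.KMeans` (Week 4). The sketch below strings those ideas together on made-up numbers. It is an assumption about how such a workflow might look, not the course's own code; the course reads its data from csv files with pandas, whereas this example defines a small array inline.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up measurements: two features on very different scales.
X = np.array([[1.0, 100.0], [1.2, 110.0], [0.8, 95.0],
              [8.0, 900.0], [8.2, 880.0], [7.9, 910.0]])

# Per-feature mean and standard deviation.
# ddof=0 is the population formula; ddof=1 would give the sample version.
mean = X.mean(axis=0)
std = X.std(axis=0, ddof=0)

# Normalise so each feature has mean 0 and standard deviation 1.
# Without this, the second feature would dominate the distances.
X_norm = (X - mean) / std

# Euclidean distance of each normalised point to the mean (now the origin).
dist_to_mean = np.linalg.norm(X_norm, axis=1)

# Cluster the normalised data into two groups.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X_norm)

print(labels)                  # cluster index per point
print(model.cluster_centers_)  # centroids in normalised units
```

The numbering of the clusters is arbitrary, so only the grouping of the points is meaningful, not which group is called 0 or 1.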

Summary of User Reviews

Discover the power of k-means clustering in data science with this comprehensive course on Coursera. Gain practical skills in Python and learn how to implement k-means clustering algorithms for real-world applications. Highly recommended for anyone looking to advance their knowledge in data science.

Key Aspect Users Liked About This Course

Many users found the course to be well-structured and easy to follow.

Pros from User Reviews

  • Clear and concise explanations of k-means clustering concepts.
  • Hands-on exercises and projects to practice implementing k-means clustering algorithms.
  • The course provides a good balance between theory and practical applications.
  • Instructors are knowledgeable and responsive to questions.
  • Excellent resource for beginners and intermediate learners in data science.

Cons from User Reviews

  • Some users found the pace of the course to be too slow.
  • The course could benefit from more advanced topics and applications of k-means clustering.
  • No certification is offered upon completion of the course.
  • Some users experienced technical difficulties with the online platform.
  • The course may not be suitable for users looking for a deep dive into the mathematical underpinnings of k-means clustering.

Language: English
Availability: Available now
Instructors: Dr Matthew Yee-King, Dr Betty Fyn-Sydney, Dr Jamie A Ward, Dr Larisa Soldatova
Offered by: University of London, Goldsmiths, University of London
Platform: Coursera

Instructor

Dr Matthew Yee-King

  • 4.6 Rating