Principles, Statistical and Computational Tools for Reproducible Data Science

8 weeks long

Brief Introduction

Learn skills and tools that support data science and reproducible research, to ensure you can trust your own research results, reproduce them yourself, and communicate them to others.

Course Summary

Learn the principles of statistical and computational tools for reproducible data science in this Harvard course. Gain hands-on experience in data manipulation, visualization, and analysis using the R programming language, and in version control with Git and GitHub.

Key Learning Points

Understand the importance of reproducibility in data science
Learn the R programming language for data manipulation, visualization, and analysis
Gain experience in version control with Git and GitHub

Learning Outcomes

Ability to apply statistical and computational tools for reproducible data science
Proficiency in the R programming language for data manipulation, visualization, and analysis
Experience in version control with Git and GitHub

Prerequisites or good-to-have knowledge before taking this course

Familiarity with basic statistical concepts
Basic knowledge of programming

Course Difficulty Level

Intermediate

Course Format

Online
Self-paced

Similar Courses

Data Science Essentials
Data Science: Machine Learning
Data Science: Probability

Notable People in This Field

Hadley Wickham
Karl Broman

Description

Course description

Today the principles and techniques of reproducible research are more important than ever, across diverse disciplines from astrophysics to political science. No one wants to do research that can’t be reproduced. This course is therefore for anyone doing data-intensive research. While many of us come from a biomedical background, this course is for a broad audience of data scientists.

To meet the needs of the scientific community, this course will examine the fundamentals of methods and tools for reproducible research. Led by experienced faculty from the Harvard T.H. Chan School of Public Health, you will participate in six modules that will include several case studies that illustrate the significant impact of reproducible research methods on scientific discovery.

This course will appeal to students and professionals in biostatistics, computational biology, bioinformatics, and data science. The course content will blend video lectures, case studies, peer-to-peer engagements, and use of computational tools and platforms (such as R/RStudio and Git/GitHub), culminating in a presentation of a final reproducible research project.

We’ll cover Fundamentals of Reproducible Science; Case Studies; Data Provenance; Statistical Methods for Reproducible Science; Computational Tools for Reproducible Science; and Reproducible Reporting Science. These concepts are intended to translate to fields throughout the data sciences: physical and life sciences, applied mathematics and statistics, and computing.

Consider this course a survey of best practices: we’d like to make you aware of pitfalls in reproducible data science, some past failure and success stories, and tools and design patterns that might help make it all easier. But ultimately it’ll be up to you to take the skills you learn from this course to create your own environment in which you can easily carry out reproducible research, and to encourage and integrate with similar environments for your collaborators and colleagues. We look forward to seeing you in this course and the research you do in the future!

Knowledge

What you'll learn
Understand a series of concepts, thought patterns, analysis paradigms, and computational and statistical tools that together support data science and reproducible research.
Fundamentals of reproducible science using case studies that illustrate various practices.
Key elements for ensuring data provenance and reproducible experimental design.
Statistical methods for reproducible data analysis.
Computational tools for reproducible data analysis and version control (Git/GitHub, Emacs/RStudio/Spyder), reproducible data (data repositories/Dataverse), reproducible dynamic report generation (R Markdown/R Notebook/Jupyter/Pandoc), and workflows.
How to develop new methods and tools for reproducible research and reporting, and how to write your own reproducible paper.

Summary of User Reviews

Discover the Principles of Statistical and Computational Tools for Reproducible Data Science in this Harvard online course. Users have rated it highly for its practicality and relevance. Learn from the best and gain valuable skills to apply to your work and research.

Key Aspect Users Liked About This Course

Many users appreciated the practicality and relevance of the course content.

Pros from User Reviews

Course content is well-organized and easy to follow.
Instructors are knowledgeable and engaging.
Great opportunity to learn from and collaborate with peers.
Practical skills learned can be applied to real-world projects.
Course materials are comprehensive and thorough.

Cons from User Reviews

Some users found the course to be too basic or introductory.
The course may not be suitable for those without a strong background in statistics or programming.
Some users found the assignments to be too time-consuming.
The pace of the course may be too slow for some learners.
Some users felt that the course lacked depth in certain areas.

Price: Free*

Language: English

Start date: 16th Sep, 2020

End date: 15th Sep, 2021

Duration: 8 weeks long

Teacher: Curtis Huttenhower, John Quackenbush, Lorenzo Trippa, Christine Choirat

Institution: Harvard University, Harvard Faculty of Arts & Sciences

Course taken on: Harvard University
