What courses do you recommend for me to start learning about Data Science and Machine Learning?

I get asked this question on a monthly basis, if not weekly, so I decided to put together this brief post to help address it. I will introduce three resources, give an overview of each and include a link to its website.

Not long ago, I went to a Data Scientist friend of mine with the same question, and they guided me to these three resources. Now I am going to pass the baton on to you, and I hope you do the same in the future if you find them helpful. They helped me land my first Data Science job at Amazon and I hope they can also help you get the role that you want.

There is also an added benefit: all three are either free to enroll in or have course materials that are free to use.

I wanted to keep this post focused on these three resources, so I did not break down what technical knowledge is required for a Data Scientist at Amazon. If you are looking for that, feel free to check my other post linked here, which also includes practice questions (with Jupyter notebooks) so you can apply what you learn.

Lastly, I recommend going through these three resources in the order that I have listed them here, since they progressively increase in depth.

Let’s get started!

1. Andrew Ng’s Machine Learning Course

I know what you are thinking: everyone knows this one! With around 2.4 million views as of October 2022, most of us probably do, but it is still one of the best courses for anyone starting to learn about Machine Learning, and it includes practice quizzes to test what you have learned.

1.1. Syllabus

Andrew Ng, who really does not need an introduction, focuses on what matters most for those getting started with Machine Learning and breaks down the course into the following:

Introduction to Machine Learning
Regression with Multiple Input Variables
Classification

Pro Tip: When I took this course a few years ago, the practice problems did not have solutions in Python, which was and remains my language of choice. Things might have changed by now, but if that is still the case and you also happen to prefer Python, a simple Google search will help you find the solutions in Python.

1.2. Link to the Course

Free enrollment is available on Coursera.

2. John Paisley’s Machine Learning Course

John Paisley is an Assistant Professor in the Department of Electrical Engineering at Columbia University. He is also an affiliated member of the Data Science Institute at Columbia.

This course goes deeper than Andrew Ng’s and spends more time on the distinction between probabilistic and non-probabilistic modeling, as well as between supervised and unsupervised learning.

Pro Tip: This one is great for understanding the concepts but can sometimes go deep into the mathematical side of things. If you follow the math, that is great but if you do not (like me), focus more on the conceptual parts and you can safely skip the math-heavy portions.

Let’s look at the syllabus for details.

2.1. Syllabus

Maximum likelihood estimation, linear regression, least squares
Ridge regression, bias-variance, Bayes rule, maximum a posteriori inference
Bayesian linear regression, sparsity, subset selection for linear regression
Nearest neighbor classification, Bayes classifiers, linear classifiers, perceptron
Logistic regression, Laplace approximation, kernel methods, Gaussian processes
Maximum margin, support vector machines, trees, random forests, boosting
Clustering, k-means, Expectation Maximization (EM) algorithm, missing data
Mixtures of Gaussians, matrix factorization
Non-negative matrix factorization, latent factor models, Principal Component Analysis (PCA) and variations
Markov models, hidden Markov models
Continuous state-space models, association analysis
Model selection

2.2. Link to the Course

Course materials are available for free on edX.

3. Andreas Muller’s Applied Machine Learning

Andreas Muller is an Associate Research Scientist at the Data Science Institute at Columbia.

His course materials are available online, and he goes deeper into some Natural Language Processing (NLP) topics (i.e., working with textual data) such as topic modeling and embeddings. If you are interested in learning more about NLP, you will enjoy those topics here.

Let’s look at the syllabus for details.

3.1. Syllabus

Matplotlib and Visualization
Supervised learning
Preprocessing
Linear models for regression
Linear models for classification
Trees, forests and ensembles
Gradient descent and gradient boosting
Model evaluation
Calibration and imbalanced data
Parameter tuning and automatic machine learning
Dimensionality reduction
Clustering and mixture models
Working with text data
Topic models for text data
Word and document embeddings
Neural networks
Keras and convolutional neural networks
Time series

3.2. Link to the Course

Pro Tip: Make sure to check out his YouTube videos, as well as the course materials, which are both available on the course website.

3 Free Courses that Helped Me Land My First Data Scientist Job in Amazon