What courses do you recommend for me to start learning about Data Science and Machine Learning?
I get asked this question monthly, if not weekly, so I decided to put together this brief post to address it. I will introduce three resources, give an overview of each, and include a link to its website.
Not long ago, I went to a Data Scientist friend of mine with the same question, and they pointed me to these three resources. Now I am passing the baton on to you, and if you find them helpful, I hope you will do the same in the future. They helped me land my first Data Science job at Amazon, and I hope they can also help you get the role you want.
There is also an added benefit: all three are either free to enroll in or have course materials that are free to use.
I wanted to keep this post focused on these three resources, so I did not break down the technical knowledge required for a Data Scientist at Amazon. If you are looking for that, feel free to check out my other post, which also includes practice questions (with Jupyter notebooks) so you can apply what you learn.
Lastly, I recommend going through these three resources in the order that I have listed them here, since they progressively increase in depth.
Let’s get started!
1. Andrew Ng’s Machine Learning Course
I know what you are thinking: everyone knows this one! With around 2.4 million views as of October 2022, most of us have probably heard of it, but it is still one of the best courses for anyone starting out in Machine Learning, and it includes practice quizzes so you can test what you have learned.
1.1. Syllabus
Andrew Ng, who really does not need an introduction, focuses on what matters most for those getting started with Machine Learning and breaks the course down into the following topics:
- Introduction to Machine Learning
- Regression with Multiple Input Variables
- Classification
Pro Tip: When I took this course a few years ago, the practice problems did not have solutions in Python, which was and remains my programming language of choice. Things might have changed by now, but if that is still the case and you also prefer Python, a simple Google search will help you find Python solutions.
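If you prefer Python, here is a minimal sketch of the kind of exercise the "Regression with Multiple Input Variables" part of a course like this covers: fitting a linear model with several features by batch gradient descent in NumPy. This is my own illustration, not code from the course; the function name `gradient_descent`, the learning rate, the iteration count, and the synthetic data are all assumptions chosen for the example.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Fit linear regression with multiple input variables by batch gradient descent.

    Hypothetical helper for illustration only (not from the course).
    X: (m, n) feature matrix without a bias column
    y: (m,) target vector
    Returns (w, b): weight vector and intercept.
    """
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(num_iters):
        error = X @ w + b - y       # predictions minus targets, shape (m,)
        grad_w = X.T @ error / m    # gradient of the cost (1/2m) * sum(error**2) w.r.t. w
        grad_b = error.mean()       # gradient of the same cost w.r.t. b
        w -= alpha * grad_w         # step against the gradient
        b -= alpha * grad_b
    return w, b

# Usage: recover y = 3*x1 - 2*x2 + 5 from noiseless synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 5.0
w, b = gradient_descent(X, y, alpha=0.1, num_iters=2000)
print(np.round(w, 3), round(b, 3))
```

Because the synthetic features are roughly standardized, a fixed learning rate of 0.1 converges comfortably here; on real data you would typically scale the features first.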
1.2. Link to the Course
Free enrollment is available on Coursera.
2. John Paisley’s Machine Learning Course
John Paisley is an Assistant Professor in the Department of Electrical Engineering at Columbia University. He is also an affiliated member of the Data Science Institute at Columbia.
This one goes deeper than Andrew Ng’s course and explains in more detail the distinction between probabilistic and non-probabilistic modeling, as well as supervised versus unsupervised learning.
Pro Tip: This course is great for understanding the concepts, but it sometimes goes deep into the mathematical side of things. If you can follow the math, great. If you cannot (like me), focus on the conceptual parts; you can safely skip the math-heavy portions.
Let’s look at the syllabus for details.
2.1. Syllabus
- Maximum likelihood estimation, linear regression, least squares
- Ridge regression, bias-variance, Bayes rule, maximum a posteriori inference
- Bayesian linear regression, sparsity, subset selection for linear regression
- Nearest neighbor classification, Bayes classifiers, linear classifiers, perceptron
- Logistic regression, Laplace approximation, kernel methods, Gaussian processes
- Maximum margin, support vector machines, trees, random forests, boosting
- Clustering, k-means, Expectation Maximization (EM) algorithm, missing data
- Mixtures of Gaussians, matrix factorization
- Non-negative matrix factorization, latent factor models, Principal Component Analysis (PCA) and variations
- Markov models, hidden Markov models
- Continuous state-space models, association analysis
- Model selection
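The first two syllabus items are closely linked, and a small example makes the probabilistic view concrete. Under the standard assumption of Gaussian noise, the maximum likelihood estimate for linear regression is exactly the least squares solution, and ridge regression is the maximum a posteriori (MAP) estimate under a zero-mean Gaussian prior on the weights. The sketch below is my own illustration, not course material; it assumes centered data with no intercept term, and the function names and the penalty `lam=10.0` are hypothetical choices.

```python
import numpy as np

def least_squares(X, y):
    # Least squares (= maximum likelihood under Gaussian noise): solve (X^T X) w = X^T y
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, lam):
    # Ridge regression (= MAP with a zero-mean Gaussian prior): solve (lam*I + X^T X) w = X^T y
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

# Synthetic data with a known weight vector plus a little Gaussian noise
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -0.5, 2.0])
y = X @ w_true + 0.1 * rng.normal(size=50)

w_ls = least_squares(X, y)
w_rr = ridge(X, y, lam=10.0)
print("least squares:", w_ls, "norm:", np.linalg.norm(w_ls))
print("ridge:        ", w_rr, "norm:", np.linalg.norm(w_rr))
```

Setting `lam` to zero recovers the least squares solution, and any positive `lam` shrinks the weights toward zero, which is the bias-variance trade-off the course discusses.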
2.2. Link to the Course
Course materials are available for free on edX.
3. Andreas Muller’s Applied Machine Learning
Andreas Muller is an Associate Research Scientist at the Data Science Institute at Columbia.
His course materials are available online, and he also goes deeper into some Natural Language Processing (NLP) topics (that is, working with text data), such as topic modeling and embeddings. If you are interested in learning more about NLP, you will enjoy those parts of the course.
Let’s look at the syllabus for details.
3.1. Syllabus
- Matplotlib and Visualization
- Supervised learning
- Preprocessing
- Linear models for regression
- Linear models for classification
- Trees, forests and ensembles
- Gradient descent and gradient boosting
- Model evaluation
- Calibration and imbalanced data
- Parameter tuning and automatic machine learning
- Dimensionality reduction
- Clustering and mixture models
- Working with text data
- Topic models for text data
- Word and document embeddings
- Neural networks
- Keras and convolutional neural networks
- Time series
3.2. Link to the Course
Pro Tip: Make sure to check out his YouTube videos as well as the course materials, both of which are available on the course website.