Python Machine Learning Notebooks (Tutorial style)

Authored and maintained by Dr. Tirthajyoti Sarkar, Fremont, CA. Please feel free to add me on LinkedIn here.


  • Python 3.5
  • NumPy (pip install numpy)
  • Pandas (pip install pandas)
  • Scikit-learn (pip install scikit-learn)
  • SciPy (pip install scipy)
  • Statsmodels (pip install statsmodels)
  • Matplotlib (pip install matplotlib)
  • Seaborn (pip install seaborn)
  • SymPy (pip install sympy)

You can start with this article that I wrote for Heartbeat magazine (on the Medium platform):

“Some Essential Hacks and Tricks for Machine Learning with Python”

Essential tutorial-type notebooks on Pandas, NumPy, and visualizations

Jupyter notebooks covering a wide range of functions and operations in NumPy, Pandas, Seaborn, Matplotlib, etc.

Complexity and Learning curve analysis

Complexity and learning curve analyses are an essential part of the visual analytics that a data scientist performs on the available dataset to compare the merits of various ML algorithms.

Learning curve: A graph that compares the performance of a model on training and testing data over a varying number of training instances. We should generally see test performance improve as the number of training points increases.

Complexity curve: A graph that shows model performance on the training and validation sets for varying degrees of model complexity (e.g. degree of polynomial for linear regression, number of layers or neurons for neural networks, number of estimator trees for a Boosting algorithm or Random Forest).

  • Complexity and learning curve with the Lending Club dataset (Here is the Notebook).
  • Complexity and learning curve with a synthetic dataset using the Hastie function from Scikit-learn (Here is the Notebook).
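As a rough illustration of the two curve types defined above, here is a minimal sketch that uses Scikit-learn's learning_curve and validation_curve utilities on the Hastie synthetic dataset. It is not taken from the notebooks; the choice of GradientBoostingClassifier, the sample size, the 3-fold cross-validation, and the parameter grid are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import learning_curve, validation_curve

# Synthetic binary classification data (10 features); sample size is an assumption
X, y = make_hastie_10_2(n_samples=2000, random_state=0)

# Assumed model for illustration; any Scikit-learn estimator would work here
model = GradientBoostingClassifier(n_estimators=50, random_state=0)

# Learning curve: score as a function of the number of training instances
train_sizes, lc_train, lc_val = learning_curve(
    model, X, y, train_sizes=np.linspace(0.2, 1.0, 5), cv=3
)

# Complexity curve: score as a function of a complexity parameter
# (here, the number of boosting stages)
n_estimators_range = [10, 50, 100]
cc_train, cc_val = validation_curve(
    model, X, y, param_name="n_estimators", param_range=n_estimators_range, cv=3
)

# Average over the cross-validation folds to get one point per curve position
lc_train_mean, lc_val_mean = lc_train.mean(axis=1), lc_val.mean(axis=1)
cc_train_mean, cc_val_mean = cc_train.mean(axis=1), cc_val.mean(axis=1)

for n, tr, va in zip(train_sizes, lc_train_mean, lc_val_mean):
    print("train size", n, "train score", round(tr, 3), "validation score", round(va, 3))
```

Plotting the averaged training and validation scores against train_sizes gives the learning curve, and plotting them against n_estimators_range gives the complexity curve.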

Random data generation using symbolic expressions
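One possible way to approach this, sketched under assumptions: parse a user-supplied expression with SymPy, turn it into a NumPy function with lambdify, evaluate it on randomly sampled features, and add Gaussian noise. The helper name generate_regression_data, the uniform sampling range, and the noise model are hypothetical and chosen only for illustration.

```python
import numpy as np
import sympy as sp


def generate_regression_data(expr_str, n_samples=100, noise_std=0.1, seed=0):
    """Hypothetical helper: sample features and evaluate a symbolic expression.

    Features are drawn uniformly from [-1, 1] (an assumption), one column per
    free symbol in the expression, ordered alphabetically by symbol name.
    """
    expr = sp.sympify(expr_str)
    symbols = sorted(expr.free_symbols, key=lambda s: s.name)
    # Compile the symbolic expression into a vectorized NumPy function
    f = sp.lambdify(symbols, expr, modules="numpy")

    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n_samples, len(symbols)))
    # Evaluate column-wise, then add Gaussian noise to the target
    y = f(*X.T) + rng.normal(0.0, noise_std, size=n_samples)
    return X, y


X, y = generate_regression_data("x1**2 + 3*sin(x2)", n_samples=200)
print(X.shape, y.shape)
```

Setting noise_std=0.0 returns the exact value of the expression, which is handy for checking that the symbols map to the columns you expect.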

Simple deployment examples (serving ML models on web API)

Object-oriented programming with machine learning

Implementing some of the core OOP principles in a machine learning context by building your own Scikit-learn-like estimator, and making it better.

Here is the complete Python script with the linear regression class, which can fit a model, make predictions, compute regression metrics, plot outliers, plot diagnostics (linearity, constant variance, etc.), and compute variance inflation factors.
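To give a flavor of the idea, here is a much smaller sketch of a Scikit-learn-like estimator. It is not the script referred to above: the class name SimpleLinearRegression and the r_squared method are hypothetical, and only fitting, prediction, and one metric are shown.

```python
import numpy as np


class SimpleLinearRegression:
    """Illustrative Scikit-learn-style estimator (hypothetical, not the repo's class)."""

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept
        self.coef_ = None
        self.intercept_ = 0.0

    @staticmethod
    def _as_2d(X):
        # Accept 1-D input as a single feature column
        X = np.asarray(X, dtype=float)
        return X.reshape(-1, 1) if X.ndim == 1 else X

    def fit(self, X, y):
        X = self._as_2d(X)
        y = np.asarray(y, dtype=float)
        # Prepend a column of ones when an intercept is requested
        A = np.column_stack([np.ones(len(X)), X]) if self.fit_intercept else X
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        if self.fit_intercept:
            self.intercept_, self.coef_ = beta[0], beta[1:]
        else:
            self.intercept_, self.coef_ = 0.0, beta
        return self  # returning self allows chaining, as in Scikit-learn

    def predict(self, X):
        return self._as_2d(X) @ self.coef_ + self.intercept_

    def r_squared(self, X, y):
        # Coefficient of determination: 1 - SS_residual / SS_total
        y = np.asarray(y, dtype=float)
        resid = y - self.predict(X)
        return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


# Usage on noiseless data generated from y = 2x + 1
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = 2.0 * X_demo.ravel() + 1.0
model = SimpleLinearRegression().fit(X_demo, y_demo)
print(model.coef_, model.intercept_, model.r_squared(X_demo, y_demo))
```

Keeping the fitted parameters as attributes with a trailing underscore and returning self from fit follows Scikit-learn's conventions, which makes it natural to add further methods (diagnostic plots, outlier detection, and so on) to the same class.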

I created a Python package based on this work, which offers a simple Scikit-learn-style API along with deep statistical inference and residual analysis capabilities for linear regression problems. Check it out here.

See my articles on Medium on this topic.