Data Science portfolio of IPython notebooks covering several machine learning algorithms. The notebooks follow a structured data science methodology to tackle various challenges.

ibarrond/DataSciencePortfolio

Data Science Portfolio


Data Science portfolio of IPython notebooks implementing several machine learning algorithms. Every notebook follows the same structured methodology to tackle its challenge:

[Data acquisition -> Data cleaning -> Data analysis -> Algorithm implementation -> Algorithm applied to dataset -> Further optimization and advanced topics]
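The first five stages of this methodology can be sketched as a chain of small functions. This is only an illustrative skeleton, not code taken from the notebooks: the toy CSV data, the column names (`delay_minutes`, `distance_km`) and every function name are hypothetical, and a hand-written least-squares line stands in for the real algorithms.

```python
import csv
import io
import statistics

# Hypothetical toy data (not from the notebooks); one row has a missing value.
RAW = "delay_minutes,distance_km\n12,300\n,450\n30,900\n9,200\n"


def acquire(text):
    """Data acquisition: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Data cleaning: drop rows with missing fields, convert values to float."""
    return [
        {key: float(value) for key, value in row.items()}
        for row in rows
        if all(value != "" for value in row.values())
    ]


def analyze(rows):
    """Data analysis: a per-column mean as a minimal summary statistic."""
    return {col: statistics.mean(row[col] for row in rows) for col in rows[0]}


def fit_line(rows, x_col, y_col):
    """Algorithm implementation: ordinary least squares for y = slope * x + intercept."""
    xs = [row[x_col] for row in rows]
    ys = [row[y_col] for row in rows]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Algorithm applied to dataset: run the stages in order.
rows = clean(acquire(RAW))
summary = analyze(rows)
slope, intercept = fit_line(rows, "distance_km", "delay_minutes")
predicted_delay = slope * 500 + intercept  # prediction for an unseen distance
```

Keeping each stage as its own function mirrors the notebook layout, where every stage is a separate section that can be inspected before moving on to the next.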

The notebooks cover a variety of topics and algorithms:

| Task | Algorithm | Topic |
|---|---|---|
| Recommender | Matrix Factorization (ALS) | LastFM music user-artist data |
| Regression | Random Forests | Airplane delay |
| Simulation | Monte Carlo on time series | Financial risk |
| Clustering | K-Means | Network traffic and anomaly detection |
| Clustering | K-Means on time series | Time series of neuroimages |
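As an illustration of the clustering rows, K-Means can be used for anomaly detection by scoring each point by its distance to the nearest centroid. The sketch below is a minimal pure-Python version of Lloyd's algorithm on made-up 2-D points. It is an assumption-laden toy, not the notebooks' code: the notebooks run on Spark, and the data, function names and parameters here are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k of the data points
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def anomaly_score(point, centroids):
    """Distance from a point to its closest centroid; larger means more unusual."""
    return min(math.dist(point, c) for c in centroids)


# Hypothetical toy data: two tight blobs of "normal" points and one far-away point.
normal = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
normal += [(10 + x * 0.1, 10 + y * 0.1) for x in range(5) for y in range(5)]
outlier = (5.0, 20.0)

centroids = kmeans(normal, k=2)
scores = {p: anomaly_score(p, centroids) for p in normal + [outlier]}
most_anomalous = max(scores, key=scores.get)
```

The model is fitted on the normal points only and then used to score everything, so a point far from both learned centroids stands out. In practice a threshold on the score (for example, a high percentile of the training scores) decides what counts as an anomaly.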

The last couple of notebooks come from a challenge by SAFRAN: two three-hour sessions that were part of their recruitment process. They served as the ultimate test of everything learnt beforehand, since no work was allowed outside the sessions.

Details

  • Language: Python, in Jupyter Notebooks.
  • Execution: run on a remote Spark cluster at EURECOM, managed by Zoe.
  • Libraries: numpy, pandas, matplotlib, pyspark, thunder.

Authors

  • Ole Andreas Hansen @oleaha
  • Alberto Ibarrondo Luis @ibarrond

Sources and acknowledgments

The rough sketches of all the notebooks are the main focus of the course Algorithmic Machine Learning at EURECOM. Particular acknowledgment goes to Pietro Michiardi.

The majority of the Notebooks are based on use cases illustrated in the book Advanced Analytics with Spark, by Sandy Ryza, Uri Laserson, Sean Owen & Josh Wills.

The Notebooks are based on publicly available data.

License

MIT. Free software.
