
Repository containing the projects developed during the IBM Data Science Professional Certificate Specialization. Six in total.


marcoshsq/IBMDataScience


IBM: Data Science Certificate Projects


IBM INSTRUCTORS

Instructors: Rav Ahuja, Alex Aklson, Aije Egwaikhide, Svetlana Levitan, Romeo Kienzler, Polong Lin, Joseph Santarcangelo, Azim Hirjani, Hima Vasudevan, Saishruthi Swaminathan, Saeed Aghabozorgi, Yan Luo

This repository contains the projects developed during the IBM Data Science Professional Certificate.

About the Specialization:

There are 10 courses in this Professional Certificate:

  1. What is Data Science?
  2. Tools for Data Science
  3. Data Science Methodology
  4. Python for Data Science, AI & Development
  5. Python Project for Data Science
  6. Databases and SQL for Data Science with Python
  7. Data Analysis with Python
  8. Data Visualization with Python
  9. Machine Learning with Python
  10. Applied Data Science Capstone

Projects:

Project developed during module 05/10 of the IBM Data Science Professional Certificate Specialization. The course reviewed web scraping and the related Python libraries through labs and activities, and ended with this project. The objective of the project was to collect data to later develop a dashboard.

For the development of the project the following libraries were used: pandas, requests, bs4 (BeautifulSoup), html5lib, lxml, plotly, and yfinance.
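As a rough illustration of the scraping step, here is a minimal sketch that parses an HTML table with BeautifulSoup and loads it into a pandas DataFrame. The HTML snippet, the column names, the numbers, and the `table_to_frame` helper are all made up for this example and are not taken from the project notebooks. In the project, the page would presumably be downloaded with requests first.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical HTML standing in for a downloaded page (values are invented).
HTML = """
<table>
  <tr><th>Date</th><th>Revenue</th></tr>
  <tr><td>2021-12-31</td><td>$1,200</td></tr>
  <tr><td>2020-12-31</td><td>$950</td></tr>
</table>
"""


def table_to_frame(html: str) -> pd.DataFrame:
    """Parse the first <table> in the HTML into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Skip the header row, then read the two cells of each data row.
    for tr in soup.find("table").find_all("tr")[1:]:
        date, revenue = (td.get_text(strip=True) for td in tr.find_all("td"))
        # Strip the currency symbol and thousands separator before converting.
        amount = float(revenue.replace("$", "").replace(",", ""))
        rows.append({"Date": date, "Revenue": amount})
    return pd.DataFrame(rows)


revenue_df = table_to_frame(HTML)
print(revenue_df)
```

A DataFrame like this can then be passed to plotly to draw the charts for a dashboard.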

Project developed during module 06/10 of the IBM Data Science Professional Certificate Specialization. The course covered cloud databases, Python programming, IPython, relational database management systems, and SQL statements through labs and activities, and ended with this project. The objective of the project was to create a table using IBM Db2 SQL, fill it with data from three CSV files about the city of Chicago, and then perform an analysis using Python in a Jupyter Notebook.
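The create, load, and query workflow can be sketched with Python's built-in sqlite3 module. Note that SQLite is only a stand-in here, since the project itself used IBM Db2. The table name, the columns, and the CSV rows are hypothetical and do not come from the Chicago datasets.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; the real project loaded three Chicago CSV files.
CSV_TEXT = """community_area,name,per_capita_income
1,Area A,23000
2,Area B,41000
3,Area C,18000
"""

# SQLite in memory stands in for the IBM Db2 instance used in the project.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE census ("
    "community_area INTEGER PRIMARY KEY, name TEXT, per_capita_income INTEGER)"
)

# Read the CSV rows and insert them with a parameterized statement.
reader = csv.DictReader(io.StringIO(CSV_TEXT))
rows = [
    (int(r["community_area"]), r["name"], int(r["per_capita_income"]))
    for r in reader
]
conn.executemany("INSERT INTO census VALUES (?, ?, ?)", rows)

# Example analysis query: areas below an income threshold, lowest first.
low_income = conn.execute(
    "SELECT name FROM census WHERE per_capita_income < ? "
    "ORDER BY per_capita_income",
    (25000,),
).fetchall()
print(low_income)
```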

Project developed during module 07/10 of the course. The project scenario is: "You are a Data Analyst working at a Real Estate Investment Trust. The Trust would like to start investing in Residential real estate. You are tasked with determining the market price of a house given a set of features. You will analyze and predict housing prices using attributes or features such as square footage, number of bedrooms, number of floors, and so on."

For the development of the project the following libraries were used: pandas, matplotlib, numpy, seaborn and scikit-learn.
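To give an idea of what predicting a price from features looks like, here is a minimal sketch using scikit-learn's `LinearRegression`. The data is synthetic and the column names (`sqft_living`, `bedrooms`, `floors`) and price formula are assumptions made for this example. The actual project may have used different features and models.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic housing data; column names and coefficients are invented.
rng = np.random.default_rng(0)
n = 200
houses = pd.DataFrame({
    "sqft_living": rng.uniform(500, 4000, n),
    "bedrooms": rng.integers(1, 6, n),
    "floors": rng.integers(1, 4, n),
})
houses["price"] = (
    150 * houses["sqft_living"]
    + 10_000 * houses["bedrooms"]
    + 5_000 * houses["floors"]
    + rng.normal(0, 20_000, n)  # noise so the fit is not perfect
)

X = houses[["sqft_living", "bedrooms", "floors"]]
y = houses["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit on the training split and report R^2 on the held-out split.
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)
print(f"R^2 on the test split: {r2:.3f}")
```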

Project developed during module 08/10 of the course. The goal was to build a dashboard using an internal tool provided by IBM. Unfortunately, I couldn't use the tool for technical reasons, so I used a Google Colaboratory notebook instead. As a result, my code is slightly different from the one expected in the course labs; for example, I used a different library, jupyter_dash, to create the dashboard. But the result was fine, and I really enjoyed it!

Project developed during module 09/10 of the course, using a dataset about past loans. The Loan_train.csv dataset includes details of 346 customers whose loans are already paid off or defaulted. The goal was to practice all the classification algorithms taught in the course.
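Comparing several classifiers on the same split can be sketched as follows. This is only an illustration: the data is generated with `make_classification` rather than read from Loan_train.csv, and the three classifiers and two metrics shown are an assumed selection, not necessarily the exact set used in the course.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for the loan dataset (346 rows).
X, y = make_classification(n_samples=346, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# An assumed selection of classifiers to compare.
classifiers = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "logreg": LogisticRegression(max_iter=1000),
}

# Fit each model on the same training split and score it on the same test split.
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }

for name, metrics in scores.items():
    print(name, metrics)
```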

This is it boys, the final and big one: the Capstone Project, the last course in the specialization. The scenario is: a company called SpaceY wants to compete with SpaceX, because of yes!

Now we (the Data Scientists) need to develop an analysis to predict the success of the operation. For this project we needed to:

  1. Collect data from the public SpaceX API and the SpaceX Wikipedia page.
  2. Explore the data using SQL, visualization, Folium maps, and dashboards.
  3. Gather relevant columns to be used as features.
  4. Change all categorical variables to binary using one-hot encoding.
  5. Standardize the data and use Grid Search CV to find the best parameters for the machine learning models.
  6. Visualize the accuracy score of all models.
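The one-hot encoding, standardization, and grid search steps can be sketched as a single scikit-learn pipeline. The tiny `launches` table, its column names (`payload_mass`, `orbit`, `landed`), and the parameter grid are invented for this example. Scaling only the numeric column is also an assumption, and the project notebook may have standardized all features instead.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical launch records; values are invented for illustration.
launches = pd.DataFrame({
    "payload_mass": [500, 3000, 5500, 1200, 9600, 4000,
                     700, 8000, 2500, 6000, 15000, 3500],
    "orbit": ["LEO", "GTO", "GTO", "LEO", "ISS", "GTO",
              "LEO", "ISS", "LEO", "GTO", "ISS", "LEO"],
    "landed": [0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1],
})

# One-hot encode the categorical column and standardize the numeric one.
preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["orbit"]),
    ("scale", StandardScaler(), ["payload_mass"]),
])

pipeline = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grid search over the regularization strength with 3-fold cross-validation.
search = GridSearchCV(pipeline, {"clf__C": [0.01, 0.1, 1.0]}, cv=3)
search.fit(launches[["payload_mass", "orbit"]], launches["landed"])

print(search.best_params_, search.best_score_)
```

Putting the preprocessing inside the pipeline means the encoder and scaler are refit on each training fold, so the cross-validation scores are not influenced by the held-out rows.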

Four machine learning models were produced during the project: Logistic Regression, Support Vector Machine, Decision Tree Classifier, and K Nearest Neighbors. All produced similar results, with an accuracy of about 83.33%, and all models overpredicted successful landings. Anyway, it was a fun project to do, but most importantly, it was my first step in this data journey, and as the saying goes: Greatness from small beginnings!

Thanks to the instructors, and a huge shout-out to the people on Coursera.