Data engineering & analysis portfolio, which showcases my use of Python & SQL

Saltiola7/Data-Analysis-Portfolio

Data Engineering & Analysis Portfolio

Welcome to my portfolio, which showcases a Python & JavaScript web scraping ETL pipeline, Jupyter Notebooks analyzing a range of datasets, and data visualizations built with Tableau.

Table of Contents

  • Data Engineer certification from DataCamp
  • Recommendation Letter from Data Analyst Mentor
  • Testimonial from Web Scraping Client
  • Web Scraping ETL Pipeline
  • Jupyter Notebooks
  • Utility Scripts With Python for Google Sheets, Airtable, Shopify, ChatGPT
  • Tableau Visualizations

Data Engineer certification from DataCamp

Recommendation Letter from Data Analyst Lecturer

Testimonial from Web Scraping Client

Vesa Karjalainen, Polq Oy: I had the opportunity to work with Tommi on developing a critical scraping tool and server for our company. His technical expertise, innovative approach, and dedication to understanding our specific needs resulted in a seamless and efficient solution. The tool has significantly improved our data collection processes, demonstrating Tommi's ability to deliver high-quality work under tight deadlines. His professionalism and willingness to go the extra mile made a remarkable difference. I highly recommend Tommi to anyone looking for exceptional technical solutions in data management and infrastructure.

Web Scraping ETL Pipeline

Scrapes job board data from multiple websites into a custom job board application.

It was first built with the community Docker Compose setup, but was moved to Prefect.io before launch because that was a more streamlined solution for the client.
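As a rough illustration of the extract, transform, load shape of such a pipeline, here is a minimal sketch using only the Python standard library. It is not the client code: the `job-title` markup, the `jobs` table, the sample pages, and every function name are hypothetical, and the pages are inline strings instead of live HTTP responses. In a Prefect 2 deployment, functions like these would typically be wrapped with the `@task` and `@flow` decorators.

```python
import sqlite3
from html.parser import HTMLParser


class JobTitleParser(HTMLParser):
    """Collects the text of every <h2 class="job-title"> (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "job-title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.titles.append(data)


def extract(html):
    """Extract raw job titles from one page of HTML."""
    parser = JobTitleParser()
    parser.feed(html)
    return parser.titles


def transform(titles, source):
    """Normalize whitespace and drop case-insensitive duplicates, keeping order."""
    seen, rows = set(), []
    for title in titles:
        clean = " ".join(title.split())
        if clean.lower() not in seen:
            seen.add(clean.lower())
            rows.append({"title": clean, "source": source})
    return rows


def load(rows, conn):
    """Insert the rows into a SQLite table and return how many were written."""
    conn.execute("CREATE TABLE IF NOT EXISTS jobs (title TEXT, source TEXT)")
    conn.executemany("INSERT INTO jobs (title, source) VALUES (:title, :source)", rows)
    conn.commit()
    return len(rows)


def run_pipeline(pages, conn):
    """Run extract -> transform -> load for every source page."""
    return sum(load(transform(extract(html), source), conn) for source, html in pages.items())


# Hypothetical sample input standing in for scraped pages
SAMPLE_PAGES = {
    "site-a": '<h2 class="job-title">Data  Engineer</h2><h2 class="job-title">data engineer</h2>',
    "site-b": '<h2 class="job-title">Analyst</h2><h2>Not a job</h2>',
}
conn = sqlite3.connect(":memory:")
inserted = run_pipeline(SAMPLE_PAGES, conn)
```

Keeping each stage a plain function makes it easy to test the stages in isolation and to move them between orchestrators.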

Jupyter Notebooks

I use various Python data science packages, e.g. NumPy, Matplotlib, pandas, seaborn, and SciPy.

  • Data cleaning & fixing structural errors
  • Check for outliers
  • Descriptive statistics
  • Correlations
  • Normality tests
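The steps in this list can be sketched roughly as follows with pandas and SciPy. This is a minimal illustration on synthetic data, not an excerpt from the notebooks: the column names, the planted outlier and duplicate row, the 1.5 × IQR rule, and the choice of the Shapiro-Wilk test are all assumptions.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy data; the column names are made up for illustration
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age ": rng.normal(40, 10, 200).round(),
    "income": rng.normal(3000, 500, 200),
})
df.loc[0, "income"] = 50_000                           # plant one obvious outlier
df = pd.concat([df, df.iloc[[1]]], ignore_index=True)  # plant one duplicate row

# 1. Data cleaning and structural fixes: tidy column names, drop duplicates
df.columns = df.columns.str.strip().str.lower()
df = df.drop_duplicates().reset_index(drop=True)

# 2. Outlier check with the 1.5 * IQR rule
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)

# 3. Descriptive statistics
summary = df.describe()

# 4. Correlations between numeric columns
corr = df.corr(method="pearson")

# 5. Normality test (Shapiro-Wilk) on the values not flagged as outliers
stat, p_value = stats.shapiro(df.loc[~is_outlier, "income"])
```

A small `p_value` would suggest the column departs from a normal distribution, which in turn affects whether parametric tests are appropriate later in the analysis.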

I answer questions like

  • Why does a higher % of gender 1 have malignant tumours?
  • What other features may be linked to malignant tumours?
  • What is Walmart's most sold product?
  • What are the most documented use cases for cannabis, and where?

Why does a higher % of gender 1 have malignant tumours?

Figures: Gender & Cancer Level crosstab; Gender & Alcohol use; Gender & Air pollution; Gender & Genetic Risk

What other features may be linked to malignant tumours?

Figures: Cancer Level & Obesity; Age bins & Cancer Level
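Comparisons like the gender and age-bin breakdowns above are commonly produced with `pd.crosstab` and `pd.cut`. The sketch below shows one way to do it; the eight rows, the column names, the `High`/`Low` level values, and the bin edges are all hypothetical and are not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset and its column names may differ
df = pd.DataFrame({
    "gender": [1, 1, 1, 1, 2, 2, 2, 2],
    "age": [25, 37, 52, 68, 31, 44, 59, 73],
    "level": ["High", "High", "High", "Low", "High", "Low", "Low", "Low"],
})

# Share of each cancer level within each gender (each row sums to 1)
gender_vs_level = pd.crosstab(df["gender"], df["level"], normalize="index")

# Bucket ages into bins, then count cancer levels per bin
df["age_bin"] = pd.cut(df["age"], bins=[0, 40, 60, 120], labels=["<=40", "41-60", "61+"])
age_vs_level = pd.crosstab(df["age_bin"], df["level"])
```

Normalizing by row (`normalize="index"`) makes groups of different sizes comparable. Repeating the crosstab for other features, such as alcohol use or genetic risk, helps show whether an apparent gender effect is explained by another variable.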

Python for Spreadsheets and Databases

  • Scrapes all pages of a website into a CSV that can be imported into ChatGPT for analysis. We also supply the latest guidelines together with the CSV and prompt ChatGPT to point out any content that violates them. This saves time when creating compliant CBD content.
  • Queries the most popular products so that the popular products section of a headless ecommerce store can display them with live data
  • Splitting data in one column into multiple columns with
  • Built my own Markdown-to-HTML extension so that we can write Markdown in Airtable and sync it as HTML to Webflow CMS
  • Script for checking the page speed of URLs in a column, which is useful for lead generation. Also other smaller data cleaning scripts
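To give a flavour of these utilities, here is a minimal sketch of the column-splitting task using only the Python standard library. The original does not say which tool that script uses, so plain Python over a CSV export is an assumption, and `split_column`, the sample data, and the column names are hypothetical.

```python
import csv
import io


def split_column(rows, column, new_columns, sep=","):
    """Split rows[column] on sep into new_columns, padding missing parts with ''."""
    out = []
    for row in rows:
        # Split at most len(new_columns) - 1 times so extra text stays in the last part
        parts = [p.strip() for p in row[column].split(sep, len(new_columns) - 1)]
        parts += [""] * (len(new_columns) - len(parts))
        new_row = {k: v for k, v in row.items() if k != column}
        new_row.update(zip(new_columns, parts))
        out.append(new_row)
    return out


# Hypothetical sheet export; a real script would read from the spreadsheet's API or a file
SAMPLE_CSV = "id,full_name\n1,Ada Lovelace\n2,Grace Brewster Hopper\n3,Plato\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
result = split_column(rows, "full_name", ["first_name", "rest"], sep=" ")
```

Limiting the number of splits keeps multi-word remainders together, and padding short values avoids ragged rows when the data is written back to a sheet.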

Tableau