Introduction to Data Engineering
Updated Jun 9, 2024 · Jupyter Notebook
A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker.
Let your pipelines flow through Python code in xonsh.
One framework to develop, deploy and operate data workflows with Python and SQL.
An automated data pipeline for extracting and storing weather forecasts for the tourism sector.
Data Engineering Pipeline practice with Amazon Sales Data
Shift is a high-performance alternative to Airbyte, Singer, and Meltano.
Leveraging AWS cloud services, this ETL pipeline transforms YouTube video statistics. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged with AWS Glue for querying in Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket; AWS QuickSight then visualizes the materialized data.
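As a rough illustration of the cleansing step such a pipeline performs before writing Parquet, the sketch below normalizes one raw statistics record; all field names are assumptions for illustration, not taken from the project's actual schema.

```python
# Hypothetical cleansing step for a raw YouTube statistics record,
# of the kind an AWS Lambda might run before Glue writes Parquet.
# Field names are illustrative; the real pipeline's schema may differ.

def cleanse(record: dict) -> dict:
    """Normalize string-typed raw fields into typed columns."""
    return {
        "video_id": record["video_id"].strip(),
        "views": int(record.get("views", 0)),
        "likes": int(record.get("likes", 0)),
        "trending_date": record["trending_date"],
    }

raw = {"video_id": " abc123 ", "views": "1024",
       "likes": "77", "trending_date": "2024-06-09"}
print(cleanse(raw))
```

Typing the columns up front keeps the downstream Parquet schema stable instead of inferring it per file.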
A simple ETL pipeline for temperature data from the OpenWeatherMap API, storing it in an Azure SQL Database.
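The transform step of such a pipeline can be sketched as follows. The payload shape mirrors OpenWeatherMap's documented current-weather response (temperatures default to Kelvin); the row layout for the Azure SQL table is an assumption.

```python
from datetime import datetime, timezone

def to_row(payload: dict) -> tuple:
    """Flatten an OpenWeatherMap current-weather payload into
    (city, temp_celsius, observed_at_utc) for an Azure SQL INSERT."""
    temp_c = payload["main"]["temp"] - 273.15  # API default unit is Kelvin
    observed = datetime.fromtimestamp(payload["dt"], tz=timezone.utc)
    return (payload["name"], round(temp_c, 2), observed.isoformat())

# Minimal payload with the fields used above
sample = {"name": "Lisbon", "dt": 1717934400, "main": {"temp": 300.15}}
print(to_row(sample))
```

In the real pipeline the resulting tuple would be bound as parameters to a parameterized INSERT (e.g. via pyodbc) rather than interpolated into SQL text.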
Ayush @ Data Engineering Portfolio
The Security Reference Architecture (SRA) implements, as Terraform templates, the security features typically deployed by high-security organizations, and enforces controls for the largest risks customers ask about most often.
An end-to-end solution for analyzing the latest Bing news data, built entirely with Microsoft Fabric.
Pipeline to automate the collection of board game and expansion data from BoardGameGeek's XML API2. Data is stored in Google Cloud Storage and BigQuery, and modelled with dbt in a star schema. (Terraform, GCP, Mage, Python, dbt)
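The extraction step can be sketched with the standard library. The XML below mirrors the shape of a BGG XML API2 `thing` response; which fields the project actually loads into BigQuery is an assumption.

```python
import xml.etree.ElementTree as ET

# Abbreviated example of a BGG XML API2 "thing" response
SAMPLE = """<items>
  <item type="boardgame" id="13">
    <name type="primary" value="Catan"/>
    <yearpublished value="1995"/>
  </item>
</items>"""

def parse_items(xml_text: str):
    """Yield one flat dict per <item>, ready to load into BigQuery."""
    for item in ET.fromstring(xml_text).findall("item"):
        yield {
            "bgg_id": int(item.get("id")),
            "name": item.find("name[@type='primary']").get("value"),
            "year": int(item.find("yearpublished").get("value")),
        }

print(list(parse_items(SAMPLE)))
```

Flattening the XML into one row per item is what lets dbt model the data as a star-schema fact table downstream.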
Zillow Data Pipeline: extracts data from Zillow, moves it through AWS services, and performs analytics. Uses Python scripts, AWS Lambda, S3, Amazon Redshift, and QuickSight. See docs/images for architecture diagrams.
Data engineering is the backbone of data processing, managing data pipelines, warehouses, and lakes. It's the bridge between raw data and actionable insights, powering businesses with efficient data management and analytics.
The NHANES Data 'API' is a Python tool that simplifies access to the National Health and Nutrition Examination Survey (NHANES) dataset. This project provides an easy-to-use API to retrieve NHANES data, helping researchers, data scientists, health professionals, and other stakeholders access these valuable datasets.
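The project's actual API is not reproduced here, but the convenience such a wrapper provides can be sketched: NHANES tables are published as SAS transport (.XPT) files under a predictable CDC URL pattern, so a helper needs only the survey cycle and table name. The function name below is illustrative, not the project's.

```python
# Hypothetical helper following the CDC's public NHANES file layout.

def nhanes_url(cycle: str, table: str) -> str:
    """Build the CDC download URL for an NHANES .XPT file,
    e.g. the 2017-2018 demographics table DEMO_J."""
    return f"https://wwwn.cdc.gov/Nchs/Nhanes/{cycle}/{table}.XPT"

print(nhanes_url("2017-2018", "DEMO_J"))
```

From there, the downloaded transport file can be read directly with `pandas.read_sas`.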
This repo offers a robust pipeline for managing, processing, and analyzing YouTube video data with AWS services, handling both structured statistics and trending key metrics.