Introduction to Data Engineering
Updated Jun 9, 2024 · Jupyter Notebook
A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker.
Let your pipelines flow through Python code in xonsh.
One framework to develop, deploy and operate data workflows with Python and SQL.
An automated data pipeline for extracting and storing weather forecasts for the tourism sector.
Data Engineering Pipeline practice with Amazon Sales Data
Shift is a high-performance alternative to Airbyte, Singer, and Meltano.
Leveraging AWS cloud services, this ETL pipeline transforms YouTube video statistics. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged with AWS Glue for querying in Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket; AWS QuickSight then visualizes the materialized data.
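As a rough illustration of the cleansing step such a pipeline performs before writing Parquet, the sketch below normalizes one raw statistics record; all field names are assumptions for illustration, not taken from the project's actual schema.

```python
# Hypothetical cleansing step for a raw YouTube statistics record,
# of the kind an AWS Lambda might run before Glue writes Parquet.
# Field names are illustrative; the real pipeline's schema may differ.

def cleanse(record: dict) -> dict:
    """Normalize string-typed raw fields into typed columns."""
    return {
        "video_id": record["video_id"].strip(),
        "views": int(record.get("views", 0)),
        "likes": int(record.get("likes", 0)),
        "trending_date": record["trending_date"],
    }

raw = {"video_id": " abc123 ", "views": "1024",
       "likes": "77", "trending_date": "2024-06-09"}
print(cleanse(raw))
```

Typing the columns up front keeps the downstream Parquet schema stable instead of inferring it per file.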
A simple ETL pipeline for temperature data from the OpenWeatherMap API, storing it in an Azure SQL Database.
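The transform step of such a pipeline can be sketched as follows. The payload shape mirrors OpenWeatherMap's documented current-weather response (temperatures default to Kelvin); the row layout for the Azure SQL table is an assumption.

```python
from datetime import datetime, timezone

def to_row(payload: dict) -> tuple:
    """Flatten an OpenWeatherMap current-weather payload into
    (city, temp_celsius, observed_at_utc) for an Azure SQL INSERT."""
    temp_c = payload["main"]["temp"] - 273.15  # API default unit is Kelvin
    observed = datetime.fromtimestamp(payload["dt"], tz=timezone.utc)
    return (payload["name"], round(temp_c, 2), observed.isoformat())

# Minimal payload with the fields used above
sample = {"name": "Lisbon", "dt": 1717934400, "main": {"temp": 300.15}}
print(to_row(sample))
```

In the real pipeline the resulting tuple would be bound as parameters to a parameterized INSERT (e.g. via pyodbc) rather than interpolated into SQL text.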
Ayush @ Data Engineering Portfolio
The Security Reference Architecture (SRA) implements, as Terraform templates, the security features typically deployed by high-security organizations, and enforces controls for the largest risks customers ask about most often.
An end-to-end solution for analyzing the latest Bing news data, built entirely with Microsoft Fabric.
Pipeline to automate the collection of board game and expansion data from BoardGameGeek's XML API2. Data is stored in Google Cloud Storage and BigQuery, and modelled with dbt in a star schema. (Terraform, GCP, Mage, Python, dbt)
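The extraction step can be sketched with the standard library. The XML below mirrors the shape of a BGG XML API2 `thing` response; which fields the project actually loads into BigQuery is an assumption.

```python
import xml.etree.ElementTree as ET

# Abbreviated example of a BGG XML API2 "thing" response
SAMPLE = """<items>
  <item type="boardgame" id="13">
    <name type="primary" value="Catan"/>
    <yearpublished value="1995"/>
  </item>
</items>"""

def parse_items(xml_text: str):
    """Yield one flat dict per <item>, ready to load into BigQuery."""
    for item in ET.fromstring(xml_text).findall("item"):
        yield {
            "bgg_id": int(item.get("id")),
            "name": item.find("name[@type='primary']").get("value"),
            "year": int(item.find("yearpublished").get("value")),
        }

print(list(parse_items(SAMPLE)))
```

Flattening the XML into one row per item is what lets dbt model the data as a star-schema fact table downstream.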
Zillow Data Pipeline: extracts data from Zillow, moves it through AWS services, and performs analytics. Uses Python scripts, AWS Lambda, S3, Amazon Redshift, and QuickSight. See docs/images for architecture diagrams.
Data engineering is the backbone of data processing, managing data pipelines, warehouses, and lakes. It's the bridge between raw data and actionable insights, powering businesses with efficient data management and analytics.
The NHANES Data 'API' is a Python tool that simplifies access to the National Health and Nutrition Examination Survey (NHANES) dataset. This project provides an easy-to-use API to retrieve NHANES data, helping researchers, data scientists, health professionals, and other stakeholders access these valuable datasets.
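The project's actual API is not reproduced here, but the convenience such a wrapper provides can be sketched: NHANES tables are published as SAS transport (.XPT) files under a predictable CDC URL pattern, so a helper needs only the survey cycle and table name. The function name below is illustrative, not the project's.

```python
# Hypothetical helper following the CDC's public NHANES file layout.

def nhanes_url(cycle: str, table: str) -> str:
    """Build the CDC download URL for an NHANES .XPT file,
    e.g. the 2017-2018 demographics table DEMO_J."""
    return f"https://wwwn.cdc.gov/Nchs/Nhanes/{cycle}/{table}.XPT"

print(nhanes_url("2017-2018", "DEMO_J"))
```

From there, the downloaded transport file can be read directly with `pandas.read_sas`.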
This repo offers a robust pipeline for managing, processing, and analyzing YouTube video data with AWS services, handling both structured statistics and trending key metrics.