data-engineering-pipeline

Here are 125 public repositories matching this topic...

Leveraging AWS Cloud Services, an ETL pipeline transforms YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialised data.

  • Updated May 30, 2024
  • Python
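
The cleansing step in the entry above (nested raw data turned into flat, columnar-friendly rows) can be sketched with the standard library alone. This is a minimal illustration, not the repository's code: the record layout (`items`, `id`, `snippet`) and the function names `flatten_record` and `flatten_items` are assumptions made for the example, and no AWS call is involved.

```python
import json


def flatten_record(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten_record(value, name, sep))
        else:
            flat[name] = value
    return flat


def flatten_items(raw_json):
    """Parse a JSON document and return its `items` as flat rows.

    The top-level `items` list is a hypothetical layout chosen for
    this sketch; a missing list yields no rows.
    """
    payload = json.loads(raw_json)
    return [flatten_record(item) for item in payload.get("items", [])]


if __name__ == "__main__":
    sample = json.dumps(
        {"items": [{"id": "1", "snippet": {"title": "Music", "assignable": True}}]}
    )
    for row in flatten_items(sample):
        print(row)
```

Rows of this shape are tabular, so a columnar writer could store them as Parquet in a later step; that step is left out here because it depends on the libraries and bucket layout a given project uses.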

Data Engineering 🛠️ is like the backbone of data processing 📊, managing data pipelines 🚀, warehouses 🏒, and lakes 🌊. It's the bridge 🌉 between raw data and actionable insights, powering businesses 🚀 with efficient data management and analytics 📈.

  • Updated Mar 26, 2024
  • Jupyter Notebook

The NHANES Data 'API' is a Python tool that simplifies access to the National Health and Nutrition Examination Survey (NHANES) dataset. This project provides an easy-to-use API to retrieve NHANES data, helping researchers, data scientists, health professionals, and other stakeholders access these valuable datasets.

  • Updated Mar 21, 2024
  • Python

This project repo 📺 offers a robust solution meticulously crafted to efficiently manage, process, and analyze YouTube video data leveraging the power of AWS services. Whether you're diving into structured statistics or exploring the nuances of trending key metrics, this pipeline is engineered to handle it all with finesse.

  • Updated Mar 20, 2024
  • Python
