Apache Spark
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Here are 8,269 public repositories matching this topic...
Sample code for Spark DataFrame manipulation and linear regression
Updated Nov 21, 2022 - Jupyter Notebook
Updated Jan 26, 2023 - Shell
Applying regex to validate names in Spark
Updated Nov 25, 2022 - Python
Exploring a best-case Apache Spark working environment for robust data pipelines
Updated Apr 1, 2023 - Python
All Spark- and Scala-related projects will be stored here
Updated Jan 22, 2023 - Scala
Custom integrations with external data sources using DataSource V2 API
Updated Mar 28, 2023 - Scala
This notebook contains detailed code for Spark, machine learning, and Databricks
Updated Mar 15, 2023 - Jupyter Notebook
PySpark and Spark (my notes and all practice notebooks)
Updated Jan 9, 2023 - Jupyter Notebook
Spark with Scala, including RDDs, transformations, actions, HDFS, Spark SQL, DataFrames, and MLlib
Updated Feb 8, 2018 - XSLT
Spark assignments from "Introduction to Big Data" course (offered by IBM Skills Network)
Updated Dec 4, 2022 - Jupyter Notebook
Learning to work with Apache Spark and Python by creating case studies and some small projects
Updated Apr 20, 2023 - Jupyter Notebook
This repository contains all the code I practiced with while learning Spark
Updated Jan 27, 2023 - Jupyter Notebook
Created by Matei Zaharia
Released May 26, 2014
- Followers: 414
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia