
Databricks Spark Observability Demo

Monitoring and profiling Spark applications in Databricks with Prometheus, Grafana and Pyroscope

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

About The Project

Dive deeply into performance details and uncover what the Spark execution plan doesn't typically show.


(back to top)

Built With

Databricks Prometheus Grafana Pyroscope Spark

(back to top)

Getting Started

This project demonstrates how to monitor and profile Spark applications in Databricks using Prometheus, Grafana and Pyroscope. This is applicable to any Spark application running on Databricks, including batch, streaming, and interactive workloads (including ephemeral Jobs).

In addition to relying on Prometheus, Pyroscope and Grafana, this project creates a small single-node Spark cluster and a set of init scripts that configure it to push metrics to the Prometheus Pushgateway and Pyroscope.
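The actual init scripts live in the repository, but a minimal sketch of what they do might look like the following. Everything here is an assumption for illustration: the `PUSHGATEWAY_HOST`/`PYROSCOPE_HOST` variables, the Pushgateway sink class (stock Spark has no Pushgateway sink, so a package such as banzaicloud's spark-metrics would need to be on the classpath), and the Pyroscope Java agent environment variables.

```shell
#!/bin/bash
# Hypothetical init-script sketch: wire Spark's metrics system to a Prometheus
# Pushgateway sink and prepare Pyroscope agent settings for the JVMs.
# PUSHGATEWAY_HOST and PYROSCOPE_HOST are assumed to be injected (e.g. by Terraform).
CONF_DIR="${SPARK_CONF_DIR:-/tmp}"

# Point Spark's metrics system at the Pushgateway. This assumes a Pushgateway
# sink implementation (e.g. banzaicloud spark-metrics) is on the classpath.
cat > "${CONF_DIR}/metrics.properties" <<EOF
*.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
*.sink.prometheus.pushgateway-address=${PUSHGATEWAY_HOST:-localhost:9091}
EOF

# Environment for the Pyroscope Java agent, which would be attached to the
# driver and executors via a -javaagent JVM option.
cat > "${CONF_DIR}/pyroscope-env.sh" <<EOF
export PYROSCOPE_SERVER_ADDRESS=http://${PYROSCOPE_HOST:-localhost:4040}
export PYROSCOPE_APPLICATION_NAME=spark-databricks-demo
EOF
```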

Prerequisites

This demo uses Terraform to create all necessary resources in your Databricks Workspace. You will need Terraform version 1.4.0 or later installed on your machine.

You'll also need a VM with network connectivity to the Databricks Workspace. This VM should preferably be created in the same virtual network as the Databricks Workspace, or in a peered network.

Databricks

You will need a Databricks account to run the demo. If you don't have one already, you can sign up for a free trial at https://databricks.com/try-databricks.

Tooling

To receive metrics and traces, Prometheus and Pyroscope need to be set up and running. For the convenience of the demo, the complete setup is provided as a Docker Compose file in the docker directory. The included Terraform configuration won't create these resources for you, so you will need to start them yourself.
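As a rough sketch, the compose file wires up four services along these lines (the service names, images and ports below are assumptions; the file in the docker directory is authoritative):

```yaml
# Hypothetical docker-compose.yml sketch for the monitoring stack.
services:
  prometheus:
    image: prom/prometheus
    ports: ["9090:9090"]
  pushgateway:          # receives metrics pushed from the Spark cluster
    image: prom/pushgateway
    ports: ["9091:9091"]
  pyroscope:            # receives continuous-profiling data
    image: grafana/pyroscope
    ports: ["4040:4040"]
  grafana:              # dashboards over Prometheus and Pyroscope
    image: grafana/grafana
    ports: ["3000:3000"]
```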

It can be started with the following command:

docker compose up

Setup

You will need a Databricks Personal Access Token to run the demo. Once you have the token, you can create a profile in the Databricks CLI or configure the provider explicitly (using PAT or any other form of authentication).
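A CLI profile is just an entry in ~/.databrickscfg; the sketch below creates one by hand (equivalent to running `databricks configure`). The profile name and the `{…}` values are placeholders, not values from this repo:

```shell
# Sketch: add a named profile to the Databricks CLI config file.
# DATABRICKS_CONFIG_FILE overrides the default location if set.
CFG="${DATABRICKS_CONFIG_FILE:-$HOME/.databrickscfg}"
cat >> "$CFG" <<'EOF'
[demo]
host  = https://{workspace_host}
token = {personal_access_token}
EOF
```

The Terraform provider can then reference the profile (`profile = "demo"`), or you can skip the file entirely and supply credentials through the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables.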

(back to top)

Usage

The Terraform setup has only two variables that need to be set. We can provide them through the environment (or through a .tfvars file), making sure to replace the placeholders with actual values:

export TF_VAR_prometheus_pushgateway_host={pushgateway_host}:9091
export TF_VAR_pyroscope_host={pyroscope_host}:4040

Prometheus Demo

Once configured, you'll be able to see all relevant metrics in Grafana. If you're using tagging, you can also filter by cluster, job, and other tags.

The example below shows the CPU usage of each executor in the Spark cluster.

Prometheus Demo
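A panel like that can be driven by a PromQL query along these lines. The metric and label names here are assumptions; the real names depend on how the sink publishes and tags the metrics:

```promql
# Per-executor CPU usage rate over a 5-minute window (hypothetical metric name)
rate(process_cpu_seconds_total{job="spark-demo"}[5m])
```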

Pyroscope Demo

If set up correctly, here's what you should get at the end. The following example demonstrates profiling a Spark application that is bottlenecked by reading LZW-compressed files and by regex-heavy data processing.

Pyroscope Demo

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Project Link: https://github.com/rayalex/spark-databricks-observability-demo

(back to top)