Skip to content

Engineered block tree construction based on the Longest Previous Factor array that also works in parallel.

License

Notifications You must be signed in to change notification settings

pasta-toolbox/block_tree

Repository files navigation

pasta::block_tree

License: GPL v3 DOI pasta::block_tree CI badge

This header-only library contains two highly configurable block tree implementations and is largely based on Daniel Meyer's code developed during his Master's thesis.

If you use this code in a scientific context, please cite our paper.

@inproceedings{KoepplKM2023LPFBlockTree,
  author    = {Dominik Köppl and
               Florian Kurpicz and
               Daniel Meyer},
  title     = {Faster Block Tree Construction},
  booktitle = {Accepted at {ESA}},
  year      = {2023}
}

Content

This repository contains a generic interface for block trees and two construction algorithms.

  1. A construction algorithm based on Karp-Rabin fingerprints, which is basically a re-implementation of the original block tree implementation described by Belazzougui et al. J. Comput. Syst. Sci. ’21.
  2. A construction algorithm based on the longest previous factor array, which can be up to an order of magnitude faster than the first algorithm, see our benchmarks below.

Easy to Use

This library has only a single dependency in the form of the Succinct Data Structure Library (SDSL).1 Please make sure you have this library installed, such that CMake can find it. You can see in our example below, that the block tree is easy to compute and easy to use.

// Generate some random text
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<uint8_t> dist(0, 15);

size_t const string_length = 100000;
std::vector<uint8_t> text; 
text.resize(string_length);
for (size_t i = 0; i < text.size(); ++i) {
  text[i] = dist(gen);
}

// Build the block tree for the random text
auto* bt = pasta::make_block_tree_lpf<uint8_t, int32_t>(text, 2, 1, true);

// Use the block tree to access individual characters of the text
std::cout << "# Acces" << "\n";
for (size_t i = 0; i < 100; ++i) {
  std::cout << bt->access(i) << ", ";
}
std::cout << "\n";

// Add additional rank and select support
bt->add_rank_support();

// Get the rank of the first character for the first 10 characters
std::cout << "# Rank" << "\n";
for (size_t i = 0; i < 15; ++i) {
  std::cout << bt->rank(0, i) << ", ";
}
std::cout << "\n";

// Get the positions of the first 5 in the text
std::cout << "# Select" << "\n";
std::cout << bt->select(5, 1) << "\n";

// Clean-up
delete bt;

Note that this software is currently in early development and the interface might change during the following releases.

Benchmarks and Tests

There exists an easy to use benchmark repository, which helps to compare the implementations contained in this repository with other block tree implementations. The script reproduces the experiments conducted in our paper (including plotting the data). Below, you can find some of the figures we present in the paper.

Sequential construction time plot

Parallel construction time plot

How to Get This

Below, we list all commands that are required to build the code in this repository. To this end, we provide three CMake presets (debug, release, and release with debug information).

  • The debug preset creates a debug folder and uses the compiler flags -DDEBUG -O0 -g -ggdb -fsanitize=address.
  • The release preset creates a build folder and uses the compiler flags -DNDEBUG -march=native -O3.
  • The release with debug information preset creates a build_with_debug_info folder and uses the compiler flags -DDEBUG -g -march=native -O3.

Per default, we use the following compiler flags: -Wall -Wextra -Wpedantic -fdiagnostics-color=always.

Requirements

pasta::block_tree is written in C++20 and requires a compiler that supports it. We use Ninja as build system. Additionally, we require the Succinct Data Structure Library (SDSL) to be installed on the system.1

tl;dr

To just clone the source code, use the following.

git clone git@github.com:pasta-toolbox/block_tree
cd block_tree
git submodule update --init --recursive

If you also want to build the test, please continue with the following commands.

cmake --preset=build -DPASTA_BLOCK_TREE_BUILD_TESTS=On -DPASTA_BLOCK_TREE_BUILD_EXAMPLES=On
cmake --build --preset=build
ctest --test-dir build

Footnotes

  1. We are working hard on removing this dependency. 2