[NEW] Valkey-Bloom: BloomFilter support for Valkey. #407

Open
KarthikSubbarao opened this issue Apr 30, 2024 · 11 comments

@KarthikSubbarao

The problem/use-case that the feature addresses

Bloom filters are a space-efficient probabilistic data structure that can be used to “check” whether an element exists in a set (with a defined false positive rate), and to “add” elements to a set. When checking whether an item exists, false positives are possible, but false negatives are not. https://en.wikipedia.org/wiki/Bloom_filter
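
To make the add/check behaviour concrete, here is a minimal Python sketch of a Bloom filter. It is only an illustration of the general technique and is not how Valkey-Bloom or the `bloomfilter` crate is implemented: the class name, the SHA-256 based double hashing, and the parameters are all assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus k hash positions per item."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from two 64-bit slices of one SHA-256 digest
        # (double hashing). This hashing scheme is an illustrative assumption.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position associated with the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

Every item that was added sets all of its bits, so a later check on that item always succeeds (no false negatives). An item that was never added can still find all of its bits set by other items, which is where false positives come from.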

Description of the feature

Valkey-Bloom is a Valkey module written in Rust that brings a native, space-efficient probabilistic module data type to Valkey. With it, users can create filters, add elements, perform a “check” operation to test whether an element exists, query cardinality and other filter details (BF.CARD / BF.INFO), auto-scale their filters, reserve filters, perform RDB save and load operations, etc.

Valkey-Bloom is built using bloomfilter::Bloom (https://crates.io/crates/bloomfilter which has a BSD-2-Clause license).

It is compatible with the BloomFilter (BF.*) command APIs of redislabs/rebloom from Redis Ltd. which has over 10M image pulls on Docker and is compatible with several client libraries.

The following commands are supported.

BF.EXISTS
BF.ADD
BF.MEXISTS
BF.MADD
BF.CARD
BF.RESERVE
BF.INFO
BF.INSERT
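
As a rough guide to what reserving a filter involves, the sketch below computes the textbook bit-array size and hash count for a target capacity and false positive rate. It assumes BF.RESERVE takes an error rate and a capacity, as in the rebloom-compatible API; the function name is hypothetical, and the sizing Valkey-Bloom actually uses may differ.

```python
import math


def bloom_parameters(capacity: int, error_rate: float) -> tuple[int, int]:
    """Return (num_bits, num_hashes) from the standard Bloom filter formulas.

    m = -n * ln(p) / (ln 2)^2   bits for n items at false positive rate p
    k = (m / n) * ln 2          hash functions
    """
    if capacity <= 0 or not 0.0 < error_rate < 1.0:
        raise ValueError("capacity must be > 0 and 0 < error_rate < 1")
    num_bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    num_hashes = max(1, round(num_bits / capacity * math.log(2)))
    return num_bits, num_hashes


if __name__ == "__main__":
    bits, hashes = bloom_parameters(capacity=1000, error_rate=0.01)
    print(f"{bits} bits (~{bits / 8 / 1024:.2f} KiB), {hashes} hash functions")
```

A lower error rate or a higher capacity both increase the number of bits, which is the trade-off a user makes when reserving a filter up front instead of relying on auto-scaling.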

We would like to bring Valkey-Bloom into the valkey-io project as an open source Valkey-Module that is free to use, contribute to, etc.

Alternatives you've considered

A Bloom filter module does exist today for Redis - https://github.com/goodform/rebloom. However, it uses an AGPL-3.0 license, which has additional obligations that are difficult to meet for many of the active contributors who are looking to provide Valkey as a service. AGPL is also widely disallowed by company open source program offices (including Amazon). Given that this package has not been significantly modified since it was created six years ago, it seems likely that the license is part of the issue.

@natoscott

natoscott commented Apr 30, 2024

@KarthikSubbarao we are continuing the goodform.io modules as native valkey modules too. Personally I don't think the lack of activity relates to the license - it's more that the code is essentially done and that all modules generally get little attention once mature - but we're just speculating here.

Can we find a way to co-exist? I have used naming like valkey-bloom (all lower case) and the module shared library valkeybloom.so for a simple transition for users (this module will be in Fedora soon with this naming convention as we transition away from Redis). This matches up with the other goodform.io modules like valkey-search, valkey-json, valkey-graph, and so on.

Would it be possible to name this new module in a way that highlights the differences perhaps? (e.g. Valkey-Bloom-Rust?)

@madolson
Member

Can we find a way to co-exist?

Given your precedence, I think we shouldn't overwrite your naming. If you want to translate the names to valkey-*, I think we should respect that.

Would it be possible to name this new module in a way that highlights the differences perhaps? (e.g. Valkey-Bloom-Rust?)

We could call it Val-Bloom or something, closer to how Redis named its modules. Or we could name it based on the probability. Based on reading the docs (I've been historically advised not to read AGPL code while working in an AWS capacity), rebloom only supports the Bloom data types and not any of the newer ones supported by Redis (like Top-K or Cuckoo). I don't know how popular any of those are though.

@hpatro
Contributor

hpatro commented Apr 30, 2024

Thanks @KarthikSubbarao for creating this.

This is one of the most popular modules, and I've seen users use various alternatives like Lua scripts or custom applications built around the BITSET command when the prior modules weren't accessible (due to licensing). I believe it would be good if the Valkey organization can make it part of the project.

Key questions:

  1. How do we bundle modules? Should they be part of the binary/containers/release(s) by default?
  2. Integration tests? Each module having its own testing framework might make maintenance difficult over the years. I would prefer continuing with TCL tests or introducing a new lightweight framework and using it for each module.

@natoscott

@hpatro there is an existing Python-based test framework (BSD licensed) from the early days that has been kept and used with all of the goodform modules. The earlier version is named 'rmtest' (Redis Modules Test) and I've been working on transitioning it to 'vkmtest' (ValKey Modules Test). Maybe it'll work for the Rust module testing too - you can find the initial version here: https://github.com/goodform/valkey-module-test

@madolson
Member

@natoscott That is something I am very interested in taking over (specifically because I want a Python-based testing framework for the main project) if you have any interest in offloading the maintenance of it. Ideally it could be reusable across all projects that run Valkey (or even Redis).

@madolson
Member

How do we bundle modules? Should it be part of the binary/containers/release(s) by default?

This isn't the question we should answer here. Can you make a separate issue for it?

@natoscott

@madolson happy to either work with you on it or have you take it over - I have a lot on my plate (as I'm sure you do!) but I can definitely still dedicate some time to it. This test framework is also packaged in Fedora and I'd like to upload it to PyPI for ease of use within the Valkey modules too.

@natoscott

natoscott commented Apr 30, 2024

@KarthikSubbarao another possibility, if you're super keen on ValkeyBloom and not something with 'Rust' in the name, would be for me to use valkey-module-bloom for the existing modules. In hindsight I see I've used that prefix for -test and -sdk (Python and C respectively), and that convention could perhaps be used on the C modules too. Anyway, let me know your thoughts; I'm happy to change it at this early stage. There was also mention of a new implementation of ValkeyJSON (not sure if it's using Rust) from someone at Alibaba IIRC - so this naming issue may not be an isolated problem.

@madolson
Member

madolson commented May 1, 2024

happy to either work with you on it or have you take it over

Cool! Not something we need to figure out immediately, but I would love to collaborate on this.

@hwware
Member

hwware commented May 1, 2024

Thanks @KarthikSubbarao for creating this.

This is one of the most popular modules, and I've seen users use various alternatives like Lua scripts or custom applications built around the BITSET command when the prior modules weren't accessible (due to licensing). I believe it would be good if the Valkey organization can make it part of the project.

Key questions:

  1. How do we bundle modules? Should they be part of the binary/containers/release(s) by default?
  2. Integration tests? Each module having its own testing framework might make maintenance difficult over the years. I would prefer continuing with TCL tests or introducing a new lightweight framework and using it for each module.

Here we are #408

@daniel-house
Member

Would it be possible to name this new module in a way that highlights the differences perhaps? (e.g. Valkey-Bloom-Rust?)

I like a name that highlights the differences in behavior but not one that gives the slightest hint about how it is implemented.
