[NEW] Valkey Modules Bundling #408

Open
hpatro opened this issue May 1, 2024 · 12 comments
Labels
major-decision-pending Needs decision by core team

Comments

@hpatro
Contributor

hpatro commented May 1, 2024

Valkey supports dynamic loading of modules, which expands its capabilities by allowing users to add functionality beyond the core data structures at runtime. This feature enables users to enhance the core engine with custom modules developed independently. Bundling popular modules such as Bloom filters, JSON, Search, Timeseries, etc., along with the core engine enables adoption of Valkey for users seeking these features and also simplifies transitioning from Redis to Valkey.

Valkey can pursue one of the following options regarding bundling of Modules for each release version:

  1. Generate two binaries - Valkey Core (only the core) and Valkey Plus (core + modules): This option maintains a lightweight core binary and allows users to choose which modules to load. Additionally, a bundled binary includes popular and Valkey organization-supported modules.
  2. Generate a single binary with all Valkey-supported modules - This option involves creating a single binary of the core along with all Valkey organization-supported modules. This approach simplifies the release process by requiring only one binary.
  3. Generate the Valkey core binary and each module's binary independently - This option allows different release cycles for the core and the modules. The release process can be decoupled and performed independently by the maintainers of each project.

Personally, I prefer option 1: it gives users the flexibility to choose modules according to their needs, and it avoids the complexity of loading modules separately when they need any of the module-based features.

Ref: #407

@PingXie
Member

PingXie commented May 1, 2024

Does option 1 require a change to the module build process? We currently build each module into its own .so file.

Does option 2 affect the distribution only? Or does it imply that all modules distributed along with the core will be loaded automatically?

@natoscott

I think there's an option 4 too, which is an extension of option 3: Valkey as it is now, in a container, with some blessed group of modules installed, and suitable valkey.conf settings to ensure those modules are automatically loaded at startup. That would be one definition of "bundling" anyway, one that doesn't require any new code, makefile or other linkage changes to Valkey itself.

This option provides the benefits of all three of the listed options, I think...? And it's what all the cloud vendors need (a container), and it can form the basis of future work on a Valkey Kubernetes operator.
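
As a rough illustration of the valkey.conf side of this idea, the sketch below renders one `loadmodule` directive per bundled module, which is the kind of line a container image could append to its config. The module file names and the directory are hypothetical placeholders, and the absolute-path check is only a choice made in this sketch so the config does not depend on the working directory.

```python
from pathlib import PurePosixPath


def render_loadmodule_lines(module_paths):
    """Return one `loadmodule` directive per module path, in order."""
    lines = []
    for raw in module_paths:
        path = PurePosixPath(raw)
        # Sketch-level choice: insist on absolute paths inside the image.
        if not path.is_absolute():
            raise ValueError(f"module path must be absolute: {raw}")
        lines.append(f"loadmodule {path}")
    return lines


# Hypothetical module locations inside the image; not real package paths.
BUNDLED_MODULES = [
    "/usr/lib/valkey/modules/example_json.so",
    "/usr/lib/valkey/modules/example_bloom.so",
]

if __name__ == "__main__":
    # Print the lines that would be appended to valkey.conf.
    print("\n".join(render_loadmodule_lines(BUNDLED_MODULES)))
```

Keeping the list of modules as data in the image build, rather than in Valkey itself, matches the point above that this form of bundling needs no changes to the server.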

@hwware
Member

hwware commented May 1, 2024

> Does option 1 require a change to the module build process? We currently build each module into its own .so file.
>
> Does option 2 affect the distribution only? Or does it imply that all modules distributed along with the core will be loaded automatically?

IMO, I do not like option 2, because a module is similar to a plugin. As more and more modules appear, the binary file could become very big. We should allow clients to choose which module(s) they want to use. There are thousands of plugins for Eclipse and VS Code.

I prefer option 3. Option 1 is a little bit weird.

@madolson
Member

madolson commented May 1, 2024

Definitely don't like 2. We would like to have a lean distribution of Valkey. I'm also leaning towards 3 over 1.

> Does option 1 require a change to the module build process? We currently build each module into its own .so file.

It would, yeah. We would probably have a separate build repository that builds everything together with special flags. I will say it makes it much more difficult for end users to build, as you'll probably need to check out all the sub-modules and build them all individually. We've basically defined three distribution channels: direct downloads, containers, and Linux distros. I think this approach only really helps the first and third since you have everything built, but I have a suspicion that both 1 and 3 would be okay with downloading the modules separately and having to do a little assembly on their end.

> Valkey as it is now, in a container, with some blessed group of modules installed, and suitable valkey.conf settings to ensure those modules are automatically loaded at startup.

I think this makes a lot of sense. This is the easiest way to "download" and try out the functionality.

@hpatro
Contributor Author

hpatro commented May 1, 2024

> Does option 1 require a change to the module build process? We currently build each module into its own .so file.
> Does option 2 affect the distribution only? Or does it imply that all modules distributed along with the core will be loaded automatically?
>
> IMO, I do not like option 2, because a module is similar to a plugin. As more and more modules appear, the binary file could become very big. We should allow clients to choose which module(s) they want to use. There are thousands of plugins for Eclipse and VS Code.
>
> I prefer option 3. Option 1 is a little bit weird.

Option 3 is a subset of option 1. The only issue I see with it is that the admin has to perform some additional management operations to vet and load the module(s) on the node(s).
We can start with option 3, see whether any users want the bundling (option 1), and support it in the future.

@daniel-house
Member

daniel-house commented May 1, 2024

How much risk is there that putting multiple modules written by various authors into a single binary will cause identifier collisions leading to undefined behavior?

hmmmm, this is making me think about the difference between the valkey binary and the shared libraries. Won't options 1 and 2 require code changes to allow non-loading of the modules that the user doesn't want? How would we even do that, given that the existing code searches the library for a function named XXXModule_OnLoad, and calls it for each module?

What am I missing?
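
To make the collision question concrete: if every module defines an entry point with the same name, linking them into one binary would give the linker several definitions of that symbol, whereas separate .so files each keep their own copy. A minimal sketch of checking for such overlaps is below, assuming the exported symbol lists have already been extracted by some other tool. The module names and every symbol other than the `XXXModule_OnLoad` placeholder from the comment above are hypothetical.

```python
from collections import defaultdict


def find_symbol_collisions(exports_by_module):
    """Map each symbol exported by more than one module to those modules."""
    owners = defaultdict(list)
    for module, symbols in exports_by_module.items():
        # set() so a symbol listed twice by one module is counted once.
        for symbol in set(symbols):
            owners[symbol].append(module)
    return {sym: sorted(mods) for sym, mods in owners.items() if len(mods) > 1}


# Hypothetical export lists for two independently written modules.
EXPORTS = {
    "example_json": ["XXXModule_OnLoad", "json_parse"],
    "example_bloom": ["XXXModule_OnLoad", "bloom_add"],
}

if __name__ == "__main__":
    for symbol, modules in find_symbol_collisions(EXPORTS).items():
        print(symbol, "is defined by", ", ".join(modules))
```

Any symbol this reports would have to be renamed or hidden before the modules could share one binary, which is part of the build-process cost raised earlier in the thread.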

@murphyjacob4
Contributor

> I have a suspicion that both 1 and 3 would be okay with downloading the modules separately and having to do a little assembly on their end.

For iterative features, I think this makes sense (Option 3).

But if we ever wanted to take functionality out of the core and put it into a module - it would probably break a lot of users who now would need to do loadmodule to get that functionality (e.g. I've heard some discussion of moving it to a module instead of a core engine feature). This is where I think something like Option 1 could be appealing - we can make decisions about spinning off features into modules without necessarily changing the functionality of the build artifact we ship to customers.

If we did go with Option 1, I think we should be selective about what gets automatically bundled: it should only be widely used, production-ready modules.

@stockholmux
Contributor

I'd like to avoid bundling modules, so option 3 (or maybe even looser).

I don't like option 1. It picks winners. Say, I create a module that uppercases a string (silly example). It gets accepted in Valkey Plus. Then someone else comes along and creates a better uppercase module with a different API. Then the TSC has to make a gross decision: have two, incompatible uppercase modules (yuck), drop one for the other (breakage), or let the better module wallow in disuse because it's not in Valkey Plus (yuck).

I don't like option 2, and it leaves me unsure what Valkey even is. It has all the problems of option 1, AND why even have a core and modules then? To a user, they'll see commands and not differentiate between core commands and module commands, meaning the project will probably need to treat them the same way for versioning, maintenance, etc.

Frankly, I'd love to keep modules outside of Valkey. Maybe create a registry and an easy way to install modules from the registry.

Any way you cut it, the user should be in control, and the bundling should minimize situations where the bundle picks winners.

@daniel-house
Member

> But if we ever wanted to take functionality out of the core and put it into a module - it would probably break a lot of users who now would need to do loadmodule to get that functionality (e.g. I've heard some discussion of moving it to a module instead of a core engine feature).

This seems like an important observation. There are many references to moving core features into modules. A very significant recent suggestion is to move the consensus algorithm into a module in cluster V2. Presumably this really means creating module-API functions that allow the default implementation of a feature to be overridden by means of the new module-API functions. I'd very much like to hear other people's thoughts about what "move it into a module" means.

@hpatro
Contributor Author

hpatro commented May 2, 2024

> If we did go with Option 1, I think we should be selective about what gets automatically bundled: it should only be widely used, production-ready modules.

@murphyjacob4 that's the idea with option 1. I believe the TSC/maintainers are the best set of folks to make the decision on behalf of the users and reduce some of the administrator pain. And if in the future there is a better alternative (spec/performance/feature/memory usage), a drop-in replacement can be made without impacting the users.

@dmitrypol
Contributor

dmitrypol commented May 2, 2024

I am also for option 3. I like the idea of official modules (and user-developed modules), but I think it's better to have separate releases for Valkey core and for each module. Each module will have its own feature set, engineers, and hence timelines.

Eventually, if a module becomes super popular with everyone using it, we could move that code into the Valkey codebase, but then it's no longer a module but part of core.

@PingXie
Member

PingXie commented May 2, 2024

> I'd very much like to hear other people's thoughts about what "move it into a module" means.

I think we are a bit off topic, since the proposal here is about the "true" modules, for lack of a better word, such as JSON/Bloom filter/etc. I am partially guilty of this, given that I used "modules" a lot in the cluster v2 discussion.

@daniel-house, agreed, the "modularization" idea that we were discussing in the cluster V2 thread needs further clarification and a deeper dive. At least from my end, I have been using "modules" in a very loose way in the context of the cluster V2 discussion. Let me expand my thoughts a bit more to help distinguish the two types of "modules" so we can focus back on the "true" modules on this thread and continue the "modularization" discussion separately.

First of all, I am fully aware of the operational convenience of only having to deal with a single binary today. I'm not saying we should never break away from it, but I think there is an extremely high bar that should be met before we start introducing a collection of binaries. On the other hand, I don't like that there is no clear layering nor strict contracts between the logical components in the engine, such as clustering, persistence, and replication. The recent refactoring of cluster.c is not helping either, IMO. There needs to be a clean contract/mechanism that allows us to abstract away the low-level implementation from the rest of the engine. This mechanism could be based on the existing module APIs or a further extension of them, or it could be something totally different. The keyword IMO is "abstraction", and this is what I had in mind when I wrote "modules" or "modularization" in the cluster V2 thread. And to be clear, I am not advocating that we create separate binaries for cluster (v1 or v2). There should still be one valkey-server after the modularization work. I don't have details about what this new abstraction layer would look like right now. I will be trying some random ideas next.

9 participants