Adding partition_names parameter in search_index method of EmbeddingHandler, only for Milvus database #262

Open
virunew opened this issue Jan 4, 2024 · 2 comments

@virunew
Contributor

virunew commented Jan 4, 2024

Milvus can partition data inside a collection, and I intend to use this capability. Currently the EmbeddingHandler code does not expose an optional partition_names parameter in its search_index method. I have updated the code to provide it for the Milvus database and am raising a PR for the same. Please review, and if it fits your roadmap, you may want to add this feature.

@doberst
Contributor

doberst commented Jan 7, 2024

Thanks for this feedback - we have been researching the issue as well. Would you like to use a partition at search time only, or also to create new partitions when indexing a library? Also, are you looking to search one partition at a time, or potentially to select multiple partitions 'dynamically' in a single search?

We are in the process of enhancing the configuration objects and options around all of the data stores (including support for some new ones). One of our design principles is to keep the main classes 100% consistent across all endpoints, so that no code changes are required in the main workflow when changing the LLM, embedding model, vector DB, document sources, etc. (i.e., preserve a strong abstraction layer), and then to contain the endpoint-specific parameters/configs in the implementation of the resource object as much as possible. So far, we have primarily been using os.environ variables or global LLMWareConfigs to configure endpoint resources - we would welcome your input on this.

@virunew
Contributor Author

virunew commented Jan 8, 2024

Hi Darren,
Thanks for your response. For my use case I need the capability both to read from and write to particular partitions, and also to search across partitions, as this lets me limit the scope of a search and potentially makes it much quicker. I have extended the llmware classes to support partition-based read/write. I appreciate that you want consistency across APIs so that a user can swap in any vector DB (and other options) without changing code; this is why I wondered whether my changes fit your roadmap and larger plan.
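To make the request concrete, here is a minimal sketch of the behaviour I have in mind. It is not the llmware or pymilvus code: ToyPartitionedIndex, its insert method, and the sample_count argument are hypothetical stand-ins, and the index is a plain in-memory dict so the example runs without a Milvus server. The only point is the optional partition_names argument, which defaults to None (search everything) so existing callers are unaffected.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

# Assumption: "_default" is used here as the name of the fallback partition.
DEFAULT_PARTITION = "_default"


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ToyPartitionedIndex:
    """Hypothetical in-memory stand-in for a partitioned vector collection."""

    partitions: Dict[str, List[Tuple[str, List[float]]]] = field(default_factory=dict)

    def insert(self, doc_id: str, vector: Sequence[float],
               partition_name: str = DEFAULT_PARTITION) -> None:
        # Write path: each vector lands in exactly one named partition.
        self.partitions.setdefault(partition_name, []).append((doc_id, list(vector)))

    def search_index(self, query: Sequence[float], sample_count: int = 10,
                     partition_names: Optional[List[str]] = None) -> List[Tuple[str, float]]:
        # Read path: None means "all partitions", so old call sites keep working.
        names = list(self.partitions) if partition_names is None else partition_names
        unknown = [n for n in names if n not in self.partitions]
        if unknown:
            raise KeyError(f"unknown partitions: {unknown}")
        scored = [(doc_id, _cosine(query, vec))
                  for name in names
                  for doc_id, vec in self.partitions[name]]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:sample_count]


index = ToyPartitionedIndex()
index.insert("a", [1.0, 0.0], partition_name="contracts")
index.insert("b", [0.0, 1.0], partition_name="invoices")
index.insert("c", [1.0, 0.1], partition_name="invoices")

# Scoped search only scores the vectors stored in the "invoices" partition.
scoped = index.search_index([1.0, 0.0], partition_names=["invoices"])
# Unscoped search considers every partition.
everything = index.search_index([1.0, 0.0], sample_count=1)
```

Because the parameter is optional and backend-neutral in shape, a vector DB without partitions could simply ignore it or raise, which may help keep the main classes consistent across endpoints.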

On your other point about configs: in my view, keeping config in environment variables works well, but sometimes, especially in a larger project, it can hinder encapsulation. For example, in my earlier post I mentioned that the test suite currently clears the 'default' vector database. Under this scheme, keeping the Milvus database setting in an environment variable is a bit risky: I might run the test suite while my environment points at my custom database, and if that happens I will lose all my data. Keeping these values well encapsulated would avoid this issue. I feel that configs in environment variables are akin to global variables, with similar pitfalls and perhaps more, because environment variables may be set at different levels (system-wide, within the IDE, within a profile). Hope this makes sense.
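A small sketch of what I mean by encapsulation, under stated assumptions: VectorDBConfig, config_from_env, drop_collection, and the MILVUS_* variable names are all hypothetical and are not part of llmware. The store is a plain dict standing in for a database. The idea is that destructive operations take an explicit config object, and a config built from the environment can never be destructive by accident.

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Mapping, Optional


@dataclass(frozen=True)
class VectorDBConfig:
    """Hypothetical explicit config passed to a resource object."""

    host: str
    port: int
    collection: str
    allow_destructive: bool = False  # must be opted into, e.g. by a test fixture


def config_from_env(environ: Optional[Mapping[str, str]] = None) -> VectorDBConfig:
    # Environment values are still supported, but they are read in one place
    # and never enable destructive operations on their own.
    env = os.environ if environ is None else environ
    return VectorDBConfig(
        host=env.get("MILVUS_HOST", "localhost"),    # hypothetical variable name
        port=int(env.get("MILVUS_PORT", "19530")),   # hypothetical variable name
        collection=env.get("MILVUS_COLLECTION", "default"),
    )


def drop_collection(config: VectorDBConfig, store: Dict[str, List[int]]) -> None:
    """Delete a collection only if the config explicitly allows it."""
    if not config.allow_destructive:
        raise PermissionError(
            f"refusing to drop '{config.collection}': allow_destructive is False"
        )
    store.pop(config.collection, None)


# A dict stands in for the database in this sketch.
store = {"prod": [1, 2, 3], "test_tmp": [4]}

# A developer's environment happens to point at real data.
env_config = config_from_env({"MILVUS_COLLECTION": "prod"})
try:
    drop_collection(env_config, store)
    blocked = False
except PermissionError:
    blocked = True

# The test suite builds its own config and opts in explicitly.
test_config = VectorDBConfig("localhost", 19530, "test_tmp", allow_destructive=True)
drop_collection(test_config, store)
```

With this shape the test suite can only clear the collection it names itself, regardless of what is set system-wide, in the IDE, or in a profile.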
