Adding partition_names parameter in search_index method of EmbeddingHandler, only for Milvus database #262

virunew · 2024-01-04T07:27:36Z

Milvus can partition data inside a collection, and I intend to use this capability. Currently the EmbeddingHandler code does not expose it as an optional parameter. I have updated the code to provide this capability and am raising a PR for it. Please review, and if it fits your roadmap, you may want to add this feature.

doberst · 2024-01-07T15:11:40Z

Thanks for this feedback - we have been researching the issue as well. Would you like the capability to use a partition at search time only, or also the ability to create new partitions when indexing a library? Also, are you looking to search one partition at a time, or potentially to select multiple partitions 'dynamically' for a single search?

We are in the process of enhancing the configuration objects and options around all of the data stores (including support for some new ones). One of our design principles is to try to keep the main classes 100% consistent across all endpoints - so no code changes in the main workflow required when changing LLM, embedding model, vector DB, document sources, etc. (e.g., preserve a strong abstraction layer) - and then trying to contain the endpoint-specific parameters/configs in the implementation of the resource object as much as possible ... So far, we have been using primarily os.environ variables or global LLMWareConfigs to configure endpoint resources - would welcome your input on this ....
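To make the design principle above concrete, here is a minimal sketch of one way endpoint-specific options could be contained in a resource config while the main workflow call stays the same. Every name in it (`VectorDBConfig`, `VectorBackend`, `MilvusLikeBackend`, `FaissLikeBackend`, `run_search`) is hypothetical and is not part of llmware; the backends return plain dictionaries instead of talking to a real database.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class VectorDBConfig:
    """Hypothetical per-endpoint config object (not an llmware class)."""
    db_name: str
    options: Dict[str, Any] = field(default_factory=dict)


class VectorBackend:
    """Common interface: the main workflow only ever calls search()."""

    def __init__(self, config: VectorDBConfig) -> None:
        self.config = config

    def search(self, query: str, k: int) -> Dict[str, Any]:
        raise NotImplementedError


class MilvusLikeBackend(VectorBackend):
    def search(self, query: str, k: int) -> Dict[str, Any]:
        # The endpoint-specific option is read from the config, not the call site.
        partitions = self.config.options.get("partition_names")
        return {"db": "milvus", "query": query, "k": k,
                "partition_names": partitions}


class FaissLikeBackend(VectorBackend):
    def search(self, query: str, k: int) -> Dict[str, Any]:
        # This backend has no notion of partitions and ignores such options.
        return {"db": "faiss", "query": query, "k": k}


def run_search(backend: VectorBackend, query: str, k: int = 5) -> Dict[str, Any]:
    """Main workflow: identical regardless of which backend is plugged in."""
    return backend.search(query, k)
```

The trade-off in this sketch is that partitions chosen in the config are fixed for the lifetime of the backend object, so selecting partitions per query would need either a new config or an extra argument.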

virunew · 2024-01-08T11:53:09Z

Hi Darren,
Thanks for your response. I need the capability to both read from and write to particular partitions, and also to search across partitions. For my use case this lets me limit the scope of a search and potentially make it much quicker. I have extended the llmware classes to add read/write capability based on partitions. I appreciate that you want consistency across APIs so that users can swap any vector DB (and other options) without changing code; this is why I wondered whether my changes fit your roadmap and larger plan.
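As a rough illustration of why partition-scoped reads and writes narrow a search, the sketch below uses a toy in-memory index. `PartitionedIndex` and its methods are hypothetical stand-ins, not the code from the PR and not the Milvus or llmware API. The only idea borrowed from the discussion is an optional `partition_names` argument that defaults to searching everything.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


class PartitionedIndex:
    """Toy in-memory stand-in for a partitioned collection (hypothetical)."""

    def __init__(self) -> None:
        self._partitions: Dict[str, List[Tuple[str, Vector]]] = defaultdict(list)

    def insert(self, doc_id: str, vector: Vector,
               partition_name: str = "_default") -> None:
        # Write path: each vector lands in exactly one named partition.
        self._partitions[partition_name].append((doc_id, vector))

    def search(self, query: Vector, k: int = 3,
               partition_names: Optional[Sequence[str]] = None) -> List[str]:
        # None means "search every partition"; otherwise only the named ones.
        names = list(self._partitions) if partition_names is None else partition_names
        candidates = [item for name in names
                      for item in self._partitions.get(name, [])]

        def squared_distance(vector: Vector) -> float:
            return sum((a - b) ** 2 for a, b in zip(query, vector))

        # Brute-force ranking over the candidate set only.
        ranked = sorted(candidates, key=lambda item: squared_distance(item[1]))
        return [doc_id for doc_id, _ in ranked[:k]]


index = PartitionedIndex()
index.insert("a", (0.0, 0.0), partition_name="legal")
index.insert("b", (1.0, 1.0), partition_name="finance")
index.insert("c", (0.1, 0.0), partition_name="finance")

all_hits = index.search((0.0, 0.0), k=2)
finance_hits = index.search((0.0, 0.0), k=2, partition_names=["finance"])
```

Restricting `partition_names` shrinks the candidate set before ranking, which is where the potential speed-up comes from. Because the argument is optional, callers that never pass it see the same behaviour as before.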

On your other point about configs: in my view, keeping config in environment variables works well, but sometimes, especially in a larger project, it can hinder encapsulation. For example, in my earlier post I mentioned that the test suite currently clears the 'default' vector database. Under this scheme, keeping the Milvus database name in an environment variable is a bit risky: I might run the test suite while my environment points at my custom database, and if that happens I will lose all my data. Keeping these values well encapsulated would avoid this issue. I feel that having configs in environment variables is akin to using global variables, with similar pitfalls and perhaps more, because environment variables may be set at different levels (system-wide, within the IDE, within a profile). Hope this makes sense.
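The risk described above can be sketched in a few lines. Everything here is hypothetical: `MY_VECTOR_DB` is a made-up variable name, and `DBSettings` and `cleanup_target` are illustrative helpers, not llmware code. The point is only to contrast an implicit environment lookup with an explicitly passed settings object.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DBSettings:
    """Hypothetical explicit settings object handed to whatever needs it."""
    db_name: str


def db_name_from_env(env: Mapping[str, str] = os.environ) -> str:
    # Global-style lookup: the result depends on whatever the shell, IDE,
    # or profile happened to export. MY_VECTOR_DB is a made-up name.
    return env.get("MY_VECTOR_DB", "default")


def cleanup_target(settings: Optional[DBSettings] = None,
                   env: Mapping[str, str] = os.environ) -> str:
    """Return the name of the database a test cleanup step would wipe."""
    if settings is not None:
        return settings.db_name       # explicit: the caller decides
    return db_name_from_env(env)      # implicit: the environment decides


# A developer's shell still points at real data from earlier work.
leaky_env = {"MY_VECTOR_DB": "my_custom_database"}

implicit = cleanup_target(env=leaky_env)
explicit = cleanup_target(DBSettings("test_scratch"), env=leaky_env)
```

With the implicit lookup the cleanup step targets whatever the environment names, while the explicit settings object keeps the test suite pinned to its own scratch database regardless of what is exported.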
