[BUG] Unable to connect to an existing Azure AI Search index #1098

dionoid · 2024-05-13T12:53:11Z

Describe the bug
I'm using LangChain4j with an existing Azure AI Search index, which was created using the "Import and vectorize data" feature of Azure AI Search. When connecting this index using the AzureAiSearchContentRetriever, I found that the underlying AbstractAzureAiSearchEmbeddingStore doesn't allow me to override the default field names, metadata or index name, so I was blocked.
Also, metadata mapping in the AzureAiSearchContentRetriever seems to be limited to pure Vector queries and not implemented for FullText, Hybrid or HybridWithReranking.

Log and Stack trace
N.A.

To Reproduce
Import and vectorize documents into a new Azure AI Search index using the "Import and vectorize data" feature, or use the "Add your data" feature in the Azure OpenAI Studio playground. There is then no way to connect these indexes to the AzureAiSearchContentRetriever and use them in LangChain4j.

Expected behavior

It would be nice if I could configure the Azure AI Search index name and field names in the AzureAiSearchContentRetriever.builder:
- INDEX_NAME
- DEFAULT_FIELD_ID
- DEFAULT_FIELD_CONTENT
- DEFAULT_FIELD_CONTENT_VECTOR
- SEMANTIC_SEARCH_CONFIG_NAME
Same goes for metadata mapping.
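
One possible shape for this configuration, as a minimal sketch only. The `IndexFieldMapping` class and its builder methods are hypothetical and not part of LangChain4j; the default values are placeholders chosen for illustration, and the field names in the demo (`chunk_id`, `chunk`, `text_vector`) are assumed examples of what an existing index might use.

```java
import java.util.Objects;

/** Hypothetical value object holding index and field names; not a LangChain4j class. */
final class IndexFieldMapping {
    final String indexName;
    final String idField;
    final String contentField;
    final String contentVectorField;
    final String semanticConfigName;

    private IndexFieldMapping(Builder b) {
        this.indexName = b.indexName;
        this.idField = b.idField;
        this.contentField = b.contentField;
        this.contentVectorField = b.contentVectorField;
        this.semanticConfigName = b.semanticConfigName;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        // Placeholder defaults for illustration; each can be overridden per index.
        private String indexName = "vectorsearch";
        private String idField = "id";
        private String contentField = "content";
        private String contentVectorField = "content_vector";
        private String semanticConfigName = "semantic-search-config";

        Builder indexName(String v) { this.indexName = Objects.requireNonNull(v); return this; }
        Builder idField(String v) { this.idField = Objects.requireNonNull(v); return this; }
        Builder contentField(String v) { this.contentField = Objects.requireNonNull(v); return this; }
        Builder contentVectorField(String v) { this.contentVectorField = Objects.requireNonNull(v); return this; }
        Builder semanticConfigName(String v) { this.semanticConfigName = Objects.requireNonNull(v); return this; }

        IndexFieldMapping build() {
            return new IndexFieldMapping(this);
        }
    }
}

class FieldMappingDemo {
    public static void main(String[] args) {
        // Assumed field names of an existing index; adjust to the real schema.
        IndexFieldMapping mapping = IndexFieldMapping.builder()
                .indexName("my-existing-index")
                .idField("chunk_id")
                .contentField("chunk")
                .contentVectorField("text_vector")
                .build();
        System.out.println(mapping.indexName + " uses content field " + mapping.contentField);
    }
}
```

A retriever builder could accept such an object (or equivalent individual setters) and fall back to the defaults when nothing is overridden, which would keep existing callers unaffected.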

Please complete the following information:

LangChain4j version: 0.30.0
Java version: 17

Additional context
I would be happy to contribute to this project. Reaching out to @jdubois to learn what he thinks the best way would be to solve this issue, and maybe we can work on this together? Also I have some additional questions:

- The DEFAULT_FIELD_METADATA_SOURCE field with value "source" doesn't seem to be used anywhere. Would it be an option to move the key-value pairs (currently under metadata->attributes) to just metadata? Then "source" could be just a metadata key.
- It seems the SearchIndex parameter of AbstractAzureAiSearchEmbeddingStore's initialize method is only used for creating a new index, but never for retrieving. Is this intended?
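
To illustrate the first question, the proposed flattening could look roughly like the sketch below. The `Attribute` record and the `flatten` helper are hypothetical stand-ins and do not mirror the library's actual document model; records require Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MetadataFlattening {

    /** Hypothetical stand-in for one key-value pair stored under metadata->attributes. */
    record Attribute(String key, String value) {}

    /**
     * Copies the nested attributes into one flat map, so "source" becomes an ordinary key.
     * An attribute whose key is also "source" would overwrite the dedicated value.
     */
    static Map<String, String> flatten(String source, List<Attribute> attributes) {
        Map<String, String> flat = new LinkedHashMap<>();
        if (source != null) {
            flat.put("source", source);
        }
        for (Attribute attribute : attributes) {
            flat.put(attribute.key(), attribute.value());
        }
        return flat;
    }

    public static void main(String[] args) {
        Map<String, String> flat = flatten(
                "report.pdf",
                List.of(new Attribute("title", "Q1 report"), new Attribute("page", "3")));
        System.out.println(flat); // prints {source=report.pdf, title=Q1 report, page=3}
    }
}
```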

dosubot · 2024-05-13T12:53:51Z

The current implementation does not appear to support customizing the default field names, metadata, or index name when connecting LangChain4j to an existing Azure AI Search index. Your suggestion to extend the AzureAiSearchContentRetriever.builder so that index field names and metadata mapping can be configured is valid and would improve flexibility and usability.

For the DEFAULT_FIELD_METADATA_SOURCE field not being used, moving key-value pairs from metadata->attributes to just metadata and treating "source" as a metadata key could simplify the structure and make it more intuitive. This approach would require adjustments in the codebase to ensure compatibility and maintain the integrity of data retrieval and indexing processes.

Regarding the SearchIndex parameter in the AbstractAzureAiSearchEmbeddingStore's initialize method being used only for creating a new index and not for retrieving, this seems to be by design. However, revisiting this design could offer more flexibility in managing indexes, especially for use cases that involve connecting to and interacting with existing indexes.

Collaborating with the project maintainers, like reaching out to @jdubois as you mentioned, would be the best course of action to discuss these enhancements and potentially contribute to the project. Your willingness to contribute and your insights could lead to significant improvements in LangChain4j's Azure AI Search integration.


jdubois · 2024-05-13T14:19:36Z

Thanks @dionoid ! This seems linked to #1062 so let me finish that one first.
I didn't know about the Azure AI Search "Import and vectorize data" feature; this should be supported and documented!
There is a lot of room for improvement here, so I'm happy to work with you on this, or help you contribute!

jdubois · 2024-05-13T15:00:37Z

@langchain4j can you assign this issue to me?

dionoid added the bug Something isn't working label May 13, 2024

langchain4j added the P2 High priority label May 13, 2024

langchain4j assigned jdubois May 14, 2024
