[BUG] Unable to connect to an existing Azure AI Search index #1098

Open
dionoid opened this issue May 13, 2024 · 3 comments
Labels: bug (Something isn't working), P2 (High priority)

dionoid commented May 13, 2024

Describe the bug
I'm using LangChain4j with an existing Azure AI Search index, which was created using the "Import and vectorize data" feature of Azure AI Search. When connecting to this index using the AzureAiSearchContentRetriever, I found that the underlying AbstractAzureAiSearchEmbeddingStore doesn't allow me to override the default field names, metadata, or index name, so I was blocked.
Also, metadata mapping in the AzureAiSearchContentRetriever seems to be limited to pure Vector queries and not implemented for FullText, Hybrid or HybridWithReranking.

Log and Stack trace
N.A.

To Reproduce
Import and vectorize documents into a new Azure AI Search index using "Import and vectorize data", or use the "Add your data" feature in the playground of Azure OpenAI Studio. There is then no way to connect these indexes to the AzureAiSearchContentRetriever and use them in LangChain4j.

Expected behavior

  • It would be nice if I could configure the Azure AI Search index field names in the AzureAiSearchContentRetriever.builder:
    • INDEX_NAME
    • DEFAULT_FIELD_ID
    • DEFAULT_FIELD_CONTENT
    • DEFAULT_FIELD_CONTENT_VECTOR
    • SEMANTIC_SEARCH_CONFIG_NAME
  • The same goes for metadata mapping.
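A minimal sketch of what such configurable names could look like, written in Java since LangChain4j is a Java library. `AzureAiSearchIndexConfig` and its builder methods are hypothetical names, not the LangChain4j 0.30.0 API, and the default string values are placeholders that stand in for the constants listed above; they may not match the library's actual values.

```java
import java.util.Objects;

// Hypothetical configuration type, NOT part of LangChain4j 0.30.0: it only
// illustrates which names the issue asks to make configurable.
record AzureAiSearchIndexConfig(
        String indexName,
        String idField,
        String contentField,
        String contentVectorField,
        String semanticSearchConfigName) {

    // Compact constructor: fail fast if any name was explicitly set to null.
    AzureAiSearchIndexConfig {
        Objects.requireNonNull(indexName, "indexName");
        Objects.requireNonNull(idField, "idField");
        Objects.requireNonNull(contentField, "contentField");
        Objects.requireNonNull(contentVectorField, "contentVectorField");
        Objects.requireNonNull(semanticSearchConfigName, "semanticSearchConfigName");
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        // Placeholder defaults standing in for INDEX_NAME, DEFAULT_FIELD_ID,
        // DEFAULT_FIELD_CONTENT, DEFAULT_FIELD_CONTENT_VECTOR and
        // SEMANTIC_SEARCH_CONFIG_NAME; the real constants may differ.
        private String indexName = "vectorsearch";
        private String idField = "id";
        private String contentField = "content";
        private String contentVectorField = "content_vector";
        private String semanticSearchConfigName = "semantic-search-config";

        Builder indexName(String value) { this.indexName = value; return this; }
        Builder idField(String value) { this.idField = value; return this; }
        Builder contentField(String value) { this.contentField = value; return this; }
        Builder contentVectorField(String value) { this.contentVectorField = value; return this; }
        Builder semanticSearchConfigName(String value) { this.semanticSearchConfigName = value; return this; }

        AzureAiSearchIndexConfig build() {
            return new AzureAiSearchIndexConfig(indexName, idField, contentField,
                    contentVectorField, semanticSearchConfigName);
        }
    }
}
```

Unset names fall back to the defaults, so connecting to an imported index would only require overriding the names that differ, e.g. `AzureAiSearchIndexConfig.builder().indexName("my-index").idField("chunk_id").build()`. The field names in that call are illustrative; check the actual schema of your index.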

Please complete the following information:

  • LangChain4j version: 0.30.0
  • Java version: 17

Additional context
I would be happy to contribute to this project. I'm reaching out to @jdubois to hear what he thinks would be the best way to solve this issue; maybe we can work on this together? I also have some additional questions:

  • The DEFAULT_FIELD_METADATA_SOURCE field with value "source" doesn't seem to be used anywhere? Would it be an option to move the key-value pairs (currently under metadata->attributes) to just metadata? Then "source" could be just a metadata key.
  • It seems the SearchIndex parameter of AbstractAzureAiSearchEmbeddingStore's initialize method is only used for creating a new index, but never for retrieving. Is this intended?
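A minimal sketch of the proposed flattening, assuming the stored `metadata` field has the shape `{ "source": ..., "attributes": [ { "key": ..., "value": ... } ] }` (inferred from the description above, not taken from the LangChain4j source). `MetadataFlattener` is a hypothetical helper name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not LangChain4j code. Assumes the stored "metadata"
// field looks like: { "source": "...", "attributes": [ {"key": k, "value": v}, ... ] }.
final class MetadataFlattener {

    private MetadataFlattener() {}

    static Map<String, String> flatten(Map<String, Object> metadata) {
        Map<String, String> flat = new LinkedHashMap<>();
        if (metadata == null) {
            return flat;
        }
        // Treat "source" as an ordinary metadata key instead of a dedicated field.
        Object source = metadata.get("source");
        if (source != null) {
            flat.put("source", source.toString());
        }
        // Lift each {key, value} pair out of the nested "attributes" collection.
        Object attributes = metadata.get("attributes");
        if (attributes instanceof List<?> list) {
            for (Object item : list) {
                if (item instanceof Map<?, ?> pair && pair.get("key") != null) {
                    Object value = pair.get("value");
                    flat.put(pair.get("key").toString(), value == null ? null : value.toString());
                }
            }
        }
        return flat;
    }
}
```

Keeping the result a flat `Map<String, String>` means the same mapping could be applied to every query type, not just pure vector queries.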
dionoid added the bug (Something isn't working) label May 13, 2024

dosubot bot commented May 13, 2024

To address the issue of customizing default field names, metadata, and index name in LangChain4j when using an existing Azure AI Search index, it seems like the current implementation doesn't provide direct support for this level of customization. Your suggestion to enhance the AzureAiSearchContentRetriever.builder to allow configuration of index field names and metadata mapping is valid and would indeed improve flexibility and usability.

As for the unused DEFAULT_FIELD_METADATA_SOURCE field, moving key-value pairs from metadata->attributes to just metadata and treating "source" as a metadata key could simplify the structure and make it more intuitive. This approach would require adjustments in the codebase to ensure compatibility and maintain the integrity of data retrieval and indexing processes.

Regarding the SearchIndex parameter in the AbstractAzureAiSearchEmbeddingStore's initialize method being used only for creating a new index and not for retrieving, this seems to be by design. However, revisiting this design could offer more flexibility in managing indexes, especially for use cases that involve connecting to and interacting with existing indexes.
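One way to revisit that design, sketched in Java under stated assumptions: look the index up by name first, and only create it when it is missing and creation is allowed. `IndexAdmin`, `InMemoryIndexAdmin` and `IndexInitializer` are hypothetical names; a real implementation would go through the Azure AI Search SDK's index client rather than the in-memory stand-in used here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical abstraction over index management; NOT the Azure SDK or LangChain4j API.
interface IndexAdmin {
    boolean exists(String indexName);

    void create(String indexName);
}

// In-memory stand-in so the sketch runs without an Azure AI Search service.
final class InMemoryIndexAdmin implements IndexAdmin {
    private final Set<String> indexes = new HashSet<>();
    int createCalls = 0;

    InMemoryIndexAdmin(String... existingIndexes) {
        indexes.addAll(Arrays.asList(existingIndexes));
    }

    @Override
    public boolean exists(String indexName) {
        return indexes.contains(indexName);
    }

    @Override
    public void create(String indexName) {
        indexes.add(indexName);
        createCalls++;
    }
}

final class IndexInitializer {
    enum Outcome { CONNECTED_TO_EXISTING, CREATED }

    private IndexInitializer() {}

    // The same index name drives both lookup and creation, so an index built
    // elsewhere is reused instead of being recreated or ignored.
    static Outcome initialize(IndexAdmin admin, String indexName, boolean createIfMissing) {
        if (admin.exists(indexName)) {
            return Outcome.CONNECTED_TO_EXISTING;
        }
        if (!createIfMissing) {
            throw new IllegalStateException("Index not found: " + indexName);
        }
        admin.create(indexName);
        return Outcome.CREATED;
    }
}
```

With `createIfMissing` set to false, an index created by "Import and vectorize data" would be connected to as-is and never modified.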

Collaborating with the project maintainers, like reaching out to @jdubois as you mentioned, would be the best course of action to discuss these enhancements and potentially contribute to the project. Your willingness to contribute and your insights could lead to significant improvements in LangChain4j's Azure AI Search integration.


jdubois (Contributor) commented May 13, 2024

Thanks @dionoid ! This seems linked to #1062 so let me finish that one first.
I didn't know about the Azure AI Search "Import and vectorize data" feature; this should be supported and documented!
There is a lot of room for improvement here, so I'm happy to work with you on this, or to help you contribute!

langchain4j added the P2 (High priority) label May 13, 2024

jdubois (Contributor) commented May 13, 2024

@langchain4j can you assign this issue to me?
