Adds an embedding store for Azure Cosmos DB for NoSQL #1115

Merged

aayush3011
Contributor

aayush3011 commented May 15, 2024

Issue

This PR adds support for an Azure Cosmos DB for NoSQL embedding store.

Change

  • This PR adds an embedding store for Azure Cosmos DB for NoSQL. Unit tests and an integration test (IT) are also included.

General checklist

  • There are no breaking changes
  • I have added unit and integration tests for my change
  • I have manually run all the unit and integration tests in the module I have added/changed, and they are all green
  • I have manually run all the unit and integration tests in the core and main modules, and they are all green

Checklist for adding new model integration

  • I have added my new module in the BOM

Checklist for adding new embedding store integration

  • I have added a {NameOfIntegration}EmbeddingStoreIT that extends from either EmbeddingStoreIT or EmbeddingStoreWithFilteringIT
  • I have added my new module in the BOM

Checklist for changing existing embedding store integration

  • I have manually verified that the {NameOfIntegration}EmbeddingStore works correctly with the data persisted using the latest released version of LangChain4j

@aayush3011
Contributor Author

@langchain4j, can you please review this PR as a priority? We need the changes to go in by this Friday. The azure-cosmos dependency will be released tomorrow morning, and I'll update the POM accordingly.

I am working on adding the IT test.

@aayush3011
Contributor Author

@langchain4j, any update?

Owner

langchain4j left a comment

Hi.
A few things:

  1. Please fill out the PR template; it exists for a reason.
  2. Please explain why this PR should be prioritized over 70+ others.

aayush3011 changed the title from "Adds a vector store for Azure Cosmos DB for NoSQL" to "Adds an embedding store for Azure Cosmos DB for NoSQL" on May 17, 2024
@aayush3011
Contributor Author

aayush3011 commented May 17, 2024

@langchain4j I have updated the PR template. This PR is required for a demo and lab at the Microsoft Build conference happening next week.

The Azure Cosmos DB Java SDK will be released this afternoon. I wanted to get eyes on the PR and have it ready, so that it can be merged as soon as the Java SDK is publicly released.

For now, I have tested this PR locally.

@aayush3011
Contributor Author

@langchain4j, any update on this?

```java
// Excerpt of the builder call under review
.cosmosClient(client)
.databaseName(DATABASE_NAME)
.containerName(CONTAINER_NAME)
.cosmosVectorEmbeddingPolicy(populateVectorEmbeddingPolicy(dimensions))
```
Owner

Please double-check which properties are needed only for creating the database/container if it does not exist, and which are needed for querying. I suspect some of these properties are not required once the database/container has been created.

Contributor Author

@langchain4j All of these properties are required to create a database/container that can perform vector search. Only the embedding path is required for querying.

I'm not sure what else you want me to check here.
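To illustrate why only the embedding path matters at query time, here is a minimal sketch assuming the store issues a standard Cosmos DB for NoSQL vector query built on the `VectorDistance` system function. The class `VectorQuerySketch`, its methods, and the `@embedding` parameter name are hypothetical and not taken from this PR.

```java
/**
 * Hypothetical sketch (not from the PR): builds a top-k vector search query for
 * Azure Cosmos DB for NoSQL. Only the embedding path is used; container-creation
 * settings such as dimensions or the distance function do not appear here.
 */
final class VectorQuerySketch {

    private VectorQuerySketch() {
    }

    /**
     * Converts a container path such as "/embedding" or "/a/b" into a property
     * reference such as "c.embedding" or "c.a.b". Assumes simple identifier segments.
     */
    static String toPropertyRef(String embeddingPath) {
        if (embeddingPath == null) {
            throw new IllegalArgumentException("embeddingPath must not be null");
        }
        String trimmed = embeddingPath.startsWith("/") ? embeddingPath.substring(1) : embeddingPath;
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("embeddingPath must not be empty");
        }
        return "c." + trimmed.replace('/', '.');
    }

    /**
     * Builds a query returning the maxResults items ordered by VectorDistance.
     * The query vector itself is expected to be bound as the @embedding parameter.
     */
    static String buildTopKQuery(String embeddingPath, int maxResults) {
        if (maxResults <= 0) {
            throw new IllegalArgumentException("maxResults must be positive");
        }
        String ref = toPropertyRef(embeddingPath);
        String distance = "VectorDistance(" + ref + ", @embedding)";
        return "SELECT TOP " + maxResults + " c.id, " + distance + " AS score"
                + " FROM c ORDER BY " + distance;
    }
}
```

With the azure-cosmos SDK, the resulting string could be wrapped in a `SqlQuerySpec` together with a `SqlParameter("@embedding", vector)` and passed to `CosmosContainer.queryItems`; the vector embedding policy and indexing settings are not referenced in the query text.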

Owner

Can you implement it so that the user does not have to provide all properties (see all the null/empty assertions in the constructor) if the database and container are already in place?
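One way the reviewer's suggestion could look, as a minimal sketch: settings every operation needs are validated eagerly, while settings used only to create a vector-enabled container are checked only on the creation path. `StoreConfigSketch`, its fields, and `validateCreationSettings()` are hypothetical names, not the API merged in this PR.

```java
/**
 * Hypothetical sketch (not from the PR): always-required settings are validated in
 * the constructor; creation-only settings are validated only when the container
 * must be created.
 */
final class StoreConfigSketch {

    // Needed for every operation, including queries.
    final String databaseName;
    final String containerName;
    final String embeddingPath;

    // Needed only to create the container; may be null when it already exists.
    final Integer dimensions;
    final String distanceFunction;

    StoreConfigSketch(String databaseName, String containerName, String embeddingPath,
                      Integer dimensions, String distanceFunction) {
        this.databaseName = requireNonBlank(databaseName, "databaseName");
        this.containerName = requireNonBlank(containerName, "containerName");
        this.embeddingPath = requireNonBlank(embeddingPath, "embeddingPath");
        this.dimensions = dimensions;
        this.distanceFunction = distanceFunction;
    }

    /** Call only on the code path that creates a missing container. */
    void validateCreationSettings() {
        if (dimensions == null || dimensions <= 0) {
            throw new IllegalStateException("dimensions must be set and positive to create the container");
        }
        if (distanceFunction == null || distanceFunction.trim().isEmpty()) {
            throw new IllegalStateException("distanceFunction must be set to create the container");
        }
    }

    private static String requireNonBlank(String value, String name) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException(name + " must not be null or blank");
        }
        return value;
    }
}
```

In the store, the constructor could first check whether the container exists (for example by calling `read()` on the container and handling a 404 `CosmosException`) and call `validateCreationSettings()` only when it has to create one.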

@langchain4j
Owner

@aayush3011 should I merge and release it as-is now or do you plan to address the comments?

Owner

langchain4j left a comment

@aayush3011 I will merge it now so as not to block the release; feel free to address the comments in a separate PR. Thank you!

langchain4j merged commit 9e382d4 into langchain4j:main May 23, 2024
6 checks passed