As you're aware, Bedrock Claude 3 is designed to support multi-modal capabilities, including Vision mode. However, during testing of the latest version, it appears that the system does not currently support accessing the knowledge database when operating in Vision mode (see attached image).
Many use cases involve customers uploading images and seeking solutions, with the expectation that the system can retrieve relevant documents from the internal knowledge base and provide appropriate responses based on the visual input and accompanying query.
Suggested Next Steps:

1. Implement a mechanism to generate semantic queries or searches based on the uploaded image and the user's question (a rough sketch follows this list).
2. Develop a multi-modal response generation capability to provide solutions that integrate information from both the knowledge base and the visual input.
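As a rough illustration of step 1, here is a minimal sketch that asks Claude 3 on Bedrock to condense the image and question into a knowledge-base search query. The model ID, prompt wording, and function name are assumptions for illustration, not this project's actual code:

```python
import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def generate_search_query(image_bytes: bytes, question: str) -> str:
    """Ask Claude 3 Vision to condense an image + question into a short
    knowledge-base search query."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/png",  # adjust to the uploaded file type
                    "data": base64.b64encode(image_bytes).decode(),
                }},
                {"type": "text", "text": (
                    "Write a short search query for an internal knowledge base "
                    "that would find documents relevant to this image and "
                    f"question: {question}"
                )},
            ],
        }],
    }
    resp = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model ID
        body=json.dumps(body),
    )
    return json.loads(resp["body"].read())["content"][0]["text"]
```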
By addressing these points, we can enhance the functionality of Bedrock Claude 3 in Vision mode, enabling it to leverage the knowledge database effectively when processing visual inputs and queries from customers.
Thank you for spotlighting the absence of knowledge database support in Bedrock Claude 3's Vision mode. I think many will want this!
The LangChain blog post suggests three approaches for implementing multi-modal RAG:
1. Using multi-modal embeddings (e.g., the Amazon Titan Multimodal Embeddings model) to generate joint image-and-text embeddings and search over those embeddings
2. Using a multi-modal LLM (Claude 3 Vision) to generate text summaries of images, then searching those summaries along with text/table content
3. Combining (2) with raw image retrieval, allowing a multi-modal LLM to incorporate images directly in responses
Which approach best fits your Vision mode use case? Would you prefer option 1, 2, 3, or another variant? (A rough sketch of option 1 follows.)
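For reference, a minimal sketch of option 1, assuming the Amazon Titan Multimodal Embeddings model; the vector-store search call at the end is hypothetical, standing in for whatever store (OpenSearch, pgvector, etc.) holds the document embeddings:

```python
import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def embed_image_and_text(image_bytes: bytes, text: str) -> list[float]:
    """Generate one joint embedding for an image + text pair with Titan."""
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-image-v1",
        body=json.dumps({
            "inputText": text,
            "inputImage": base64.b64encode(image_bytes).decode(),
        }),
    )
    return json.loads(resp["body"].read())["embedding"]

# query_vec = embed_image_and_text(img_bytes, "How do I fix this error?")
# hits = vector_store.search(query_vec, k=5)  # hypothetical store API
```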
I'm handling the case of an IT support system where users upload an image and ask how to fix the issue. Solution 3 is the best choice at this time, given the complexity and cost of multi-modal embeddings.
For retrieval, though, I will generate a semantic query with an LLM from the chat history, the user's question, and the image. A sketch of that pipeline is below.
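Here is a minimal sketch of that pipeline, reusing the query-generation idea above and assuming a Bedrock Knowledge Base (the knowledge base ID and model ID are placeholders, and chat-history handling is omitted for brevity). Per option 3, the retrieved text and the raw image are passed together to Claude 3 for the final answer:

```python
import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime")
kb = boto3.client("bedrock-agent-runtime")

def retrieve_docs(query: str, kb_id: str = "KB_ID_PLACEHOLDER") -> list[str]:
    """Fetch relevant chunks from a Bedrock Knowledge Base."""
    resp = kb.retrieve(knowledgeBaseId=kb_id, retrievalQuery={"text": query})
    return [r["content"]["text"] for r in resp["retrievalResults"]]

def answer_with_image(image_bytes: bytes, question: str, docs: list[str]) -> str:
    """Option 3: answer from retrieved text context plus the raw image."""
    context = "\n\n".join(docs)
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/png",  # adjust to the uploaded file type
                    "data": base64.b64encode(image_bytes).decode(),
                }},
                {"type": "text", "text": (
                    f"Knowledge base context:\n{context}\n\n"
                    f"Using the context and the screenshot, answer: {question}"
                )},
            ],
        }],
    }
    resp = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model ID
        body=json.dumps(body),
    )
    return json.loads(resp["body"].read())["content"][0]["text"]
```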