GitHub - aws-samples/semantic-search-with-amazon-opensearch

Improve search relevance with machine learning in Amazon OpenSearch Service

This repository guides users through building semantic search with Amazon SageMaker and Amazon OpenSearch Service.

How does it work?

This code repository accompanies the Semantic and Vector Search with Amazon OpenSearch Service Workshop. For more information about semantic search, please refer to the workshop content.

Semantic Search Architecture

Retrieval Augmented Generation Architecture

Conversational Search Architecture

CloudFormation Deployment

The workshop can only be deployed in the us-east-1 Region.
Use the CloudFormation template cfn/semantic-search.yaml to create the CloudFormation stack.
The CloudFormation stack name must be semantic-search, because the labs refer to the stack by this name.
You can click the following link to deploy the CloudFormation stack.

Region	Launch Template
US East (N. Virginia)
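
As an alternative to the launch link, the constraints above can be sketched as a small Python helper that assembles an AWS CLI `create-stack` command. This is only an illustration: the helper name `build_create_stack_command` is hypothetical, the `--capabilities` values are an assumption (templates that create IAM resources typically need them), and the template may require parameters that are not shown here.

```python
import shlex

# Values taken from the deployment notes above.
REQUIRED_REGION = "us-east-1"
REQUIRED_STACK_NAME = "semantic-search"
TEMPLATE_PATH = "cfn/semantic-search.yaml"


def build_create_stack_command(region=REQUIRED_REGION,
                               stack_name=REQUIRED_STACK_NAME,
                               template_path=TEMPLATE_PATH):
    """Return the AWS CLI arguments for creating the workshop stack.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if region != REQUIRED_REGION:
        raise ValueError(f"The workshop can only be deployed in {REQUIRED_REGION}")
    if stack_name != REQUIRED_STACK_NAME:
        raise ValueError(f"The stack name must be {REQUIRED_STACK_NAME}")
    return [
        "aws", "cloudformation", "create-stack",
        "--region", region,
        "--stack-name", stack_name,
        "--template-body", f"file://{template_path}",
        # Assumption: the template creates IAM resources and needs these.
        "--capabilities", "CAPABILITY_IAM", "CAPABILITY_NAMED_IAM",
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it by hand.
    print(shlex.join(build_create_stack_command()))
```

Printing the command instead of executing it keeps the sketch free of side effects; review the template and its parameters before running anything against your account.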

Lab Instruction

There are 8 modules in this workshop:

Module 1 - Search basics: You will learn the fundamentals of text search and semantic search. This module also introduces the differences between the Best Matching 25 ranking algorithm, popularly known as BM25 similarity, and semantic similarity.
Module 2 - Text search: You will learn text search with Amazon OpenSearch Service. In information retrieval, this type of search is traditionally called 'keyword' search.
Module 3 - Semantic search: You will learn semantic search with Amazon OpenSearch Service and Amazon SageMaker. You will use a machine learning technique called Bidirectional Encoder Representations from Transformers, popularly known as BERT. BERT is a pre-trained natural language processing (NLP) model that represents text as numbers, in other words, as vectors. You will learn to use vectors with the k-NN feature in Amazon OpenSearch Service.
Module 4 - Fullstack semantic search: You will bring together all the concepts learned earlier with a user interface that shows the advantages of using semantic search alongside text search. You will use Amazon OpenSearch Service, Amazon SageMaker, AWS Lambda, Amazon API Gateway and Amazon S3 for this purpose.
Module 5 - Fine tuning semantic search: Large language models like BERT show better results when they are trained in-domain, which means fine-tuning the general model to fit the particular business requirements of its application domain. You will learn how to fine-tune the model for semantic search with the chosen data set.
Module 6 - Neural Search: Implement semantic search with the OpenSearch Neural Search plugin.
Module 7 - Retrieval Augmented Generation: Use semantic search results as context, and combine the user input and the context into a prompt for large language models to generate factual content for knowledge-intensive applications.
Module 8 - Conversational Search: Search with conversation history as context while leveraging RAG.
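
To make the contrast from Modules 1 to 3 concrete, here is a minimal sketch in plain Python. It is not the workshop's notebook code. It scores three toy documents with a common BM25 formulation and with cosine similarity over vectors. The documents, the query, and the three-dimensional vectors are made up for illustration; real embeddings would come from a model such as BERT and have hundreds of dimensions.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real analyzers also stem, drop stop words, etc.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with a common BM25 formulation."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Number of documents that contain each term.
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # a term that does not appear contributes nothing
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


docs = [
    "waterproof hiking boots",
    "rain resistant trekking shoes",   # same meaning, no shared words
    "wireless phone charger",
]
query = "waterproof boots"

# Made-up vectors standing in for model embeddings (illustration only).
doc_vectors = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.9]]
query_vector = [0.85, 0.15, 0.05]

keyword_scores = bm25_scores(query, docs)
semantic_scores = [cosine_similarity(query_vector, v) for v in doc_vectors]

for doc, kw, sem in zip(docs, keyword_scores, semantic_scores):
    print(f"{doc!r}: bm25={kw:.3f} cosine={sem:.3f}")
```

With these toy inputs, BM25 gives the second document a score of zero because it shares no terms with the query, while the cosine similarity ranks it close to the first document. That gap is what semantic search is meant to close.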

Please refer to the Semantic Search Workshop for lab instructions.
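
The prompt construction described in Modules 7 and 8 can be sketched as follows. This is an illustrative outline and not the workshop's implementation: the function name `build_rag_prompt`, the prompt wording, and the sample passages are all hypothetical, and the retrieval step and the call to a large language model are left out.

```python
def build_rag_prompt(question, retrieved_passages, history=None, max_passages=3):
    """Combine search results, optional chat history, and the question into one prompt.

    Hypothetical helper: `retrieved_passages` stands in for semantic search
    results, and `history` is a list of (user_question, assistant_answer) pairs.
    """
    parts = []
    if history:
        # Module 8 idea: earlier turns let the model resolve follow-up questions.
        turns = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
        parts.append("Conversation so far:\n" + turns)
    # Module 7 idea: the top search results become the context for generation.
    context = "\n".join(
        f"[{i}] {passage}"
        for i, passage in enumerate(retrieved_passages[:max_passages], start=1)
    )
    parts.append("Context:\n" + context)
    parts.append("Question: " + question)
    parts.append("Answer using only the context above. "
                 "If the context is insufficient, say so.")
    return "\n\n".join(parts)


if __name__ == "__main__":
    passages = [
        "These boots use a sealed membrane and are rated waterproof.",
        "The outsole is made of rubber for grip on wet rock.",
    ]
    history = [("Do you sell hiking boots?", "Yes, several models are available.")]
    print(build_rag_prompt("Are they waterproof?", passages, history))
```

Limiting the number of passages keeps the prompt within the model's context window, and numbering them makes it easier to trace an answer back to a search result.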

Note

In this workshop, we use the OpenSearch internal user database to store the username and password to simplify the lab. However, in a production environment, you should design your security solution according to your requirements. For more information, please refer to Fine-grained access control and Identity and Access Management.

Feedback

If you have any questions or feedback, please reach us by sending an email to semantic-search@amazon.com.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

