retkowsky/azure_visual_search_toolkit


Visual Search toolkit with Azure Cognitive Search, Sentence Transformers, Azure Computer Vision, and barcode/QR code detection


Description

The goal of this Azure AI asset is to enable search over text and images using Azure Cognitive Search. The technique was inspired by a research article which shows how to convert vectors (embeddings) to text, allowing the Cognitive Search service to leverage its inverted index to quickly find the most relevant items. For this reason, any model that converts an object to a vector can be used, as long as the resulting vector has fewer than 3,000 dimensions. This also allows users to leverage existing pretrained or fine-tuned models.

This technique has proven to be highly effective and easy to implement. We use Sentence Transformers, which wraps the OpenAI CLIP model. We first embed our entire catalog of images. Each object embedding is then converted into a set of fake terms, and the results are stored in an Azure Cognitive Search index that handles all search requests. For example, an embedding such as [-0.21, .123, ..., .876] might be converted to a set of fake terms like “A1 B3 … FED0”, and this is what is sent as the search query to Azure Cognitive Search.
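As an illustration, here is a minimal sketch of the conversion, assuming a simple per-dimension bucketing; the function name and term format are hypothetical, since the toolkit derives its fake words from learned cluster centers (see notebook 2):

```python
import numpy as np

def vector_to_fake_terms(vec, n_buckets=10):
    # Quantize each embedding dimension into one of n_buckets intervals
    # over [-1, 1] and emit one "fake word" per dimension.
    edges = np.linspace(-1.0, 1.0, n_buckets - 1)
    buckets = np.digitize(vec, edges)
    return " ".join(f"D{i}_{b}" for i, b in enumerate(buckets))

embedding = np.array([-0.21, 0.123, 0.876])
print(vector_to_fake_terms(embedding))  # -> "D0_4 D1_5 D2_8"
```

Because the fake terms are ordinary tokens, the inverted index can match them exactly like words in a text document.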

We can enrich the Azure Cognitive Search index with text extracted from the images using the Azure Read API. We can also detect and extract any information from barcodes and/or QR codes that may appear in the product catalog images. In addition, Azure Computer Vision can detect the dominant colors of each image, the tags that describe it, and a caption. All this information is ingested into the Azure Cognitive Search index.
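As a sketch of the OCR step, the Azure Read API can be called through the azure-cognitiveservices-vision-computervision SDK roughly as follows; the endpoint and key placeholders are assumptions:

```python
import time
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials

client = ComputerVisionClient(
    "https://<your-resource>.cognitiveservices.azure.com/",  # placeholder endpoint
    CognitiveServicesCredentials("<your-key>"),              # placeholder key
)

def read_text(image_url):
    # Start the asynchronous Read operation, then poll until it finishes.
    operation = client.read(image_url, raw=True)
    operation_id = operation.headers["Operation-Location"].split("/")[-1]
    while True:
        result = client.get_read_result(operation_id)
        if result.status not in ("notStarted", "running"):
            break
        time.sleep(1)
    lines = []
    if result.status == OperationStatusCodes.succeeded:
        for page in result.analyze_result.read_results:
            lines.extend(line.text for line in page.lines)
    return " ".join(lines)
```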

The goal of this asset is to use the inverted index within Azure Cognitive Search to quickly find vectors stored in the search index that are similar to a vector provided as part of a search query, and/or to search on any AI-extracted information (text, dominant colors, …). Unlike techniques such as cosine similarity, which are slow when processing large numbers of items, this approach leverages an inverted index, which allows much more data to be indexed and searched.

Toolkit Document

Process

  • We have a collection of catalog images (466 images).
  • We embed each of these images using Sentence Transformers, which can map images and texts to the same vector space. As the model, we use the OpenAI CLIP model, which was trained on a large set of images and image alt texts (see the embedding sketch after this list).
  • We can retrieve any text from these images using the Azure Read API (if text is present).
  • We can retrieve any text information from barcodes or QR codes (if any).
  • All this information is ingested into an Azure Cognitive Search index.
  • Then, given a field image, you can embed it, extract any text/barcode information, and query the Azure Cognitive Search index to retrieve similar images using vecText similarity and/or a text query built from the extracted text.
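
A minimal sketch of the embedding step with Sentence Transformers; the image path is a hypothetical example:

```python
from PIL import Image
from sentence_transformers import SentenceTransformer

# The OpenAI CLIP model exposed through Sentence Transformers maps
# images and texts into the same 512-dimensional vector space.
model = SentenceTransformer("clip-ViT-B-32")

image_embedding = model.encode(Image.open("images/catalog/example.jpg"))  # hypothetical path
text_embedding = model.encode("a red sneaker")
# Both vectors live in the same space and can be compared directly.
```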

Field images are available in the field images directory (53 images).

Azure products documentation

Research article

https://www.researchgate.net/publication/305910626_Large_Scale_Indexing_and_Searching_Deep_Convolutional_Neural_Network_Features

Directories

  • images: Contains two directories (catalog images, field images)
  • model: Directory to save the model's clusters
  • results: Directory to save some results
  • test: Directory that contains some testing images

Python notebooks

0. Settings.ipynb

Notebook that contains the links to the images and imports the required Python libraries

1. Catalog images exploration.ipynb

This notebook will display some catalog and field images

2. OpenAI Clip and VecText Clusters.ipynb

This notebook will explain what Sentence Transformers is and generate the clusters. It analyzes a set of existing images to determine the "cluster centers" that decide which "fake words" are generated for a vector. It then takes a test set of files (testSamplesToTest) and determines the optimal way to cluster vectors into fake words to be indexed into Azure Cognitive Search.
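
A minimal sketch of what such clustering could look like with scikit-learn KMeans; the embeddings file, cluster count, and term format are assumptions, not the notebook's exact settings:

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumed file of previously computed catalog embeddings.
sample_embeddings = np.load("results/catalog_embeddings.npy")

# Learn cluster centers from the sample of catalog embeddings.
kmeans = KMeans(n_clusters=64, random_state=0).fit(sample_embeddings)

def fake_words(vec, top=10):
    # Keep the ids of the cluster centers closest to the vector;
    # these ids become the searchable "fake words".
    dists = np.linalg.norm(kmeans.cluster_centers_ - vec, axis=1)
    return " ".join(f"C{c}" for c in np.argsort(dists)[:top])
```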

3. VecText generation.ipynb

This notebook will generate the vecText embeddings for all the catalog images

4. BarCode Information extraction.ipynb

This notebook will detect any barcodes or QR codes in the catalog images and extract their information
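
A minimal sketch of barcode/QR extraction; pyzbar is one common choice of library and may not be the one the notebook uses:

```python
from PIL import Image
from pyzbar.pyzbar import decode

def extract_codes(image_path):
    # Return the decoded payload of every barcode or QR code found.
    return [obj.data.decode("utf-8") for obj in decode(Image.open(image_path))]

print(extract_codes("images/catalog/example.jpg"))  # e.g. ['0123456789012']
```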

5. Azure CV for OCR, tags, colors and captions.ipynb

This notebook will use Azure Computer Vision for OCR, colors, tags, and caption extraction for each of the catalog images.
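
A minimal sketch of the tag, color, and caption extraction with the same Computer Vision SDK; endpoint, key, and image URL are placeholders:

```python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials

client = ComputerVisionClient(
    "https://<your-resource>.cognitiveservices.azure.com/",  # placeholder endpoint
    CognitiveServicesCredentials("<your-key>"),              # placeholder key
)

analysis = client.analyze_image(
    "https://example.com/catalog/item.jpg",  # placeholder image URL
    visual_features=[VisualFeatureTypes.tags,
                     VisualFeatureTypes.color,
                     VisualFeatureTypes.description],
)

tags = [t.name for t in analysis.tags]
dominant_colors = analysis.color.dominant_colors
caption = analysis.description.captions[0].text if analysis.description.captions else ""
```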

6. Azure Cognitive Search Index Generation.ipynb

This notebook will show how to ingest all the information into an Azure Cognitive Search index.
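
A minimal sketch of document ingestion with the azure-search-documents SDK; the index name and field names are assumptions chosen for illustration:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",  # placeholder
    index_name="catalog-visual-search",                           # assumed index name
    credential=AzureKeyCredential("<your-admin-key>"),
)

# One document per catalog image, combining the fake words with all
# the AI-extracted enrichments (field names are illustrative).
documents = [{
    "id": "1",
    "filename": "example.jpg",
    "vectext": "C12 C5 C48",
    "ocr_text": "organic green tea",
    "barcode": "0123456789012",
    "tags": ["box", "tea", "packaging"],
    "dominant_colors": ["Green", "White"],
    "caption": "a box of tea on a table",
}]

result = search_client.upload_documents(documents=documents)
```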

7. Calling Azure Cognitive Search.ipynb

We can now test the index with image-similarity visual search and/or free-text queries using Azure Cognitive Search.
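
A minimal sketch of querying the index over the fake-word field; the index name, field name, and sample fake words are assumptions:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",  # placeholder
    index_name="catalog-visual-search",                           # assumed index name
    credential=AzureKeyCredential("<your-query-key>"),
)

# Visual similarity: the field image is embedded, converted to fake
# words, and those words are matched against the indexed catalog.
results = search_client.search(
    search_text="C12 C5 C48",    # fake words from the field image
    search_fields=["vectext"],   # assumed field name
    top=5,
)

for doc in results:
    print(doc["filename"], doc["@search.score"])
```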

Python files

  • azureCognitiveSearch.py: This Python file contains functions to manage and use Azure Cognitive Search

  • myfunctions.py: This Python file contains generic functions used in all the notebooks

  • vec2Text.py: This Python file contains functions for the Sentence Transformers model



25-Oct-2022 Serge Retkowsky | serge.retkowsky@microsoft.com | https://www.linkedin.com/in/serger/