11000-Image-Video-caption-data-of-human-action
Updated Apr 18, 2024
Character Recognition system using CNN and Streamlit
Contrastive Learning Representations for Images and Text Pairs. Colab implementation of ConVIRT for transfer learning with insufficient data volume.
20011--Image-Caption-Data-Of-OCR-In-Natural-Scenes
Text-Image-Text is a bidirectional system that enables seamless retrieval of images based on text descriptions, and vice versa. It leverages state-of-the-art language and vision models to bridge the gap between textual and visual representations.
Windows version of text_extraction (VS2013). This code implements the method proposed in the paper “Multi-script text extraction from natural scenes” (Gomez & Karatzas), to appear at the ICDAR 2013 conference.
MTA: A Lightweight Multilingual Text Alignment Model for Cross-language Visual Word Sense Disambiguation
Download the Flickr8k and Flickr30k image caption datasets
10000-Image-caption-data-of-gestures
10000-Image-caption-data-of-vehicles
lmmtoolkit is a toolkit for Multi-Modal Learning
The official code for the paper "Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking", ACM Multimedia 2019 Oral
Raster graphics package for Fōrmulæ, in JavaScript
10100-Image-caption-data-of-human-face
Scan text from an image and convert it into speech/audio in the desired language.
Image Captioning With MobileNet-LLaMA 3
Quality-Aware Image-Text Alignment for Real-World Image Quality Assessment
PolCLIP: A Unified Image-Text Word Sense Disambiguation Model via Generating Multimodal Complementary Representations