LAVIS - A One-stop Library for Language-Vision Intelligence
Jupyter Notebook - Updated May 19, 2024
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
TensorFlow Implementation of "Show, Attend and Tell"
Simple Swift class providing all the configurations you need to create a custom camera view in your app
Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and other methods.
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Oscar and VinVL
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
This repository explores a variety of techniques and algorithms commonly used in deep learning, with implementations in MATLAB and Python
Complete Assignments for CS231n: Convolutional Neural Networks for Visual Recognition
Meshed-Memory Transformer for Image Captioning. CVPR 2020
ML data annotation made super easy for teams. Just upload data, add your team, and build training/evaluation datasets in hours.
Image Captioning using InceptionV3 and beam search
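Several of the captioning projects listed here decode with beam search. As a minimal illustration of the idea, here is a hedged sketch of beam-search decoding over a hypothetical hand-coded next-token distribution; a real captioner (e.g. one conditioned on InceptionV3 image features) would compute these log-probabilities with a trained decoder instead:

```python
import math

# Hypothetical next-token log-probabilities; stands in for a real decoder.
LOG_PROBS = {
    "<s>": {"a": math.log(0.6), "the": math.log(0.4)},
    "a": {"dog": math.log(0.7), "cat": math.log(0.3)},
    "the": {"dog": math.log(0.5), "cat": math.log(0.5)},
    "dog": {"</s>": math.log(1.0)},
    "cat": {"</s>": math.log(1.0)},
}

def beam_search(start="<s>", end="</s>", beam_width=2, max_len=5):
    # Each beam is (token sequence, cumulative log-probability).
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            last = tokens[-1]
            if last == end:  # finished caption: carry it forward unchanged
                candidates.append((tokens, score))
                continue
            for tok, lp in LOG_PROBS.get(last, {}).items():
                candidates.append((tokens + [tok], score + lp))
        # Keep only the top-k partial captions by cumulative score.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(t[-1] == end for t, _ in beams):
            break
    return beams

best_tokens, best_score = beam_search()[0]
print(" ".join(best_tokens[1:-1]))  # prints "a dog"
```

Greedy decoding is the special case `beam_width=1`; wider beams trade compute for a better chance of finding a higher-probability caption.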
A modular library built on top of Keras and TensorFlow to generate a caption in natural language for any input image.
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)
An open-source tool for sequence learning in NLP built on TensorFlow.
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. Demos: https://huggingface.co/spaces/TencentARC/Caption-Anything and https://huggingface.co/spaces/VIPLab/Caption-Anything