🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP (Python, updated Jan 23, 2024)
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built toward GPT-4V-level capabilities and beyond.
✨✨ Latest papers and datasets on multimodal large language models, and their evaluation.
Simple command-line tool for text-to-image generation using OpenAI's CLIP and SIREN (an implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
Algorithms and Publications on 3D Object Tracking
Orchestrate swarms of agents from any framework, such as OpenAI, LangChain, and others, for business operation automation. Join our community: https://discord.gg/DbjBMJTSWD
This repo contains the official code of our work SAM-SLR which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.
An open-source implementation of Gemini, the Google model claimed to "eclipse ChatGPT".
An official PyTorch implementation of the CRIS paper
Collaborative Diffusion (CVPR 2023)
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Effortless plug-and-play optimizer that cuts model training costs by 50%. A new optimizer that is 2x faster than Adam on LLMs.
(NeurIPS 2022 CellSeg Challenge - 1st Winner) Open source code for "MEDIAR: Harmony of Data-Centric and Model-Centric for Multi-Modality Microscopy"
[ICCV2019] Robust Multi-Modality Multi-Object Tracking
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
An open-source, cloud-native serving framework for large multi-modal models (LMMs).
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
Official code for WACV 2021 paper - Compositional Learning of Image-Text Query for Image Retrieval