[ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
Official implementation of our IEEE Access paper (2024), ZEN-IQA: Zero-Shot Explainable and No-Reference Image Quality Assessment with Vision Language Model
A library for marking web pages for Set-of-Mark (SoM) prompting with vision-language models.
Grounded Multimodal Large Language Model with Localized Visual Tokenization
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source model approaching GPT-4V performance.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
This is the official repository for the Vista dataset, a Vietnamese multimodal dataset containing more than 700,000 samples of conversations and images.
Towards a text-based quantitative and explainable histopathology image analysis (MICCAI 2024)
FreeVA: Offline MLLM as Training-Free Video Assistant
[ICASSP 2024] VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders
Overview of Japanese LLMs
Multi-Aspect Vision Language Pretraining - CVPR 2024
[ICPR 2024] The official repo for FIDAVL: Fake Image Detection and Attribution using Vision-Language Model
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
Embodied Understanding of Driving Scenarios
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
Code for RoboFlamingo
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
AAPL: Adding Attributes to Prompt Learning for Vision-Language Models (CVPRw 2024)
Composition of Multimodal Language Models From Scratch