Multi-modal Large Language Model Collection 🦕

This is a curated list of Multi-modal Large Language Models (MLLM), Multimodal Benchmarks (MMB), Multimodal Instruction Tuning (MMIT), Multimodal In-context Learning (MMIL), Foundation Models (FM, e.g., the CLIP family), and popular Parameter-Efficient Tuning methods.

📒 Table of Contents

Multi-modal Large Language Models (MLLM)

  • Video-LLaVA: Learning United Visual Representation by Alignment Before Projection [Arxiv 2024/02/12] [Paper] [Code]
    Peking University, Peng Cheng Laboratory, Sun Yat-sen University, Guangzhou, Tencent Data Platform, AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, FarReel Ai Lab

  • Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models [Arxiv 2024/02/12] [Paper] [Code] [Evaluation]
    Stanford, Toyota Research Institute

  • Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models [Arxiv 2024/03/27] [Paper] [Code] [Project Page]
    The Chinese University of Hong Kong, SmartMore

  • InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks [Arxiv 2024/01/15] [Paper] [Code]
    OpenGVLab, Shanghai AI Laboratory, Nanjing University, The University of Hong Kong, The Chinese University of Hong Kong, Tsinghua University, University of Science and Technology of China, SenseTime Research

  • GiT: Towards Generalist Vision Transformer through Universal Language Interface [Arxiv 2024/03/14] [Paper]
    Peking University, Max Planck Institute for Informatics, The Chinese University of Hong Kong Shenzhen, ETH Zurich, The Chinese University of Hong Kong

  • LLaMA: Open and Efficient Foundation Language Models [Arxiv 2023] [Paper] [Github Repo]
    Meta AI

Foundation Models (FM)

Parameter-Efficient Tuning Repo (PETR)

  • PEFT: Parameter-Efficient Fine-Tuning [HuggingFace 🤗] [Home Page] [Code]
    PEFT (Parameter-Efficient Fine-Tuning) is a library for efficiently adapting pre-trained language models (PLMs) to various downstream applications without fine-tuning all of the model's parameters; see the minimal usage sketch after this list.

  • LLaMA Efficient Tuning [Github Repo]
    Easy-to-use fine-tuning framework built on PEFT, supporting pre-training (PT), supervised fine-tuning (SFT), and RLHF with QLoRA for LLaMA-2, BLOOM, Falcon, Baichuan, and Qwen.

  • LLaMA-Adapter: Efficient Fine-tuning of LLaMA 🚀[Code]
    Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

  • LLaMA2-Accessory 🚀[Code]
    An Open-source Toolkit for LLM Development

  • LLaMA Factory: Training and Evaluating Large Language Models with Minimal Effort [Code]
    Easy-to-use LLM fine-tuning framework (LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, ChatGLM3)
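
The adapter-based workflow shared by the toolkits above can be illustrated with the Hugging Face PEFT library: freeze a pre-trained base model and train only a small set of injected LoRA weights. The sketch below assumes `transformers` and `peft` are installed; the base model name and the LoRA hyperparameters (`r`, `lora_alpha`, `lora_dropout`) are illustrative choices, not values prescribed by any of the repositories listed here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

# Illustrative base model; any causal LM from the Hub follows the same pattern.
base_id = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# LoRA adapter configuration (rank, scaling, and dropout are example values).
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
)

# Wrap the frozen base model; only the low-rank adapter matrices are trainable.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically reports well under 1% of parameters as trainable
```

The wrapped model can then be trained with a standard `transformers` training loop, and the resulting adapter weights can be saved and shared separately from the full base model.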
