Multi-modal Large Language Model Collection 🦕

This is a curated list of Multi-modal Large Language Models (MLLM), Multimodal Benchmarks (MMB), Multimodal Instruction Tuning (MMIT), Multimodal In-context Learning (MMIL), Foundation Models (FM, e.g., the CLIP family), and popular Parameter-Efficient Tuning methods.

📒 Table of Contents

Multi-modal Large Language Models (MLLM)

  • Video-LLaVA: Learning United Visual Representation by Alignment Before Projection [Arxiv 2024/02/12] [Paper] [Code]
    Peking University, Peng Cheng Laboratory, Sun Yat-sen University, Guangzhou, Tencent Data Platform, AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, FarReel Ai Lab

  • Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models [Arxiv 2024/02/12] [Paper] [Code] [Evaluation]
    Stanford, Toyota Research Institute

  • Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models [Arxiv 2024/03/27] [Paper] [Code] [Project Page]
    The Chinese University of Hong Kong, SmartMore

  • InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks [Arxiv 2024/01/15] [Paper] [Code]
    OpenGVLab, Shanghai AI Laboratory, Nanjing University, The University of Hong Kong, The Chinese University of Hong Kong, Tsinghua University, University of Science and Technology of China, SenseTime Research

  • GiT: Towards Generalist Vision Transformer through Universal Language Interface [Arxiv 2024/03/14] [Paper]
    Peking University, Max Planck Institute for Informatics, The Chinese University of Hong Kong Shenzhen, ETH Zurich, The Chinese University of Hong Kong

  • LLaMA: Open and Efficient Foundation Language Models [Arxiv 2023] [Paper] [Github Repo]
    Meta AI

Foundation Models (FM)

Parameter-Efficient Tuning Repo (PETR)

  • PEFT: Parameter-Efficient Fine-Tuning [HuggingFace 🤗] [Home Page] [Code]
    PEFT (Parameter-Efficient Fine-Tuning) is a library for efficiently adapting pre-trained language models (PLMs) to various downstream applications without fine-tuning all of the model's parameters; see the minimal usage sketch after this list.

  • LLaMA Efficient Tuning [Github Repo]
    Easy-to-use fine-tuning framework built on PEFT, supporting pre-training (PT), supervised fine-tuning (SFT), and RLHF with QLoRA for LLaMA-2, BLOOM, Falcon, Baichuan, and Qwen.

  • LLaMA-Adapter: Efficient Fine-tuning of LLaMA 🚀[Code]
    Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

  • LLaMA2-Accessory 🚀[Code]
    An Open-source Toolkit for LLM Development

  • LLaMA Factory: Training and Evaluating Large Language Models with Minimal Effort [Code]
    Easy-to-use LLM fine-tuning framework (LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, ChatGLM3)
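
The adapter-based workflow shared by the toolkits above can be illustrated with the Hugging Face PEFT library: freeze a pre-trained base model and train only a small set of injected LoRA weights. The sketch below assumes `transformers` and `peft` are installed; the base model name and the LoRA hyperparameters (`r`, `lora_alpha`, `lora_dropout`) are illustrative choices, not values prescribed by any of the repositories listed here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

# Illustrative base model; any causal LM from the Hub follows the same pattern.
base_id = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# LoRA adapter configuration (rank, scaling, and dropout are example values).
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
)

# Wrap the frozen base model; only the low-rank adapter matrices are trainable.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically reports well under 1% of parameters as trainable
```

The wrapped model can then be trained with a standard `transformers` training loop, and the resulting adapter weights can be saved and shared separately from the full base model.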
