Awesome-AI-Papers

This repository collects papers and code in the field of AI. It is organized into the following parts:

Table of Contents

  ├─ NLP/
  │  ├─ Word2Vec/
  │  ├─ Seq2Seq/
  │  └─ Pretraining/
  │     ├─ Large Language Model/
  │     ├─ LLM Application/
  │     │  ├─ AI Agent/
  │     │  ├─ Academic/
  │     │  ├─ Code/
  │     │  ├─ Financial Application/
  │     │  ├─ Information Retrieval/
  │     │  ├─ Math/
  │     │  ├─ Medicine and Law/
  │     │  ├─ Recommend System/
  │     │  └─ Tool Learning/
  │     ├─ LLM Technique/
  │     │  ├─ Alignment/
  │     │  ├─ Context Length/
  │     │  ├─ Corpus/
  │     │  ├─ Evaluation/
  │     │  ├─ Hallucination/
  │     │  ├─ Inference/
  │     │  ├─ MoE/
  │     │  ├─ PEFT/
  │     │  ├─ Prompt Learning/
  │     │  ├─ RAG/
  │     │  └─ Reasoning and Planning/
  │     ├─ LLM Theory/
  │     └─ Chinese Model/
  ├─ CV/
  │  ├─ CV Application/
  │  ├─ Contrastive Learning/
  │  ├─ Foundation Model/
  │  ├─ Generative Model (GAN and VAE)/
  │  ├─ Image Editing/
  │  ├─ Object Detection/
  │  ├─ Semantic Segmentation/
  │  └─ Video/
  ├─ Multimodal/
  │  ├─ Audio/
  │  ├─ BLIP/
  │  ├─ CLIP/
  │  ├─ Diffusion Model/
  │  ├─ Multimodal LLM/
  │  ├─ Text2Image/
  │  ├─ Text2Video/
  │  └─ Survey/
  ├─ Reinforcement Learning/
  ├─ GNN/
  └─ Transformer Architecture/

NLP

1. Word2Vec

  • Efficient Estimation of Word Representations in Vector Space, Mikolov et al., arxiv 2013. [paper]
  • Distributed Representations of Words and Phrases and their Compositionality, Mikolov et al., NIPS 2013. [paper]
  • Distributed representations of sentences and documents, Le and Mikolov, ICML 2014. [paper]
  • Word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method, Goldberg and Levy, arxiv 2014. [paper]
  • word2vec Parameter Learning Explained, Rong, arxiv 2014. [paper]
  • GloVe: Global Vectors for Word Representation, Pennington et al., EMNLP 2014. [paper][code]
  • fastText: Bag of Tricks for Efficient Text Classification, Joulin et al., arxiv 2016. [paper][code]
  • ELMo: Deep Contextualized Word Representations, Peters et al., arxiv 2018. [paper]
  • BPE: Neural Machine Translation of Rare Words with Subword Units, Sennrich et al., ACL 2016. [paper][code]
  • Byte-Level BPE: Neural Machine Translation with Byte-Level Subwords, Wang et al., arxiv 2019. [paper][code]
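
The negative-sampling objective analyzed in several of the papers above is compact enough to sketch directly. A minimal PyTorch version (vocabulary size, embedding dimension, and batch shapes are illustrative assumptions, not values prescribed by the papers):

```python
import torch
import torch.nn.functional as F

class SkipGramNS(torch.nn.Module):
    """Skip-gram with negative sampling (Mikolov et al., 2013): maximize
    log sigma(u_o . v_c) for the observed context word and
    log sigma(-u_k . v_c) for each of the K sampled negatives."""
    def __init__(self, vocab_size: int, dim: int = 100):
        super().__init__()
        self.center = torch.nn.Embedding(vocab_size, dim)   # "input" vectors v
        self.context = torch.nn.Embedding(vocab_size, dim)  # "output" vectors u

    def forward(self, center_ids, pos_ids, neg_ids):
        v = self.center(center_ids)       # (B, D) center words
        u_pos = self.context(pos_ids)     # (B, D) observed context words
        u_neg = self.context(neg_ids)     # (B, K, D) sampled negatives
        pos = F.logsigmoid((v * u_pos).sum(-1))                              # (B,)
        neg = F.logsigmoid(-torch.bmm(u_neg, v.unsqueeze(-1)).squeeze(-1)).sum(-1)
        return -(pos + neg).mean()        # negative log-likelihood to minimize
```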

2. Seq2Seq

  • Generating Sequences With Recurrent Neural Networks, Graves, arxiv 2013. [paper]
  • Sequence to Sequence Learning with Neural Networks, Sutskever et al., NeurIPS 2014. [paper]
  • Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau et al., ICLR 2015. [paper][code]
  • On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, Cho et al., arxiv 2014. [paper]
  • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Cho et al., arxiv 2014. [paper]
  • [fairseq][pytorch-seq2seq]
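
The papers above share one encoder-decoder shape, worth seeing once in code: the final encoder state summarizes the source sentence and initializes the decoder, as in Sutskever et al. (2014). A minimal sketch (the GRU choice and layer sizes are illustrative):

```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: encode the source, seed the decoder with the
    final encoder state, and predict target tokens under teacher forcing."""
    def __init__(self, src_vocab: int, tgt_vocab: int, dim: int = 256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, h = self.encoder(self.src_emb(src_ids))           # h: (1, B, D)
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), h)  # teacher forcing
        return self.out(dec_out)                             # (B, T, tgt_vocab)
```

Bahdanau-style attention (above) then replaces the single summary vector with a per-step weighted mix of all encoder states.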

3. Pretraining

3.1 Large Language Model

  • A Survey of Large Language Models, Zhao et al., arxiv 2023. [paper][code][LLMBox][LLMBook-zh][LLMsPracticalGuide]
  • Efficient Large Language Models: A Survey, Wan et al., arxiv 2023. [paper][code]
  • Challenges and Applications of Large Language Models, Kaddour et al., arxiv 2023. [paper]
  • A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT, Zhou et al., arxiv 2023. [paper]
  • From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape, McIntosh et al., arxiv 2023. [paper][AGI-survey]
  • A Survey of Resource-efficient LLM and Multimodal Foundation Models, Xu et al., arxiv 2024. [paper][code]
  • Large Language Models: A Survey, Minaee et al., arxiv 2024. [paper]
  • Anthropic: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Bai et al., arxiv 2022. [paper][code]
  • Anthropic: Constitutional AI: Harmlessness from AI Feedback, Bai et al., arxiv 2022. [paper][code]
  • Anthropic: Model Card and Evaluations for Claude Models, Anthropic, 2023. [paper]
  • Anthropic: The Claude 3 Model Family: Opus, Sonnet, Haiku, Anthropic, 2024. [paper]
  • BLOOM: A 176B-Parameter Open-Access Multilingual Language Model, BigScience Workshop, arxiv 2022. [paper][code][model]
  • OPT: Open Pre-trained Transformer Language Models, Zhang et al., arxiv 2022. [paper][code]
  • Chinchilla: Training Compute-Optimal Large Language Models, Hoffmann et al., arxiv 2022. [paper]
  • Gopher: Scaling Language Models: Methods, Analysis & Insights from Training Gopher, Rae et al., arxiv 2021. [paper]
  • GPT-NeoX-20B: An Open-Source Autoregressive Language Model, Black et al., arxiv 2022. [paper][code]
  • Gemini: A Family of Highly Capable Multimodal Models, Gemini Team, Google, arxiv 2023. [paper][Gemini 1.0][Gemini 1.5][Unofficial Implementation][MiniGemini]
  • Gemma: Open Models Based on Gemini Research and Technology, Google DeepMind, 2024. [paper][code][google-deepmind/gemma][gemma.cpp][model][paligemma]
  • GPT-4 Technical Report, OpenAI, arxiv 2023. [paper]
  • GPT-4V(ision) System Card, OpenAI, OpenAI blog 2023. [paper]
  • Sparks of Artificial General Intelligence: Early Experiments with GPT-4, Bubeck et al., arxiv 2023. [paper]
  • The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision), Yang et al., arxiv 2023. [paper][guidance]
  • LaMDA: Language Models for Dialog Applications, Thoppilan et al., arxiv 2022. [paper][LaMDA-rlhf-pytorch]
  • LLaMA: Open and Efficient Foundation Language Models, Touvron et al., arxiv 2023. [paper][code][llama.cpp][ollama][llamafile]
  • Llama 2: Open Foundation and Fine-Tuned Chat Models, Touvron et al., arxiv 2023. [paper][code][llama-recipes][llama2.c][lit-llama][litgpt]
  • [llama3][llama3-from-scratch]
  • TinyLlama: An Open-Source Small Language Model, Zhang et al., arxiv 2024. [paper][code][LiteLlama][MobiLlama]
  • Stanford Alpaca: An Instruction-following LLaMA Model, Taori et al., Stanford blog 2023. [paper][code][Alpaca-Lora]
  • Mistral 7B, Jiang et al., arxiv 2023. [paper][code][model][mistral-finetune]
  • OLMo: Accelerating the Science of Language Models, Groeneveld et al., arxiv 2024. [paper][code][Dolma Dataset]
  • Minerva: Solving Quantitative Reasoning Problems with Language Models, Lewkowycz et al., arxiv 2022. [paper]
  • PaLM: Scaling Language Modeling with Pathways, Chowdhery et al., arxiv 2022. [paper][PaLM-pytorch][PaLM-rlhf-pytorch][PaLM]
  • PaLM 2 Technical Report, Anil et al., arxiv 2023. [paper]
  • PaLM-E: An Embodied Multimodal Language Model, Driess et al., arxiv 2023. [paper][code]
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Raffel et al., Journal of Machine Learning Research 2020. [paper][code][t5-pytorch]
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, Lewis et al., ACL 2020. [paper][code]
  • FLAN: Finetuned Language Models Are Zero-Shot Learners, Wei et al., ICLR 2022. [paper][code]
  • Scaling Flan: Scaling Instruction-Finetuned Language Models, Chung et al., arxiv 2022. [paper][model]
  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context, Dai et al., ACL 2019. [paper][code]
  • XLNet: Generalized Autoregressive Pretraining for Language Understanding, Yang et al., NeurIPS 2019. [paper][code]
  • WebGPT: Browser-assisted question-answering with human feedback, Nakano et al., arxiv 2021. [paper][MS-MARCO-Web-Search]
  • Open Release of Grok-1, xAI, 2024. [blog][code][model][modelscope][hpcai-tech/grok-1][dbrx][Command R+][snowflake-arctic]

3.2 LLM Application

  • A Watermark for Large Language Models, Kirchenbauer et al., arxiv 2023. [paper][code][markllm]

  • SeqXGPT: Sentence-Level AI-Generated Text Detection, Wang et al., EMNLP 2023. [paper][code][llm-detect-ai][detect-gpt][fast-detect-gpt]

  • AlpaGasus: Training A Better Alpaca with Fewer Data, Chen et al., arxiv 2023. [paper][code]

  • AutoMix: Automatically Mixing Language Models, Madaan et al., arxiv 2023. [paper][code]

  • ChipNeMo: Domain-Adapted LLMs for Chip Design, Liu et al., arxiv 2023. [paper]

  • GAIA: A Benchmark for General AI Assistants, Mialon et al., ICLR 2024. [paper][code]

  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, Shen et al., NeurIPS 2023. [paper][code]

  • MemGPT: Towards LLMs as Operating Systems, Packer et al., arxiv 2023. [paper][code]

  • UFO: A UI-Focused Agent for Windows OS Interaction, Zhang et al., arxiv 2024. [paper][code]

  • OS-Copilot: Towards Generalist Computer Agents with Self-Improvement, Wu et al., ICLR 2024. [paper][code]

  • AIOS: LLM Agent Operating System, Mei et al., arxiv 2024. [paper][code]

  • DB-GPT: Empowering Database Interactions with Private Large Language Models, Xue et al., arxiv 2023. [paper][code][DocsGPT][privateGPT][localGPT]

  • OpenChat: Advancing Open-source Language Models with Mixed-Quality Data, Wang et al., ICLR 2024. [paper][code]

  • OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement, Zheng et al., arxiv 2024. [paper][code]

  • Orca: Progressive Learning from Complex Explanation Traces of GPT-4, Mukherjee et al., arxiv 2023. [paper]

  • PDFTriage: Question Answering over Long, Structured Documents, Saad-Falcon et al., arxiv 2023. [paper][[code]]

  • Prompt2Model: Generating Deployable Models from Natural Language Instructions, Viswanathan et al., arxiv 2023. [paper][code]

  • Shepherd: A Critic for Language Model Generation, Wang et al., arxiv 2023. [paper][code]

  • Alpaca: A Strong, Replicable Instruction-Following Model, Taori et al., Stanford Blog 2023. [paper][code]

  • Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality, Chiang et al., 2023. [blog]

  • WizardLM: Empowering Large Language Models to Follow Complex Instructions, Xu et al., ICLR 2024. [paper][code]

  • WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences, Liu et al., KDD 2023. [paper][code][AutoWebGLM][AutoCrawler][gpt-crawler][webllama][gpt-researcher][skyvern][Scrapegraph-ai]

  • LLM4Decompile: Decompiling Binary Code with Large Language Models, Tan et al., arxiv 2024. [paper] [code]

  • [ray][dask][TaskingAI][gpt4all][ollama][llama.cpp][dify][bisheng][phidata][guidance]

  • [awesome-llm-apps]

3.2.1 AI Agent
  • LLM Powered Autonomous Agents, Lilian Weng, 2023. [blog][LLMAgentPapers][LLM-Agents-Papers][awesome-language-agents][Awesome-Papers-Autonomous-Agent]

  • A Survey on Large Language Model based Autonomous Agents, Wang et al., arxiv 2023. [paper][code]

  • The Rise and Potential of Large Language Model Based Agents: A Survey, Xi et al., arxiv 2023. [paper][code]

  • Agent AI: Surveying the Horizons of Multimodal Interaction, Durante et al., arxiv 2024. [paper]

  • Position Paper: Agent AI Towards a Holistic Intelligence, Huang et al., arxiv 2024. [paper]

  • AgentBench: Evaluating LLMs as Agents, Liu et al., ICLR 2024. [paper][code][OSWorld]

  • Agents: An Open-source Framework for Autonomous Language Agents, Zhou et al., arxiv 2023. [paper][code]

  • AutoAgents: A Framework for Automatic Agent Generation, Chen et al., arxiv 2023. [paper][code]

  • AgentTuning: Enabling Generalized Agent Abilities for LLMs, Zeng et al., arxiv 2023. [paper][code]

  • AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors, Chen et al., ICLR 2024. [paper][code]

  • AppAgent: Multimodal Agents as Smartphone Users, Zhang et al., arxiv 2023. [paper][code]

  • Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception, Wang et al., arxiv 2024. [paper][code]

  • Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security, Li et al., arxiv 2024. [paper][code]

  • AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation, Wu et al., arxiv 2023. [paper][code]

  • CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society, Li et al., NeurIPS 2023. [paper][code]

  • ChatDev: Communicative Agents for Software Development, Qian et al., ACL 2024. [paper][code][gpt-pilot]

  • MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework, Hong et al., ICLR 2024 Oral. [paper][code]

  • RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation, Luo et al., arxiv 2024. [paper][code]

  • Generative Agents: Interactive Simulacra of Human Behavior, Park et al., arxiv 2023. [paper][code][GPTeam]

  • CogAgent: A Visual Language Model for GUI Agents, Hong et al., CVPR 2024. [paper][code]

  • OpenAgents: An Open Platform for Language Agents in the Wild, Xie et al., arxiv 2023. [paper][code]

  • TaskWeaver: A Code-First Agent Framework, Qiao et al., arxiv 2023. [paper][code]

  • MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge, Fan et al., NeurIPS 2022 Outstanding Paper. [paper][code]

  • Voyager: An Open-Ended Embodied Agent with Large Language Models, Wang et al., arxiv 2023. [paper][code]

  • Eureka: Human-Level Reward Design via Coding Large Language Models, Ma et al., ICLR 2024. [paper][code][DrEureka]

  • Mind2Web: Towards a Generalist Agent for the Web, Deng et al., NeurIPS 2023. [paper][code][AutoWebGLM]

  • SeeAct: GPT-4V(ision) is a Generalist Web Agent, if Grounded, Zheng et al., arxiv 2024. [paper][code]

  • Foundation Models in Robotics: Applications, Challenges, and the Future, Firoozi et al., arxiv 2023. [paper][code]

  • RT-1: Robotics Transformer for Real-World Control at Scale, Brohan et al., arxiv 2022. [paper][code]

  • RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control, Brohan et al., arxiv 2023. [paper][Unofficial Implementation][RT-H: Action Hierarchies Using Language]

  • Open X-Embodiment: Robotic Learning Datasets and RT-X Models, Open X-Embodiment Collaboration, arxiv 2023. [paper][code]

  • Shaping the future of advanced robotics, Google DeepMind 2024. [blog]

  • RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation, Wang et al., ICML 2024. [paper][code]

  • RL-GPT: Integrating Reinforcement Learning and Code-as-policy, Liu et al., arxiv 2024. [paper]

  • Genie: Generative Interactive Environments, Bruce et al., arxiv 2024. [paper]

  • Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation, Fu et al., arxiv 2024. [paper][code][Hardware Code][Learning Code][UMI]

  • Octo: An Open-Source Generalist Robot Policy, Ghosh et al., arxiv 2024. [paper][code]

  • [LeRobot][DORA][awesome-ai-agents][IsaacLab]

  • [AutoGPT][GPT-Engineer][AgentGPT]

  • [BabyAGI][SuperAGI][OpenAGI]

  • [open-interpreter][Homepage][rawdog][OpenCodeInterpreter]

  • XAgent: An Autonomous Agent for Complex Task Solving, [blog][code]

  • [crewAI][phidata][gpt-computer-assistant]
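
Most of the agent frameworks above are built around the same think/act/observe loop. A schematic sketch of that loop; `llm` and the entries of `tools` are hypothetical placeholders, not the API of any framework listed here:

```python
def run_agent(task: str, llm, tools: dict, max_steps: int = 8) -> str:
    """ReAct-style loop: the model alternates free-form reasoning with tool
    calls until it emits a final answer. llm(prompt) -> str and each tool
    (str -> str) stand in for a real model and real tools."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm(transcript)                       # next Thought/Action/Answer
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action:"):               # e.g. "Action: search[query]"
            name, _, arg = step.removeprefix("Action:").strip().partition("[")
            observation = tools[name.strip()](arg.rstrip("]"))
            transcript += f"{step}\nObservation: {observation}\n"
        else:
            transcript += step + "\n"                # plain reasoning step
    return "No answer within the step budget."
```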

3.2.2 Academic
  • Galactica: A Large Language Model for Science, Taylor et al., arxiv 2022. [paper][code]
  • K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization, Deng et al., arxiv 2023. [paper][code][pdf_parser]
  • GeoGalactica: A Scientific Large Language Model in Geoscience, Lin et al., arxiv 2024. [paper][code][sciparser]
  • Scientific Large Language Models: A Survey on Biological & Chemical Domains, Zhang et al., arxiv 2024. [paper][code]
  • SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning, Zhang et al., arxiv 2024. [paper][code]
  • ChemLLM: A Chemical Large Language Model, Zhang et al., arxiv 2024. [paper][model]
  • LangCell: Language-Cell Pre-training for Cell Identity Understanding, Zhao et al., ICML 2024. [paper][code][scFoundation]
  • [Awesome-Scientific-Language-Models][gpt_academic][ChatPaper]
3.2.3 Code
  • Neural code generation, CMU 2024 Spring. [link]

  • Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code, Zhang et al., arxiv 2023. [paper][Awesome-Code-LLM][MFTCoder]

  • Source Code Data Augmentation for Deep Learning: A Survey, Zhuo et al., arxiv 2023. [paper][code]

  • Codex: Evaluating Large Language Models Trained on Code, Chen et al., arxiv 2021. [paper][human-eval] (its pass@k metric is sketched at the end of this section)

  • Code Llama: Open Foundation Models for Code, Rozière et al., arxiv 2023. [paper][code][model]

  • CodeGemma: Open Code Models Based on Gemma, [blog][report]

  • AlphaCode: Competition-Level Code Generation with AlphaCode, Li et al., arxiv 2022. [paper][dataset][AlphaCode2_Tech_Report]

  • CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X, Zheng et al., KDD 2023. [paper][code][CodeGeeX2]

  • CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis, Nijkamp et al., ICLR 2022. [paper][code]

  • CodeGen2: Lessons for Training LLMs on Programming and Natural Languages, Nijkamp et al., ICLR 2023. [paper][code]

  • CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules, Le et al., arxiv 2023. [paper][code]

  • StarCoder: may the source be with you, Li et al., arxiv 2023. [paper][code][bigcode-project][model]

  • StarCoder 2 and The Stack v2: The Next Generation, Lozhkov et al., 2024. [paper][code][starcoder.cpp]

  • WizardCoder: Empowering Code Large Language Models with Evol-Instruct, Luo et al., ICLR 2024. [paper][code]

  • Magicoder: Source Code Is All You Need, Wei et al., arxiv 2023. [paper][code]

  • Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering, Ridnik et al., arxiv 2024. [paper][code][pr-agent][cover-agent]

  • DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence, Guo et al., arxiv 2024. [paper][code]

  • If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents, Yang et al., arxiv 2024. [paper]

  • Design2Code: How Far Are We From Automating Front-End Engineering?, Si et al., arxiv 2024. [paper][code]

  • AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct, Lei et al., arxiv 2024. [paper][code]

  • [CodeQwen1.5][aiXcoder-7B]

  • [OpenDevin][swe-bench-technical-report][devika][SWE-agent][auto-code-rover][developer]

  • [screenshot-to-code][vanna]
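
Most of the code models above report pass@k on HumanEval-style suites. The unbiased estimator from the Codex paper is small enough to reproduce (a sketch of the published formula, not the human-eval package itself):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k (Chen et al., 2021): n samples generated per problem,
    c of them pass the tests. pass@k = 1 - C(n-c, k) / C(n, k), computed as
    a running product for numerical stability."""
    if n - c < k:
        return 1.0  # fewer failures than draws: some sample always passes
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```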

3.2.4 Financial Application
  • DocLLM: A layout-aware generative language model for multimodal document understanding, Wang et al., arxiv 2024. [paper]
  • DocGraphLM: Documental Graph Language Model for Information Extraction, Wang et al., arxiv 2023. [paper]
  • FinBERT: A Pretrained Language Model for Financial Communications, Yang et al., arxiv 2020. [paper][Wiley paper][code][finBERT][valuesimplex/FinBERT]
  • FinGPT: Open-Source Financial Large Language Models, Yang et al., IJCAI 2023. [paper][code]
  • FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models, Yang et al., arxiv 2024. [paper][code]
  • FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets, Wang et al., arxiv 2023. [paper][code]
  • Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models, Zhang et al., arxiv 2023. [paper][code]
  • FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance, Liu et al., arxiv 2020. [paper][code]
  • FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning, Liu et al., NeurIPS 2022. [paper][code]
  • DISC-FinLLM: A Chinese Financial Large Language Model based on Multiple Experts Fine-tuning, Chen et al., arxiv 2023. [paper][code]
  • A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist, Zhang et al., arxiv 2024. [paper]
  • XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters, Zhang et al., arxiv 2023. [paper][code][PIXIU]
  • StructGPT: A General Framework for Large Language Model to Reason over Structured Data, Jiang et al., arxiv 2023. [paper][code]
  • Large Language Model for Table Processing: A Survey, Lu et al., arxiv 2024. [paper][llm-table-survey][table-transformer]
  • A Survey of Large Language Models in Finance (FinLLMs), Lee et al., arxiv 2024. [paper][code]
  • Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow, Zhang et al., arxiv 2023. [paper][code]
  • Data Interpreter: An LLM Agent For Data Science, Hong et al., arxiv 2024. [paper][code]
  • AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework, Li et al., COLING 2024. [paper][code]
  • [gpt-investor][FinGLM]
3.2.5 Information Retrieval
  • ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT, Khattab et al., SIGIR 2020. [paper] (its MaxSim scoring is sketched at the end of this section)

  • ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction, Santhanam et al., NAACL 2022. [paper][code][RAGatouille]

  • ColBERT-XM: A Modular Multi-Vector Representation Model for Zero-Shot Multilingual Information Retrieval, Louis et al., arxiv 2024. [paper][code][model]

  • Large Language Models for Information Retrieval: A Survey, Zhu et al., arxiv 2023. [paper][code]

  • Large Language Models for Generative Information Extraction: A Survey, Xu et al., arxiv 2023. [paper][code][UIE][NERRE]

  • UniGen: A Unified Generative Framework for Retrieval and Question Answering with Large Language Models, Li et al., AAAI 2024. [paper]

  • INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning, Zhu et al., ACL 2024. [paper][code]

  • GenIR: From Matching to Generation: A Survey on Generative Information Retrieval, Li et al., arxiv 2024. [paper][code]

  • SIGIR-AP 2023 Tutorial: Recent Advances in Generative Information Retrieval [link]

  • [search_with_lepton][LLocalSearch][FreeAskInternet][storm][searxng]
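
The ColBERT papers at the top of this section replace single-vector scoring with late interaction: each query token takes its best match over document tokens, and the per-token maxima are summed. A minimal numpy sketch (array shapes are illustrative):

```python
import numpy as np

def maxsim_score(Q: np.ndarray, D: np.ndarray) -> float:
    """ColBERT-style MaxSim: Q is (q_len, dim) query token embeddings,
    D is (d_len, dim) document token embeddings, both L2-normalized.
    Score = sum over query tokens of the max similarity to any doc token."""
    return float((Q @ D.T).max(axis=1).sum())
```

Ranking a candidate set by this score is what distinguishes late interaction from bi-encoder retrieval, which collapses each text to a single vector first.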

3.2.6 Math
  • ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving, Gou et al., ICLR 2024. [paper][code]
  • MathVista: Evaluating Math Reasoning in Visual Contexts with GPT-4V, Bard, and Other Large Multimodal Models, Lu et al., ICLR 2024 Oral. [paper][code][MathBench]
  • DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, Shao et al., arxiv 2024. [paper][code]
  • Common 7B Language Models Already Possess Strong Math Capabilities, Li et al., arxiv 2024. [paper][code]
  • ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline, Xu et al., arxiv 2024. [paper][code]
  • AlphaMath Almost Zero: Process Supervision without Process, Chen et al., arxiv 2024. [paper][code]
3.2.7 Medicine and Law
  • A Survey of Large Language Models in Medicine: Progress, Application, and Challenge, Zhou et al., arxiv 2023. [paper][code][LLM-for-Healthcare]

  • A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law, Chen et al., arxiv 2024. [paper][code]

  • HuatuoGPT, towards Taming Language Model to Be a Doctor, Zhang et al., arxiv 2023. [paper][code][Medical_NLP][Zhongjing][MedicalGPT]

  • ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases, Cui et al., arxiv 2023. [paper][code]

  • DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services, Yue et al., arxiv 2023. [paper][code]

  • DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation, Bao et al., arxiv 2023. [paper][code]

  • MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning, Tang et al., arxiv 2023. [paper][code]

  • MEDITRON-70B: Scaling Medical Pretraining for Large Language Models, Chen et al., arxiv 2023. [paper][meditron]

  • Med-PaLM: Large language models encode clinical knowledge, Singhal et al., Nature 2023. [paper][Unofficial Implementation]

  • Capabilities of Gemini Models in Medicine, Saab et al., arxiv 2024. [paper]

  • AMIE: Towards Conversational Diagnostic AI, Tu et al., arxiv 2024. [paper][AMIE-pytorch]

  • Apollo: Lightweight Multilingual Medical LLMs towards Democratizing Medical AI to 6B People, Wang et al., arxiv 2024. [paper][code][Medical_NLP]

  • Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents, Li et al., arxiv 2024. [paper]

  • [openfold][alphafold3-pytorch][AlphaFold3][LucaOne]

3.2.8 Recommend System
  • DIN: Deep Interest Network for Click-Through Rate Prediction, Zhou et al., KDD 2018. [paper][code][DIEN][x-deeplearning]
  • MMoE: Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts, Ma et al., KDD 2018. [paper][DeepCTR-Torch][pytorch-mmoe]
  • Recommender Systems with Generative Retrieval, Rajput et al., NeurIPS 2022. [paper]
  • Unifying Large Language Models and Knowledge Graphs: A Roadmap, Pan et al., arxiv 2023. [paper]
  • YuLan-Rec: User Behavior Simulation with Large Language Model based Agents, Wang et al., arxiv 2023. [paper][code]
  • SSLRec: A Self-Supervised Learning Framework for Recommendation, Ren et al., WSDM 2024 Oral. [paper][code][Awesome-SSLRec-Papers]
  • RLMRec: Representation Learning with Large Language Models for Recommendation, Ren et al., WWW 2024. [paper][code]
  • LLMRec: Large Language Models with Graph Augmentation for Recommendation, Wei et al., WSDM 2024 Oral. [paper][code]
  • Agent4Rec: On Generative Agents in Recommendation, Zhang et al., arxiv 2023. [paper][code]
  • LLM-KERec: Breaking the Barrier: Utilizing Large Language Models for Industrial Recommendation Systems through an Inferential Knowledge Graph, Zhao et al., arxiv 2024. [paper]
  • Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations, Zhai et al., ICML 2024. [paper][code]
  • Wukong: Towards a Scaling Law for Large-Scale Recommendation, Zhang et al., arxiv 2024. [paper][unofficial code]
  • RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems, Lian et al., arxiv 2024. [paper][code]
  • [recommenders][Source code for Twitter's Recommendation Algorithm][Awesome-RSPapers][RecBole][RecSysDatasets]
3.2.9 Tool Learning
  • Tool Learning with Foundation Models, Qin et al., arxiv 2023. [paper][code]
  • Toolformer: Language Models Can Teach Themselves to Use Tools, Schick et al., arxiv 2023. [paper][toolformer-pytorch][toolformer]
  • ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs, Qin et al., ICLR 2024 Spotlight. [paper][code][StableToolBench]
  • Gorilla: Large Language Model Connected with Massive APIs, Patil et al., arxiv 2023. [paper][code]
  • GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction, Yang et al., arxiv 2023. [paper][code]
  • LLMCompiler: An LLM Compiler for Parallel Function Calling, Kim et al., arxiv 2023. [paper][code]
  • Large Language Models as Tool Makers, Cai et al., arxiv 2023. [paper][code]
  • ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases, Tang et al., arxiv 2023. [paper][code][ToolQA][toolbench]
  • ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search, Zhuang et al., arxiv 2023. [paper][[code]]
  • Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models, Lu et al., NeurIPS 2023. [paper][code]
  • ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios, Ye et al., arxiv 2024. [paper][code]
  • AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls, Du et al., arxiv 2024. [paper][code]
  • LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error, Wang et al., arxiv 2024. [paper][code]
  • What Are Tools Anyway? A Survey from the Language Model Perspective, Wang et al., arxiv 2024. [paper]
  • [ToolLearningPapers][awesome-tool-llm]
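
Concretely, most of the tool-learning setups above reduce to the model emitting a structured call and a harness executing it and feeding the result back. A schematic dispatcher; the JSON call format and the toy tools are assumptions for illustration, not any benchmark's protocol:

```python
import json

# Toy tool registry; real systems register API schemas, not lambdas.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # demo only
    "lookup": lambda key: {"pi": "3.14159"}.get(key, "not found"),
}

def dispatch(model_output: str) -> str:
    """Parse a model-emitted call like {"tool": "calculator", "input": "2+2"}
    and return the tool's observation for the next model turn."""
    call = json.loads(model_output)
    return TOOLS[call["tool"]](call["input"])

print(dispatch('{"tool": "calculator", "input": "2+2"}'))  # -> 4
```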

3.3 LLM Technique

  • How to Train Really Large Models on Many GPUs, Lilian Weng, 2021. [blog]
  • Training great LLMs entirely from ground zero in the wilderness as a startup, Yi Tay, 2024. [blog]
  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, Shoeybi et al., arxiv 2019. [paper][code]
  • ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Rajbhandari et al., arxiv 2019. [paper][DeepSpeed]
  • Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training, Li et al., ICPP 2023. [paper][code]
  • MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs, Jiang et al., arxiv 2024. [paper]
  • A Theory on Adam Instability in Large-Scale Machine Learning, Molybog et al., arxiv 2023. [paper]
  • Loss Spike in Training Neural Networks, Zhang et al., arxiv 2023. [paper]
  • Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling, Biderman et al., arxiv 2023. [paper][code]
  • Continual Pre-Training of Large Language Models: How to (re)warm your model, Gupta et al., arxiv 2023. [paper]
  • FLM-101B: An Open LLM and How to Train It with $100K Budget, Li et al., arxiv 2023. [paper][model]
  • Instruction Tuning with GPT-4, Peng et al., arxiv 2023. [paper][code]
  • DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines, Khattab et al., arxiv 2023. [paper][code]
  • OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning, Ye et al., arxiv 2024. [paper][code]
  • A Survey on Self-Evolution of Large Language Models, Tao et al., arxiv 2024. [paper][code]
3.3.1 Alignment
  • AI Alignment: A Comprehensive Survey, Ji et al., arxiv 2023. [paper][PKU-Alignment]

  • Large Language Model Alignment: A Survey, Shen et al., arxiv 2023. [paper]

  • Aligning Large Language Models with Human: A Survey, Wang et al., arxiv 2023. [paper][code]

  • [alignment-handbook]

  • Self-Instruct: Aligning Language Models with Self-Generated Instructions, Wang et al., ACL 2023. [paper][code]

  • RLHF: [hf blog][OpenAI blog][alignment blog][awesome-RLHF]

  • Secrets of RLHF in Large Language Models [MOSS-RLHF][Part I][Part II]

  • Safe RLHF: Safe Reinforcement Learning from Human Feedback, Dai et al., ICLR 2024 Spotlight. [paper][code]

  • The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization, Huang et al., arxiv 2024. [paper][code][blog][trl]

  • RLHF Workflow: From Reward Modeling to Online RLHF, Dong et al., arxiv 2024. [paper][code]

  • OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework, Hu et al., arxiv 2024. [paper][code]

  • LIMA: Less Is More for Alignment, Zhou et al., NeurIPS 2023. [paper]

  • DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model, Rafailov et al., NeurIPS 2023 Runner-up Award. [paper][Unofficial Implementation][trl][dpo_trainer] (its loss is sketched at the end of this section)

  • BPO: Black-Box Prompt Optimization: Aligning Large Language Models without Model Training, Cheng et al., arxiv 2023. [paper][code]

  • KTO: Model Alignment as Prospect Theoretic Optimization, Ethayarajh et al., arxiv 2024. [paper][code]

  • SimPO: Simple Preference Optimization with a Reference-Free Reward, Meng et al., arxiv 2024. [paper][code]

  • Constitutional AI: Harmlessness from AI Feedback, Bai et al., arxiv 2022. [paper][code]

  • RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback, Lee et al., arxiv 2023. [paper][[code]][awesome-RLAIF]

  • Direct Language Model Alignment from Online AI Feedback, Guo et al., arxiv 2024. [paper]

  • ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models, Li et al., arxiv 2023. [paper][code][policy_optimization]

  • Zephyr: Direct Distillation of LM Alignment, Tunstall et al., arxiv 2023. [paper][code]

  • Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision, Burns et al., arxiv 2023. [paper][code]

  • SPIN: Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models, Chen et al., arxiv 2024. [paper][code][unofficial implementation]

  • SPPO: Self-Play Preference Optimization for Language Model Alignment, Wu et al., arxiv 2024. [paper]

  • CALM: LLM Augmented LLMs: Expanding Capabilities through Composition, Bansal et al., arxiv 2024. [paper][CALM-pytorch]

  • Self-Rewarding Language Models, Yuan et al., arxiv 2024. [paper][unofficial implementation]

  • Anthropic: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training, Hubinger et al., arxiv 2024. [paper]

  • LongAlign: A Recipe for Long Context Alignment of Large Language Models, Bai et al., arxiv 2024. [paper][code]

  • Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction, Ji et al., arxiv 2024. [paper][code]

  • A Survey on Knowledge Distillation of Large Language Models, Xu et al., arxiv 2024. [paper][code]

  • NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment, Shen et al., arxiv 2024. [paper][code]

  • Xwin-LM: Strong and Scalable Alignment Practice for LLMs, Ni et al., arxiv 2024. [paper][code]
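
Of the methods above, DPO has the most compact objective, which makes it a good sketch target: it needs only the summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. A minimal sketch (beta = 0.1 is a common illustrative value, not a recommendation):

```python
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO (Rafailov et al., 2023). Each argument is a tensor of summed
    response log-probs; the loss pushes the policy's margin between chosen
    and rejected responses above the reference model's margin."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()
```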

3.3.2 Context Length
  • ALiBi: Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation, Press et al., ICLR 2022. [paper][code]
  • Positional Interpolation: Extending Context Window of Large Language Models via Positional Interpolation, Chen et al., arxiv 2023. [paper] (see the RoPE-scaling sketch at the end of this section)
  • Scaling Transformer to 1M tokens and beyond with RMT, Bulatov et al., AAAI 2024. [paper][code][LM-RMT]
  • LongNet: Scaling Transformers to 1,000,000,000 Tokens, Ding et al., arxiv 2023. [paper][code][unofficial code]
  • LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models, Chen et al., ICLR 2024 Oral. [paper][code]
  • StreamingLLM: Efficient Streaming Language Models with Attention Sinks, Xiao et al., ICLR 2024. [paper][code][SwiftInfer][SwiftInfer blog]
  • YaRN: Efficient Context Window Extension of Large Language Models, Peng et al., ICLR 2024. [paper][code]
  • LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression, Jiang et al., arxiv 2023. [paper][code]
  • LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens, Ding et al., arxiv 2024. [paper][code]
  • LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning, Jin et al., arxiv 2024. [paper][code]
  • The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey, Pawar et al., arxiv 2024. [paper]
  • Data Engineering for Scaling Language Models to 128K Context, Fu et al., arxiv 2024. [paper][code]
  • CEPE: Long-Context Language Modeling with Parallel Context Encoding, Yen et al., arxiv 2024. [paper][code]
  • Counting-Stars: A Simple, Efficient, and Reasonable Strategy for Evaluating Long-Context Large Language Models, Song et al., arxiv 2024. [paper][code]
  • Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention, Munkhdalai et al., arxiv 2024. [paper][infini-transformer-pytorch][InfiniTransformer][infini-mini-transformer][megalodon]
  • Extending Llama-3's Context Ten-Fold Overnight, Zhang et al., arxiv 2024. [paper][code][activation_beacon]
  • Make Your LLM Fully Utilize the Context, An et al., arxiv 2024. [paper][code]
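
Several of the extension methods above, Positional Interpolation most directly, come down to rescaling RoPE position indices back into the range seen during pretraining. A minimal sketch (head dimension, base, and the 2k-to-8k numbers are illustrative):

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """RoPE rotation angles with linear Position Interpolation: dividing
    positions by `scale` (e.g. 4.0 to stretch a 2k window to 8k) keeps all
    angles inside the pretraining range."""
    inv_freq = base ** (-torch.arange(0, dim, 2).float() / dim)      # (dim/2,)
    return (positions.float() / scale)[:, None] * inv_freq[None, :]  # (T, dim/2)

# An 8k sequence squeezed into a model pretrained on 2k positions:
theta = rope_angles(torch.arange(8192), dim=128, scale=4.0)
```
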
3.3.3 Corpus
  • [datatrove][datasets][doccano]
  • C4: Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus, Dodge et al., arxiv 2021. [paper][dataset]
  • The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset, Laurençon et al., NeurIPS 2023. [paper][code][dataset]
  • The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only, Penedo et al., arxiv 2023. [paper][dataset]
  • Data-Juicer: A One-Stop Data Processing System for Large Language Models, Chen et al., arxiv 2023. [paper][code]
  • UltraFeedback: Boosting Language Models with High-quality Feedback, Cui et al., ICML 2024. [paper][code]
  • What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning, Liu et al., ICLR 2024. [paper][code]
  • WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset, Qiu et al., arxiv 2024. [paper][dataset][LabelLLM][labelU]
  • Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research, Soldaini et al., arxiv 2024. [paper][code][OLMo]
  • Datasets for Large Language Models: A Comprehensive Survey, Liu et al., arxiv 2024. [paper][Awesome-LLMs-Datasets]
  • DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows, Patel et al., arxiv 2024. [paper][code]
  • Large Language Models for Data Annotation: A Survey, Tan et al., arxiv 2024. [paper][code]
  • Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance, Ye et al., arxiv 2024. [paper][code]
  • COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning, Bai et al., arxiv 2024. [paper][dataset]
  • Best Practices and Lessons Learned on Synthetic Data for Language Models, Liu et al., arxiv 2024. [paper]
  • FineWeb: decanting the web for the finest text data at scale, HuggingFace, 2024. [blogpost][fineweb][fineweb-edu]
3.3.4 Evaluation
3.3.5 Hallucination
  • Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models, Zhang et al., arxiv 2023. [paper][code]
  • A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions, Huang et al., arxiv 2023. [paper][code][Awesome-MLLM-Hallucination]
  • The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models, Li et al., arxiv 2024. [paper][code]
  • Chain-of-Verification Reduces Hallucination in Large Language Models, Dhuliawala et al., arxiv 2023. [paper][code]
  • HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models, Guan et al., CVPR 2024. [paper][code]
  • Woodpecker: Hallucination Correction for Multimodal Large Language Models, Yin et al., arxiv 2023. [paper][code]
  • TrustLLM: Trustworthiness in Large Language Models, Sun et al., arxiv 2024. [paper][code]
  • SAFE: Long-form factuality in large language models, Wei et al., arxiv 2024. [paper][code]
3.3.6 Inference
3.3.7 MoE
  • Mixture of Experts Explained, Sanseviero et al., Hugging Face Blog 2023. [blog]

  • Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, Shazeer et al., arxiv 2017. [paper][Re-Implementation] (top-2 routing is sketched at the end of this section)

  • GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, Lepikhin et al., arxiv 2020. [paper][mixture-of-experts]

  • MegaBlocks: Efficient Sparse Training with Mixture-of-Experts, Gale et al., arxiv 2022. [paper][code]

  • Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models, Shen et al., arxiv 2023. [paper][[code]]

  • Fast Inference of Mixture-of-Experts Language Models with Offloading, Eliseev and Mazur, arxiv 2023. [paper][code]

  • Mixtral-8×7B: Mixtral of Experts, Jiang et al., arxiv 2023. [paper][code][megablocks-public][model][blog][Chinese-Mixtral-8x7B][Chinese-Mixtral]

  • DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models, Dai et al., arxiv 2024. [paper][code]

  • DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model, DeepSeek-AI, arxiv 2024. [paper][code]

  • Evolutionary Optimization of Model Merging Recipes, Akiba et al., arxiv 2024. [paper][code]

  • [llama-moe][Aurora][OpenMoE][makeMoE]
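
The shared core of the MoE work above is a learned router that sends each token to a small number of experts. A deliberately plain, readable sketch of top-2 routing in the spirit of Shazeer et al. (2017); sizes are illustrative, and real systems add load-balancing losses and capacity limits:

```python
import torch
import torch.nn as nn

class Top2MoE(nn.Module):
    """Sparsely-gated MoE layer: a softmax router scores experts per token,
    the top-2 run, and their outputs are mixed by renormalized gate weights."""
    def __init__(self, dim: int = 512, n_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(n_experts)])

    def forward(self, x):                            # x: (tokens, dim)
        gates = self.router(x).softmax(-1)           # (tokens, n_experts)
        top_w, top_i = gates.topk(2, dim=-1)         # route each token to 2 experts
        top_w = top_w / top_w.sum(-1, keepdim=True)  # renormalize the pair
        out = torch.zeros_like(x)
        for slot in range(2):                        # plain loops for clarity
            for e, expert in enumerate(self.experts):
                mask = top_i[:, slot] == e
                if mask.any():
                    out[mask] += top_w[mask, slot, None] * expert(x[mask])
        return out
```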

3.3.8 PEFT (Parameter-efficient Fine-tuning)
  • [DeepSpeed][DeepSpeedExamples][blog]

  • [Megatron-LM][NeMo][Megatron-DeepSpeed][Megatron-DeepSpeed]

  • [torchtune][torchtitan]

  • [PEFT][trl][accelerate][LLaMA-Factory][LMFlow][xtuner][MFTCoder][llm-foundry][swift]

  • [mergekit][Model Merging][OpenChatKit]

  • LoRA: Low-Rank Adaptation of Large Language Models, Hu et al., arxiv 2021. [paper][code][LoRA From Scratch][lora][dora][MoRA] (the low-rank update is sketched at the end of this section)

  • QLoRA: Efficient Finetuning of Quantized LLMs, Dettmers et al., NeurIPS 2023 Oral. [paper][code][bitsandbytes][unsloth]

  • S-LoRA: Serving Thousands of Concurrent LoRA Adapters, Sheng et al., arxiv 2023. [paper][code]

  • GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection, Zhao et al., arxiv 2024. [paper][code]

  • Prefix-Tuning: Optimizing Continuous Prompts for Generation, Li et al., ACL 2021. [paper][code]

  • Adapter: Parameter-Efficient Transfer Learning for NLP, Houlsby et al., ICML 2019. [paper][code][unify-parameter-efficient-tuning]

  • Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning, Poth et al., EMNLP 2023. [paper][code]

  • LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models, Hu et al., EMNLP 2023. [paper][code]

  • LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention, Zhang et al., ICLR 2024. [paper][code]

  • LLaMA Pro: Progressive LLaMA with Block Expansion, Wu et al., arxiv 2024. [paper][code]

  • P-Tuning: GPT Understands, Too, Liu et al., arxiv 2021. [paper][code]

  • P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks, Liu et al., ACL 2022. [paper][code]

  • Towards a Unified View of Parameter-Efficient Transfer Learning, He et al., ICLR 2022. [paper][code]

  • Mixed Precision Training, Micikevicius et al., ICLR 2018. [paper]

  • 8-bit Optimizers via Block-wise Quantization, Dettmers et al., ICLR 2022. [paper][code]

  • FP8-LM: Training FP8 Large Language Models, Peng et al., arxiv 2023. [paper][code]

  • Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey, Han et al., arxiv 2024. [paper]

  • LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning, Pan et al., arxiv 2024. [paper][code]

  • LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models, Zheng et al., arxiv 2024. [paper][code]

  • ReFT: Representation Finetuning for Language Models, Wu et al., arxiv 2024. [paper][code]
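
The LoRA family above shares one mechanism worth seeing in code: freeze the pretrained weight and learn a low-rank additive update, so only a tiny fraction of parameters train. A from-scratch sketch (r and alpha are illustrative defaults; the PEFT library above wraps the same idea):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """LoRA (Hu et al., 2021): h = W x + (alpha / r) * B A x, with W frozen,
    A initialized small, and B initialized to zero so training starts at the
    pretrained function."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)
```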

3.3.9 Prompt Learning
  • OpenPrompt: An Open-source Framework for Prompt-learning, Ding et al., arxiv 2021. [paper][code]

  • Learning to Generate Prompts for Dialogue Generation through Reinforcement Learning, Su et al., arxiv 2022. [paper]

  • Large Language Models Are Human-Level Prompt Engineers, Zhou et al., ICLR 2023. [paper][code]

  • Large Language Models as Optimizers, Yang et al., arxiv 2023. [paper][code]

  • Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4, Bsharat et al., arxiv 2023. [paper][code]

  • Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding, Suzgun and Kalai, arxiv 2024. [paper][code]

  • AutoPrompt: Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases, Levi et al., arxiv 2024. [paper][code][automatic_prompt_engineer][appl][sammo]

  • [PromptPapers][ChatGPT Prompt Engineering for Developers][Prompt Engineering Guide][k12promptguide][gpt-prompt-engineer][awesome-chatgpt-prompts][awesome-chatgpt-prompts-zh]

  • The Power of Scale for Parameter-Efficient Prompt Tuning, Lester et al., EMNLP 2021. [paper][code][soft-prompt-tuning][Prompt-Tuning]

  • A Survey on In-context Learning, Dong et al., arxiv 2023. [paper][code]

  • Rethinking the Role of Demonstrations: What Makes In-Context Learning Work, Min et al., EMNLP 2022. [paper][code]

  • Larger language models do in-context learning differently, Wei et al., arxiv 2023. [paper]

  • PAL: Program-aided Language Models, Gao et al., ICML 2023. [paper][code]

  • A Comprehensive Survey on Instruction Following, Lou et al., arxiv 2023. [paper][code]

  • RLHF: Fine-Tuning Language Models from Human Preferences, Ziegler et al., arxiv 2019. [paper][code]

  • RLHF: Learning to summarize from human feedback, Stiennon et al., NeurIPS 2020. [paper][code]

  • RLHF: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Bai et al., arxiv 2022. [paper][code]

  • Finetuned Language Models Are Zero-Shot Learners, Wei et al., ICLR 2022. [paper]

  • Instruction Tuning for Large Language Models: A Survey, Zhang et al., arxiv 2023. [paper][code]

  • What learning algorithm is in-context learning? Investigations with linear models, Akyürek et al., ICLR 2023. [paper]

  • Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers, Dai et al., arxiv 2022. [paper][code]

3.3.10 RAG (Retrieval Augmented Generation)
Text Embedding
  • Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, Reimers et al., EMNLP 2019. [paper][code][model][model][vec2text]
  • SimCSE: Simple Contrastive Learning of Sentence Embeddings, Gao et al., EMNLP 2021. [paper][code]
  • OpenAI: Text and Code Embeddings by Contrastive Pre-Training, Neelakantan et al., arxiv 2022. [paper][blog]
  • MRL: Matryoshka Representation Learning, Kusupati et al., arxiv 2022. [paper][code]
  • BGE: C-Pack: Packaged Resources To Advance General Chinese Embedding, Xiao et al., arxiv 2023. [paper][code][FlagEmbedding]
  • LLM-Embedder: Retrieve Anything To Augment Large Language Models, Zhang et al., arxiv 2023. [paper][code][FlagEmbedding]
  • BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation, Chen et al., arxiv 2024. [paper][code][FlagEmbedding]
  • [m3e-base]
  • Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents, Günther et al., arxiv 2023. [paper][model]
  • GTE: Towards General Text Embeddings with Multi-stage Contrastive Learning, Li et al., arxiv 2023. [paper][model]
  • [BCEmbedding][bce-embedding-base_v1][bce-reranker-base_v1]
  • [CohereV3]
  • One Embedder, Any Task: Instruction-Finetuned Text Embeddings, Su et al., ACL 2023. [paper][code]
  • E5: Improving Text Embeddings with Large Language Models, Wang et al., arxiv 2024. [paper][code][model][llm2vec]
  • Nomic Embed: Training a Reproducible Long Context Text Embedder, Nussbaum et al., Nomic AI 2024. [paper][code]
  • GritLM: Generative Representational Instruction Tuning, Muennighoff et al., arxiv 2024. [paper][code]
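
All the embedders above plug into the same retrieval pattern: embed, normalize, rank by inner product. A minimal sketch using the Sentence-BERT library cited at the top of this list; the checkpoint name is one public model standing in for any embedder here:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed example checkpoint
docs = ["RoPE rotates positions in attention.", "LoRA learns low-rank updates."]
doc_vecs = model.encode(docs, normalize_embeddings=True)        # (n_docs, dim)

query_vec = model.encode(["how does LoRA work?"], normalize_embeddings=True)
scores = doc_vecs @ query_vec.T      # cosine similarity (unit-norm vectors)
print(docs[int(np.argmax(scores))])  # -> the LoRA sentence
```
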
3.3.11 Reasoning and Planning
  • Few-Shot-CoT: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Wei et al., NeurIPS 2022. [paper][chain-of-thought-hub]

  • Self-Consistency Improves Chain of Thought Reasoning in Language Models, Wang et al., ICLR 2023. [paper] (majority voting is sketched at the end of this section)

  • Zero-Shot-CoT: Large Language Models are Zero-Shot Reasoners, Kojima et al., NeurIPS 2022. [paper][code]

  • Auto-CoT: Automatic Chain of Thought Prompting in Large Language Models, Zhang et al., ICLR 2023. [paper][code]

  • Multimodal Chain-of-Thought Reasoning in Language Models, Zhang et al., arxiv 2023. [paper][code]

  • Chain-of-Thought Reasoning Without Prompting, Wang et al., arxiv 2024. [paper]

  • ReAct: Synergizing Reasoning and Acting in Language Models, Yao et al., ICLR 2023. [paper][code]

  • MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action, Yang et al., arxiv 2023. [paper][code]

  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models, Yao et al., NeurIPS 2023. [paper][code][Plug in and Play Implementation][tree-of-thought-prompting]

  • Graph of Thoughts: Solving Elaborate Problems with Large Language Models, Besta et al., arxiv 2023. [paper][code]

  • Cumulative Reasoning with Large Language Models, Zhang et al., arxiv 2023. [paper][code]

  • Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models, Sel et al., arxiv 2023. [paper][unofficial code]

  • Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation, Ding et al., arxiv 2023. [paper][code]

  • Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models, Ye et al., arxiv 2024. [paper][code]

  • Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, Zhou et al., ICLR 2023. [paper]

  • DEPS: Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents, Wang et al., arxiv 2023. [paper][code]

  • RAP: Reasoning with Language Model is Planning with World Model, Hao et al., arxiv 2023. [paper][code]

  • LEMA: Learning From Mistakes Makes LLM Better Reasoner, An et al., arxiv 2023. [paper][code]

  • Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, Chen et al., TMLR 2023. [paper][code]

  • Chain of Code: Reasoning with a Language Model-Augmented Code Emulator, Li et al., arxiv 2023. [paper][[code]]

  • The Impact of Reasoning Step Length on Large Language Models, Jin et al., arxiv 2024. [paper][code]

  • Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models, Wang et al., ACL 2023. [paper][code][maestro]

  • Improving Factuality and Reasoning in Language Models through Multiagent Debate, Du et al., arxiv 2023. [paper][code][Multi-Agents-Debate]

  • Self-Refine: Iterative Refinement with Self-Feedback, Madaan et al., arxiv 2023. [paper][code]

  • Reflexion: Language Agents with Verbal Reinforcement Learning, Shinn et al., NeurIPS 2023. [paper][code]

  • CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing, Gou et al., ICLR 2024. [paper][code]

  • Self-Discover: Large Language Models Self-Compose Reasoning Structures, Zhou et al., arxiv 2024. [paper][unofficial implementation][SELF-DISCOVER]

  • RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation, Wang et al., arxiv 2024. [paper][code]

  • KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents, Zhu et al., arxiv 2024. [paper][code][KnowLM]

  • Advancing LLM Reasoning Generalists with Preference Trees, Yuan et al., arxiv 2024. [paper][code]

  • Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models, Yang et al., arxiv 2024. [paper][code][SymbCoT]

  • ReST-EM: Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models, Singh et al., arxiv 2023. [paper][unofficial code]

  • ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent, Aksitov et al., arxiv 2023. [paper][[code]]

  • Orca 2: Teaching Small Language Models How to Reason, Mitra et al., arxiv 2023. [paper][[code]]

  • Searchformer: Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping, Lehnert et al., arxiv 2024. [paper]

  • How Far Are We from Intelligent Visual Deductive Reasoning?, Zhang et al., arxiv 2024. [paper][code]

  • [llm-reasoners]
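
Of the strategies above, self-consistency is the easiest to pin down in code: sample several reasoning chains at nonzero temperature and majority-vote the final answers. A minimal sketch; `sample_chain` is a hypothetical stand-in for one sampled chain-of-thought run of a real model:

```python
from collections import Counter

def self_consistency(sample_chain, question: str, n: int = 20) -> str:
    """Self-consistency (Wang et al., ICLR 2023): the majority answer over n
    independently sampled reasoning chains. sample_chain(question) -> final
    answer string is a placeholder for a real sampled model call."""
    answers = [sample_chain(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```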

Survey

3.4 LLM Theory

  • Scaling Laws for Neural Language Models, Kaplan et al., arxiv 2020. [paper][unofficial code] (fitting such a power law is sketched at the end of this section)

  • Emergent Abilities of Large Language Models, Wei et al., TMLR 2022. [paper]

  • Chinchilla: Training Compute-Optimal Large Language Models, Hoffmann et al., arxiv 2022. [paper]

  • Scaling Laws for Autoregressive Generative Modeling, Henighan et al., arxiv 2020. [paper]

  • Are Emergent Abilities of Large Language Models a Mirage?, Schaeffer et al., NeurIPS 2023 Outstanding Paper. [paper]

  • Understanding Emergent Abilities of Language Models from the Loss Perspective, Du et al., arxiv 2024. [paper]

  • S2A: System 2 Attention (is something you might need too), Weston et al., arxiv 2023. [paper]

  • Scaling Laws for Downstream Task Performance of Large Language Models, Isik et al., arxiv 2024. [paper]

  • Scalable Pre-training of Large Autoregressive Image Models, El-Nouby et al., arxiv 2024. [paper][code]

  • When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method, Zhang et al., ICLR 2024. [paper]

  • Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws, Allen-Zhu et al., arxiv 2024. [paper]

  • Language Modeling Is Compression, Delétang et al., arxiv 2023. [paper]

  • Language Models Represent Space and Time, Gurnee and Tegmark, ICLR 2024. [paper][code]

  • The Platonic Representation Hypothesis, Huh et al., arxiv 2024. [paper][code]

  • Observational Scaling Laws and the Predictability of Language Model Performance, Ruan et al., arxiv 2024. [paper][code]

  • Language models can explain neurons in language models, OpenAI, 2023. [blog][code][transformer-debugger]

  • Scaling and evaluating sparse autoencoders, Gao et al., arxiv 2024. [OpenAI Blog][paper][code]

  • Towards Monosemanticity: Decomposing Language Models With Dictionary Learning, Anthropic, 2023. [blog]

  • Mapping the Mind of a Large Language Model, Anthropic, 2024. [blog]

  • Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era, Wu et al., arxiv 2024. [paper][code]

  • LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models, Tufanov et al., arxiv 2024. [paper][code]

  • ROME: Locating and Editing Factual Associations in GPT, Meng et al., NeurIPS 2022. [paper][code][FastEdit]

  • Editing Large Language Models: Problems, Methods, and Opportunities, Yao et al., EMNLP 2023. [paper][code]

  • A Comprehensive Study of Knowledge Editing for Large Language Models, Zhang et al., arxiv 2024. [paper][code]
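
The scaling-law papers above fit power laws of the form L(N) ≈ a · N^(-α), and fitting one is a two-liner in log-log space. A sketch with made-up placeholder points, purely to show the procedure:

```python
import numpy as np

# Placeholder (parameter count, loss) pairs; real fits use measured values.
N = np.array([1e7, 1e8, 1e9, 1e10])
L = np.array([4.5, 3.9, 3.4, 3.0])

# log L = log a - alpha * log N, so a straight-line fit recovers both.
slope, log_a = np.polyfit(np.log(N), np.log(L), deg=1)
print(f"alpha = {-slope:.3f}, a = {np.exp(log_a):.2f}")
```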

3.5 Chinese Model


CV

  • CS231n: Deep Learning for Computer Vision [link]

1. Basic for CV

  • AlexNet: ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al., NIPS 2012. [paper]
  • VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan et al., ICLR 2015. [paper]
  • GoogLeNet: Going Deeper with Convolutions, Szegedy et al., CVPR 2015. [paper]
  • ResNet: Deep Residual Learning for Image Recognition, He et al., CVPR 2016 Best Paper. [paper][code]
  • DenseNet: Densely Connected Convolutional Networks, Huang et al., CVPR 2017 Oral. [paper][code]
  • EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, Tan et al., ICML 2019. [paper][code][EfficientNet-PyTorch]
  • BYOL: Bootstrap your own latent: A new approach to self-supervised Learning, Grill et al., arxiv 2020. [paper][code][byol-pytorch]

2. Contrastive Learning

  • MoCo: Momentum Contrast for Unsupervised Visual Representation Learning, He et al., CVPR 2020. [paper][code]

  • SimCLR: A Simple Framework for Contrastive Learning of Visual Representations, Chen et al., PMLR 2020. [paper][code]

  • DINOv2: Learning Robust Visual Features without Supervision, Oquab et al., arxiv 2023. [paper][code]

  • FeatUp: A Model-Agnostic Framework for Features at Any Resolution, Fu et al., ICLR 2024. [paper][code]

  • InfoNCE Loss: Representation Learning with Contrastive Predictive Coding, Oord et al., arxiv 2018. [paper][unofficial code]
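
The InfoNCE objective in the last item above underlies MoCo and SimCLR alike, and it is just cross-entropy over a similarity matrix whose diagonal holds the positive pairs. A minimal sketch (the temperature of 0.1 is illustrative):

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1):
    """InfoNCE: row i of z1 and row i of z2 are two views of the same image
    (the positive pair); every other row serves as a negative. z1, z2: (B, D)."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / tau                             # (B, B)
    labels = torch.arange(z1.size(0), device=z1.device)  # diagonal positives
    return F.cross_entropy(logits, labels)
```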

3. CV Application

4. Foundation Model

  • ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Dosovitskiy et al., ICLR 2021. [paper][code][Pytorch Implementation][efficientvit][EfficientFormer][ViT-Adapter] (patch embedding is sketched at the end of this section)

  • ViT-Adapter: Vision Transformer Adapter for Dense Predictions, Chen et al., ICLR 2023 Spotlight. [paper][code]

  • Vision Transformers Need Registers, Darcet et al., ICLR 2024 Outstanding Paper. [paper]

  • DeiT: Training data-efficient image transformers & distillation through attention, Touvron et al., ICML 2021. [paper][code]

  • ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision, Kim et al., ICML 2021. [paper][code]

  • Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, Liu et al., ICCV 2021. [paper][code]

  • MAE: Masked Autoencoders Are Scalable Vision Learners, He et al., CVPR 2022. [paper][code]

  • LVM: Sequential Modeling Enables Scalable Learning for Large Vision Models, Bai et al., arxiv 2023. [paper][code]

  • GLEE: General Object Foundation Model for Images and Videos at Scale, Wu et al., CVPR 2024. [paper][code]

  • Tokenize Anything via Prompting, Pan et al., arxiv 2023. [paper][code]

  • Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model, Zhu et al., arxiv 2024. [paper][code][VMamba][mambaout]

  • Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data, Yang et al., arxiv 2024. [paper][code]

  • Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models, Guo et al., arxiv 2024. [paper][code]

  • [pytorch-image-models][Pointcept]
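
A recurring ingredient in the ViT-family models above is the patch embedding: the image is cut into non-overlapping patches, each projected into a token, so a transformer can consume it as a sequence. A minimal sketch, assuming a 224x224 input, 16x16 patches, a class token, and learned position embeddings (sizes are illustrative):

```python
# ViT's "16x16 words": a strided conv embeds patches, then tokens are flattened.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=768):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 2
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))

    def forward(self, x):
        tokens = self.proj(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        cls = self.cls.expand(x.size(0), -1, -1)
        return torch.cat([cls, tokens], dim=1) + self.pos  # (B, N+1, dim)

print(PatchEmbed()(torch.randn(2, 3, 224, 224)).shape)  # (2, 197, 768)
```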

5. Generative Model (GAN and VAE)

  • GAN: Generative Adversarial Networks, Goodfellow et al., arxiv 2014. [paper][code][Pytorch-GAN]
  • StyleGAN3: Alias-Free Generative Adversarial Networks, Karras et al., NeurIPS 2021. [paper][code]
  • GigaGAN: Scaling up GANs for Text-to-Image Synthesis, Kang et al., arxiv 2023. [paper][code]
  • [pytorch-CycleGAN-and-pix2pix][img2img-turbo]
  • VAE: Auto-Encoding Variational Bayes, Kingma et al., arxiv 2013. [paper][code][Pytorch-VAE] (a reparameterization sketch follows this list)
  • VQ-VAE: Neural Discrete Representation Learning, Oord et al., NIPS 2017. [paper][code][vector-quantize-pytorch]
  • VQ-VAE-2: Generating Diverse High-Fidelity Images with VQ-VAE-2, Razavi et al., arxiv 2019. [paper][code]
  • VQGAN: Taming Transformers for High-Resolution Image Synthesis, Esser et al., CVPR 2021. [paper][code]
  • Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction, Tian et al., arxiv 2024. [paper][code]
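
The VAE line of work above rests on the reparameterization trick: writing the sample as z = mu + sigma * eps keeps the sampling step differentiable, so the encoder can be trained end to end against the ELBO. A minimal sketch, assuming single-layer encoder/decoder and a Bernoulli reconstruction loss (all sizes are illustrative):

```python
# Tiny VAE: the encoder outputs (mu, log_var); z is sampled via reparameterization.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)  # outputs [mu | log_var]
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=1)
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * log_var) * eps  # reparameterization trick
        recon = torch.sigmoid(self.dec(z))
        kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
        return F.binary_cross_entropy(recon, x, reduction="sum") + kl

x = torch.rand(4, 784)
print(TinyVAE()(x).item())  # scalar negative ELBO
```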

6. Image Editing

  • InstructPix2Pix: Learning to Follow Image Editing Instructions, Brooks et al., CVPR 2023 Highlight. [paper][code]
  • Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold, Pan et al., SIGGRAPH 2023. [paper][code]
  • DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing, Shi et al., arxiv 2023. [paper][code]
  • DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models, Mou et al., ICLR 2024 Spotlight. [paper][code]
  • LEDITS++: Limitless Image Editing using Text-to-Image Models, Brack et al., arxiv 2023. [paper][code][demo]
  • Diffusion Model-Based Image Editing: A Survey, Huang et al., arxiv 2024. [paper][code]

7. Object Detection

  • DETR: End-to-End Object Detection with Transformers, Carion et al., ECCV 2020. [paper][code] (a box-IoU sketch follows this section's list)

  • Focus-DETR: Less is More: Focus Attention for Efficient DETR, Zheng et al., arxiv 2023. [paper][code]

  • U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection, Qin et al., arxiv 2020. [paper][code]

  • YOLO: You Only Look Once: Unified, Real-Time Object Detection, Redmon et al., arxiv 2015. [paper]

  • YOLOX: Exceeding YOLO Series in 2021, Ge et al., arxiv 2021. [paper][code]

  • Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism, Wang et al., arxiv 2023. [paper][code]

  • YOLO-World: Real-Time Open-Vocabulary Object Detection, Cheng et al., arxiv 2024. [paper][code]

  • YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information, Wang et al., arxiv 2024. [paper][code]

  • T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy, Jiang et al., arxiv 2024. [paper][code]

  • YOLOv10: Real-Time End-to-End Object Detection, Wang et al., arxiv 2024. [paper][yolov10]

  • [detectron2][yolov5][mmdetection][detrex]
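
A primitive shared by every detector above, for both training-time matching and evaluation, is box IoU (DETR's Hungarian matching additionally uses a generalized variant). A minimal pairwise-IoU sketch, assuming axis-aligned boxes in [x1, y1, x2, y2] format:

```python
# Pairwise box IoU: intersection area divided by union area.
import torch

def box_iou(a, b):
    # a: (N, 4), b: (M, 4) -> (N, M) pairwise IoU
    lt = torch.max(a[:, None, :2], b[None, :, :2])   # intersection top-left
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])   # intersection bottom-right
    wh = (rb - lt).clamp(min=0)                      # zero if no overlap
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

boxes_a = torch.tensor([[0., 0., 10., 10.]])
boxes_b = torch.tensor([[5., 5., 15., 15.]])
print(box_iou(boxes_a, boxes_b))  # tensor([[0.1429]])
```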

8. Semantic Segmentation

  • U-Net: Convolutional Networks for Biomedical Image Segmentation, Ronneberger et al., MICCAI 2015. [paper][code]

  • Segment Anything, Kirillov et al., ICCV 2023. [paper][code]

  • EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything, Xiong et al., CVPR 2024. [paper][code]

  • Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks, Ren et al., arxiv 2024. [paper][code]

  • [mmsegmentation][mmdeploy][Painter]

9. Video

  • VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training, Tong et al., NeurIPS 2022 Spotlight. [paper][code]
  • MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation, Wang et al., arxiv 2024. [paper]
  • [V-JEPA][I-JEPA]
  • VideoMamba: State Space Model for Efficient Video Understanding, Li et al., arxiv 2024. [paper][code]
  • VideoChat: Chat-Centric Video Understanding, Li et al., CVPR 2024 Highlight. [paper][code]

10. Survey for CV

  • ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy, Vishniakov et al., arxiv 2023. [paper][code]
  • Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey, Xin et al., arxiv 2024. [paper][code]

Multimodal

1. Audio

2. BLIP

  • ALBEF: Align before Fuse: Vision and Language Representation Learning with Momentum Distillation, Li et al., NeurIPS 2021. [paper][code]
  • BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation, Li et al., arxiv 2022. [paper][code]
  • BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models, Li et al., arxiv 2023. [paper][code]
  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning, Dai et al., arxiv 2023. [paper][code]
  • X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning, Panagopoulou et al., arxiv 2023. [paper][code]
  • LAVIS: A Library for Language-Vision Intelligence, Li et al., arxiv 2022. [paper][code]
  • VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts, Bao et al., NeurIPS 2022. [paper][code]
  • BEiT: BERT Pre-Training of Image Transformers, Bao et al., ICLR 2022 Oral. [paper][code]
  • BEiT-3: Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks, Wang et al., CVPR 2023. [paper][code]

3. CLIP

  • CLIP: Learning Transferable Visual Models From Natural Language Supervision, Radford et al., ICML 2021. [paper][code][clip-as-service][open_clip] (the symmetric contrastive objective is sketched after this list)
  • DALL-E2: Hierarchical Text-Conditional Image Generation with CLIP Latents, Ramesh et al., arxiv 2022. [paper][code]
  • HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention, Geng et al., ICLR 2023. [paper][code]
  • Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese, Yang et al., arxiv 2022. [paper][code]
  • MetaCLIP: Demystifying CLIP Data, Xu et al., ICLR 2024 Spotlight. [paper][code]
  • Alpha-CLIP: A CLIP Model Focusing on Wherever You Want, Sun et al., arxiv 2023. [paper][code][Bootstrap3D]
  • MMVP: Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs, Tong et al., arxiv 2024. [paper][code]
  • MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training, Vasu et al., CVPR 2024. [paper][code]
  • Long-CLIP: Unlocking the Long-Text Capability of CLIP, Zhang et al., arxiv 2024. [paper][code]
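
CLIP's training objective, reused by most of the variants above, is a symmetric contrastive loss over the in-batch image-text similarity matrix: matched pairs sit on the diagonal. A minimal sketch with the encoders stubbed out as random features (the logit scale is a learned parameter in the real model; it is fixed here for brevity):

```python
# CLIP-style loss: average of image->text and text->image cross-entropies.
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, logit_scale=100.0):
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    logits = logit_scale * img @ txt.t()   # (N, N), diagonal = matched pairs
    labels = torch.arange(img.size(0))
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

print(clip_loss(torch.randn(8, 512), torch.randn(8, 512)).item())
```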

4. Diffusion Model

  • Tutorial on Diffusion Models for Imaging and Vision, Chan, arxiv 2024. [paper]

  • Denoising Diffusion Probabilistic Models, Ho et al., NeurIPS 2020. [paper][code][Pytorch Implementation][RDDM] (the training objective is sketched at the end of this section)

  • Improved Denoising Diffusion Probabilistic Models, Nichol and Dhariwal, ICML 2021. [paper][code]

  • Diffusion Models Beat GANs on Image Synthesis, Dhariwal and Nichol, NeurIPS 2021. [paper][code]

  • Classifier-Free Diffusion Guidance, Ho and Salimans, NeurIPS 2021. [paper][code]

  • GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models, Nichol et al., arxiv 2021. [paper][code]

  • DALL-E2: Hierarchical Text-Conditional Image Generation with CLIP Latents, Ramesh et al., arxiv 2022. [paper][code][dalle-mini]

  • Stable-Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models, Rombach et al., CVPR 2022. [paper][code][CompVis/stable-diffusion][Stability-AI/stablediffusion][ml-stable-diffusion]

  • SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis, Podell et al., arxiv 2023. [paper][code][SDXL-Lightning]

  • Introducing Stable Cascade, Stability AI, 2024. [link][code][model]

  • SDXL-Turbo: Adversarial Diffusion Distillation, Sauer et al., arxiv 2023. [paper][code]

  • LCM: Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference, Luo et al., arxiv 2023. [paper][code][Hyper-SD]

  • LCM-LoRA: A Universal Stable-Diffusion Acceleration Module, Luo et al., arxiv 2023. [paper][code]

  • Stable Diffusion 3: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis, Esser et al., arxiv 2024. [paper][mmdit]

  • SD3-Turbo: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation, Sauer et al., arxiv 2024. [paper]

  • StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation, Kodaira et al., arxiv 2023. [paper][code]

  • DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models, Marjit et al., arxiv 2024. [paper][code]

  • Video Diffusion Models, Ho et al., arxiv 2022. [paper][code]

  • Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets, Blattmann et al., arxiv 2023. [paper][code]

  • Consistency Models, Song et al., arxiv 2023. [paper][code][Consistency Decoder]

  • A Survey on Video Diffusion Models, Xing et al., arxiv 2023. [paper][code]

  • Diffusion Models: A Comprehensive Survey of Methods and Applications, Yang et al., arxiv 2023. [paper][code]

  • Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation, Yu et al., arxiv 2023. [paper]

  • The Chosen One: Consistent Characters in Text-to-Image Diffusion Models, Avrahami et al., arxiv 2023. [paper][code]

  • U-ViT: All are Worth Words: A ViT Backbone for Diffusion Models, Bao et al., CVPR 2023. [paper][code]

  • UniDiffuser: One Transformer Fits All Distributions in Multi-Modal Diffusion, Bao et al., arxiv 2023. [paper][code]

  • l-DAE: Deconstructing Denoising Diffusion Models for Self-Supervised Learning, Chen et al., arxiv 2024. [paper]

  • DiT: Scalable Diffusion Models with Transformers, Peebles and Xie, ICCV 2023 Oral. [paper][code][OpenDiT][MDT]

  • SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers, Ma et al., arxiv 2024. [paper][code]

  • Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis, Ren et al., arxiv 2024. [paper][model]

  • GitHub Repositories

  • [Awesome-Diffusion-Models][Awesome-Video-Diffusion]

  • [stable-diffusion-webui][stable-diffusion-webui-colab][sd-webui-controlnet][stable-diffusion-webui-forge][automatic]

  • [Fooocus][Omost]

  • [ComfyUI][streamlit][gradio][ComfyUI-Workflows-ZHO]

  • [diffusers]
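
The DDPM objective cited at the top of this section reduces to a simple training recipe: sample a timestep, corrupt x0 with the closed-form forward process q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I), and regress the injected noise. A minimal sketch with a placeholder denoiser (the schedule constants follow Ho et al.; in practice the model is a time-conditioned U-Net or DiT):

```python
# DDPM training step: add noise per the forward process, predict that noise.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def ddpm_loss(model, x0):
    t = torch.randint(0, T, (x0.size(0),))
    a_bar = alphas_bar[t].view(-1, 1)                    # per-sample schedule
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps   # q(x_t | x_0)
    return F.mse_loss(model(x_t, t), eps)                # regress the noise

toy_model = lambda x, t: x * 0.0                         # placeholder denoiser
print(ddpm_loss(toy_model, torch.randn(4, 8)).item())
```

Sampling reverses this corruption step by step; classifier-free guidance (Ho and Salimans above) mixes conditional and unconditional noise predictions at each step.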

5. Multimodal LLM

  • LLaVA: Visual Instruction Tuning, Liu et al., NeurIPS 2023 Oral. [paper][code][vip-llava][LLaVA-pp][TinyLLaVA_Factory][LLaVA-RLHF] (the vision-to-LLM projector is sketched at the end of this section)

  • LLaVA-1.5: Improved Baselines with Visual Instruction Tuning, Liu et al., arxiv 2023. [paper][code]

  • LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day, Li et al., arxiv 2023. [paper][code]

  • Video-LLaVA: Learning United Visual Representation by Alignment Before Projection, Lin et al., arxiv 2023. [paper][code]

  • MoE-LLaVA: Mixture of Experts for Large Vision-Language Models, Lin et al., arxiv 2024. [paper][code]

  • MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models, Zhu et al., arxiv 2023. [paper][code]

  • MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning, Chen et al., arxiv 2023. [paper][code]

  • MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens, Ataallah et al., arxiv 2024. [paper][code]

  • MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens, Zheng et al., arxiv 2023. [paper][code]

  • Flamingo: a Visual Language Model for Few-Shot Learning, Alayrac et al., NeurIPS 2022. [paper][open-flamingo][flamingo-pytorch]

  • Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding, Zhang et al., EMNLP 2023. [paper][code]

  • BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs, Zhao et al., arxiv 2023. [paper][code][AnyGPT]

  • Emu: Generative Pretraining in Multimodality, Sun et al., ICLR 2024. [paper][code]

  • CogVLM: Visual Expert for Pretrained Language Models, Wang et al., arxiv 2023. [paper][code][CogVLM2][VisualGLM-6B][CogCoM]

  • DreamLLM: Synergistic Multimodal Comprehension and Creation, Dong et al., ICLR 2024 Spotlight. [paper][code]

  • NExT-GPT: Any-to-Any Multimodal LLM, Wu et al., arxiv 2023. [paper][code]

  • Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models, Wu et al., arxiv 2023. [paper][code]

  • SoM: Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V, Yang et al., arxiv 2023. [paper][code]

  • Ferret: Refer and Ground Anything Anywhere at Any Granularity, You et al., arxiv 2023. [paper][code][Ferret-UI]

  • Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond, Bai et al., arxiv 2023. [paper][code]

  • InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition, Zhang et al., arxiv 2023. [paper][code]

  • InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks, Chen et al., CVPR 2024. [paper][code][InternVideo][InternVid]

  • DeepSeek-VL: Towards Real-World Vision-Language Understanding, Lu et al., arxiv 2024. [paper][code]

  • ShareGPT4V: Improving Large Multi-Modal Models with Better Captions, Chen et al., arxiv 2023. [paper][code][ShareGPT4Video]

  • TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones, Yuan et al., arxiv 2023. [paper][code]

  • Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models, Li et al., CVPR 2024. [paper][code]

  • Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models, Wei et al., arxiv 2023. [paper][code]

  • Vary-toy: Small Language Model Meets with Reinforced Vision Vocabulary, Wei et al., arxiv 2024. [paper][code]

  • LWM: World Model on Million-Length Video And Language With RingAttention, Liu et al., arxiv 2024. [paper][code]

  • Chameleon: Mixed-Modal Early-Fusion Foundation Models, Chameleon Team, arxiv 2024. [paper]

  • Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts, Li et al., arxiv 2024. [paper][code]

  • RL4VLM: Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning, Zhai et al., arxiv 2024. [paper][code][RLHF-V][RLAIF-V]

  • [MiniCPM-V][moondream][MobileVLM][OmniFusion][Bunny]
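
A common thread in the LLaVA-style models above is the connector: frozen vision-encoder features are mapped into the LLM's token-embedding space and prepended to the text tokens. A minimal sketch of a 2-layer MLP projector in the spirit of LLaVA-1.5 (dimensions and token counts are illustrative, not the released configuration):

```python
# LLaVA-style connector: project vision features, concatenate with text embeddings.
import torch
import torch.nn as nn

vision_dim, llm_dim = 1024, 4096
projector = nn.Sequential(
    nn.Linear(vision_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
)

patch_feats = torch.randn(1, 576, vision_dim)   # e.g. CLIP ViT patch features
text_embeds = torch.randn(1, 32, llm_dim)       # embedded prompt tokens
visual_tokens = projector(patch_feats)          # (1, 576, llm_dim)
llm_input = torch.cat([visual_tokens, text_embeds], dim=1)
print(llm_input.shape)                          # (1, 608, 4096), fed to the LLM
```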

6. Text2Image

  • DALL-E: Zero-Shot Text-to-Image Generation, Ramesh et al., arxiv 2021. [paper][code]

  • DALL-E3: Improving Image Generation with Better Captions, Betker et al., OpenAI 2023. [paper][code][blog]

  • ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models, Zhang et al., ICCV 2023 Marr Prize. [paper][code]

  • T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models, Mou et al., AAAI 2024. [paper][code]

  • AnyText: Multilingual Visual Text Generation And Editing, Tuo et al., arxiv 2023. [paper][code]

  • RPG: Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs, Yang et al., ICML 2024. [paper][code]

  • LAION-5B: An open large-scale dataset for training next generation image-text models, Schuhmann et al., NeurIPS 2022. [paper][code][blog]

  • DeepFloyd IF: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, Saharia et al., arxiv 2022. [paper][code]

  • Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, Saharia et al., NeurIPS 2022. [paper][unofficial code]

  • Instruct-Imagen: Image Generation with Multi-modal Instruction, Hu et al., arxiv 2024. [paper]

  • TextDiffuser: Diffusion Models as Text Painters, Chen et al., arxiv 2023. [paper][code]

  • TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering, Chen et al., arxiv 2023. [paper][code]

  • PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis, Chen et al., arxiv 2023. [paper][code]

  • PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models, Chen et al., arxiv 2024. [paper][code]

  • PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation, Chen et al., arxiv 2024. [paper][code]

  • IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models, Ye et al., arxiv 2023. [paper][code][ID-Animator]

  • Controllable Generation with Text-to-Image Diffusion Models: A Survey, Cao et al., arxiv 2024. [paper][code]

  • StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation, Zhou et al., arxiv 2024. [paper][code]

  • Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding, Li et al., arxiv 2024. [paper][code]

7. Text2Video

  • Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation, Hu et al., arxiv 2023. [paper][code][Open-AnimateAnyone][Moore-AnimateAnyone][AnimateAnyone]

  • EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions, Tian et al., arxiv 2024. [paper][code]

  • AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation, Wei et al., arxiv 2024. [paper][code]

  • DreaMoving: A Human Video Generation Framework based on Diffusion Models, Feng et al., arxiv 2023. [paper][code]

  • MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model, Xu et al., arxiv 2023. [paper][code][champ]

  • DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors, Xing et al., arxiv 2023. [paper][code]

  • FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis, Liang et al., arxiv 2023. [paper][code]

  • [Awesome-Video-Diffusion]

  • Video Diffusion Models, Ho et al., arxiv 2022. [paper][video-diffusion-pytorch]

  • Make-A-Video: Text-to-Video Generation without Text-Video Data, Singer et al., arxiv 2022. [paper][make-a-video-pytorch]

  • Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation, Wu et al., ICCV 2023. [paper][code]

  • Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators, Khachatryan et al., ICCV 2023 Oral. [paper][code][StreamingT2V]

  • CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers, Hong et al., ICLR 2023. [paper][code]

  • Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos, Ma et al., AAAI 2024. [paper][code][Follow-Your-Pose v2][Follow-Your-Emoji]

  • Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts, Ma et al., arxiv 2024. [paper][code]

  • AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning, Guo et al., arxiv 2023. [paper][code][AnimateDiff-Lightning]

  • StableVideo: Text-driven Consistency-aware Diffusion Video Editing, Chai et al., ICCV 2023. [paper][code]

  • I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models, Zhang et al., arxiv 2023. [paper][code]

  • TF-T2V: A Recipe for Scaling up Text-to-Video Generation with Text-free Videos, Wang et al., arxiv 2023. [paper][code]

  • Lumiere: A Space-Time Diffusion Model for Video Generation, Bar-Tal et al., arxiv 2024. [paper][lumiere-pytorch]

  • Sora: Creating video from text, OpenAI, 2024. [blog][Open-Sora][Open-Sora-Plan][minisora][SoraWebui][MuseV][PhysDreamer][easyanimate]

  • Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models, Liu et al., arxiv 2024. [paper][code]

  • Mora: Enabling Generalist Video Generation via A Multi-Agent Framework, Yuan et al., arxiv 2024. [paper][code]

  • Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution, Dehghani et al., NeurIPS 2023. [paper][unofficial code]

  • VideoPoet: A Large Language Model for Zero-Shot Video Generation, Kondratyuk et al., arxiv 2023. [paper]

  • Latte: Latent Diffusion Transformer for Video Generation, Ma et al., arxiv 2024. [paper][code][LaVIT]

  • Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis, Menapace et al., arxiv 2024. [paper][articulated-animation]

  • [MoneyPrinterTurbo][videos]

8. Survey for Multimodal

  • A Survey on Multimodal Large Language Models, Yin et al., arxiv 2023. [paper][code]
  • Multimodal Foundation Models: From Specialists to General-Purpose Assistants, Li et al., arxiv 2023. [paper][cvinw_readings]
  • From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities, Lu et al., arxiv 2024. [paper][Leaderboards]
  • Efficient Multimodal Large Language Models: A Survey, Jin et al., arxiv 2024. [paper][code]
  • An Introduction to Vision-Language Modeling, Bordes et al., arxiv 2024. [paper]

9. Other

  • Fuyu-8B: A Multimodal Architecture for AI Agents, Bavishi et al., Adept blog 2023. [blog][model]
  • Otter: A Multi-Modal Model with In-Context Instruction Tuning, Li et al., arxiv 2023. [paper][code]
  • OtterHD: A High-Resolution Multi-modality Model, Li et al., arxiv 2023. [paper][code][model]
  • CM3leon: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning, Yu et al., arxiv 2023. [paper][Unofficial Implementation]
  • MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer, Tian et al., arxiv 2024. [paper][code]
  • CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations, Qi et al., arxiv 2024. [paper][code]
  • SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models, Gao et al., arxiv 2024. [paper][code][Lumina-T2X]

Reinforcement Learning

1. Basic for RL

2. LLM for Decision Making

  • Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen et al., NeurIPS 2021. [paper][code] (the trajectory tokenization is sketched after this list)
  • Trajectory Transformer: Offline Reinforcement Learning as One Big Sequence Modeling Problem, Janner et al., NeurIPS 2021. [paper][code]
  • Guiding Pretraining in Reinforcement Learning with Large Language Models, Du et al., ICML 2023. [paper][code]
  • Introspective Tips: Large Language Model for In-Context Decision Making, Chen et al., arxiv 2023. [paper]
  • Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, Chebotar et al., CoRL 2023. [paper][Unofficial Implementation]
  • Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods, Cao et al., arxiv 2024. [paper]
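
Decision Transformer, cited first in this list, recasts offline RL as sequence modeling: each timestep contributes a (return-to-go, state, action) triple, and a causal transformer predicts actions conditioned on a target return. A minimal sketch of the return-to-go computation and token interleaving, with the transformer itself omitted (shapes are illustrative):

```python
# Decision Transformer inputs: returns-to-go plus interleaved trajectory tokens.
import torch

def returns_to_go(rewards):
    # R_t = sum of rewards from step t to the end of the episode
    return torch.flip(torch.cumsum(torch.flip(rewards, [0]), 0), [0])

rewards = torch.tensor([1.0, 0.0, 2.0, 1.0])
print(returns_to_go(rewards))  # tensor([4., 3., 3., 1.])

# Interleave per-timestep embeddings into one sequence: (R_1, s_1, a_1, R_2, ...)
T, d = 4, 8
rtg_e, s_e, a_e = (torch.randn(T, d) for _ in range(3))
tokens = torch.stack([rtg_e, s_e, a_e], dim=1).reshape(T * 3, d)
print(tokens.shape)  # (12, 8), consumed by a causal transformer
```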

GNN

  • [GNNPapers][dgl]

  • A Gentle Introduction to Graph Neural Networks, Sanchez-Lengeling et al., Distill 2021. [paper]

  • CS224W: Machine Learning with Graphs, Stanford. [link]

  • GCN: Semi-Supervised Classification with Graph Convolutional Networks, Kipf and Welling, ICLR 2017. [paper][code][pygcn] (the propagation rule is sketched after this list)

  • GAE: Variational Graph Auto-Encoders, Kipf and Welling, arxiv 2016. [paper][code][gae-pytorch]

  • GAT: Graph Attention Networks, Veličković et al., ICLR 2018. [paper][code][pyGAT][pytorch-GAT]

  • GIN: How Powerful are Graph Neural Networks?, Xu et al., ICLR 2019. [paper][code]

  • Graphormer: Do Transformers Really Perform Badly for Graph Representation?, Ying et al., NeurIPS 2021. [paper][code]

  • GraphGPT: Graph Instruction Tuning for Large Language Models, Tang et al., SIGIR 2024. [paper][code]

  • OpenGraph: Towards Open Graph Foundation Models, Xia et al., arxiv 2024. [paper][code]

  • [pytorch_geometric]
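
The GCN layer cited above propagates features with the symmetrically normalized adjacency: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W), where D is the degree matrix of A + I. A minimal dense sketch (real implementations use sparse ops; sizes are illustrative):

```python
# One GCN layer over a 3-node path graph, using dense matrices for clarity.
import torch

def gcn_layer(A, H, W):
    A_hat = A + torch.eye(A.size(0))       # add self-loops
    d = A_hat.sum(dim=1)
    D_inv_sqrt = torch.diag(d.pow(-0.5))   # symmetric normalization
    return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

A = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
H = torch.randn(3, 4)                      # 3 nodes, 4 input features
W = torch.randn(4, 8)                      # project to 8 hidden features
print(gcn_layer(A, H, W).shape)            # torch.Size([3, 8])
```

Stacking such layers mixes information from k-hop neighborhoods, which is why shallow GCNs already work well on citation-graph benchmarks.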

Survey for GNN


Transformer Architecture