Awesome-AI-Papers

This repository collects papers and code in the field of AI. It is organized into the following parts:

Table of Contents

  ├─ NLP/
  │  ├─ Word2Vec/
  │  ├─ Seq2Seq/
  │  └─ Pretraining/
  │     ├─ Large Language Model/
  │     ├─ LLM Application/
  │     │  ├─ AI Agent/
  │     │  ├─ Academic/
  │     │  ├─ Code/
  │     │  ├─ Financial Application/
  │     │  ├─ Information Retrieval/
  │     │  ├─ Math/
  │     │  ├─ Medicine and Law/
  │     │  ├─ Recommend System/
  │     │  └─ Tool Learning/
  │     ├─ LLM Technique/
  │     │  ├─ Alignment/
  │     │  ├─ Context Length/
  │     │  ├─ Corpus/
  │     │  ├─ Evaluation/
  │     │  ├─ Hallucination/
  │     │  ├─ Inference/
  │     │  ├─ MoE/
  │     │  ├─ PEFT/
  │     │  ├─ Prompt Learning/
  │     │  ├─ RAG/
  │     │  └─ Reasoning and Planning/
  │     ├─ LLM Theory/
  │     └─ Chinese Model/
  ├─ CV/
  │  ├─ CV Application/
  │  ├─ Contrastive Learning/
  │  ├─ Foundation Model/
  │  ├─ Generative Model (GAN and VAE)/
  │  ├─ Image Editing/
  │  ├─ Object Detection/
  │  ├─ Semantic Segmentation/
  │  └─ Video/
  ├─ Multimodal/
  │  ├─ Audio/
  │  ├─ BLIP/
  │  ├─ CLIP/
  │  ├─ Diffusion Model/
  │  ├─ Multimodal LLM/
  │  ├─ Text2Image/
  │  ├─ Text2Video/
  │  └─ Survey/
  ├─ Reinforcement Learning/
  ├─ GNN/
  └─ Transformer Architecture/

NLP

1. Word2Vec

  • Efficient Estimation of Word Representations in Vector Space, Mikolov et al., arxiv 2013. [paper]
  • Distributed Representations of Words and Phrases and their Compositionality, Mikolov et al., NIPS 2013. [paper]
  • Distributed representations of sentences and documents, Le and Mikolov, ICML 2014. [paper]
  • Word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method, Goldberg and Levy, arxiv 2014. [paper]
  • word2vec Parameter Learning Explained, Rong, arxiv 2014. [paper]
  • GloVe: Global Vectors for Word Representation, Pennington et al., EMNLP 2014. [paper][code]
  • fastText: Bag of Tricks for Efficient Text Classification, Joulin et al., arxiv 2016. [paper][code]
  • ELMo: Deep Contextualized Word Representations, Peters et al., arxiv 2018. [paper]
  • BPE: Neural Machine Translation of Rare Words with Subword Units, Sennrich et al., ACL 2016. [paper][code]
  • Byte-Level BPE: Neural Machine Translation with Byte-Level Subwords, Wang et al., arxiv 2019. [paper][code]
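
The negative-sampling objective analyzed in several of the papers above is compact enough to sketch directly. A minimal PyTorch version (vocabulary size, embedding dimension, and batch shapes are illustrative assumptions, not values prescribed by the papers):

```python
import torch
import torch.nn.functional as F

class SkipGramNS(torch.nn.Module):
    """Skip-gram with negative sampling (Mikolov et al., 2013): maximize
    log sigma(u_o . v_c) for the observed context word and
    log sigma(-u_k . v_c) for each of the K sampled negatives."""
    def __init__(self, vocab_size: int, dim: int = 100):
        super().__init__()
        self.center = torch.nn.Embedding(vocab_size, dim)   # "input" vectors v
        self.context = torch.nn.Embedding(vocab_size, dim)  # "output" vectors u

    def forward(self, center_ids, pos_ids, neg_ids):
        v = self.center(center_ids)       # (B, D) center words
        u_pos = self.context(pos_ids)     # (B, D) observed context words
        u_neg = self.context(neg_ids)     # (B, K, D) sampled negatives
        pos = F.logsigmoid((v * u_pos).sum(-1))                              # (B,)
        neg = F.logsigmoid(-torch.bmm(u_neg, v.unsqueeze(-1)).squeeze(-1)).sum(-1)
        return -(pos + neg).mean()        # negative log-likelihood to minimize
```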

2. Seq2Seq

  • Generating Sequences With Recurrent Neural Networks, Graves, arxiv 2013. [paper]
  • Sequence to Sequence Learning with Neural Networks, Sutskever et al., NeurIPS 2014. [paper]
  • Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau et al., ICLR 2015. [paper][code]
  • On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, Cho et al., arxiv 2014. [paper]
  • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Cho et al., arxiv 2014. [paper]
  • [fairseq][pytorch-seq2seq]
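
The papers above share one encoder-decoder shape, worth seeing once in code: the final encoder state summarizes the source sentence and initializes the decoder, as in Sutskever et al. (2014). A minimal sketch (the GRU choice and layer sizes are illustrative):

```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: encode the source, seed the decoder with the
    final encoder state, and predict target tokens under teacher forcing."""
    def __init__(self, src_vocab: int, tgt_vocab: int, dim: int = 256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, h = self.encoder(self.src_emb(src_ids))           # h: (1, B, D)
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), h)  # teacher forcing
        return self.out(dec_out)                             # (B, T, tgt_vocab)
```

Bahdanau-style attention (above) then replaces the single summary vector with a per-step weighted mix of all encoder states.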

3. Pretraining

3.1 Large Language Model

  • A Survey of Large Language Models, Zhao et al., arxiv 2023. [paper][code][LLMBox][LLMBook-zh][LLMsPracticalGuide]
  • Efficient Large Language Models: A Survey, Wan et al., arxiv 2023. [paper][code]
  • Challenges and Applications of Large Language Models, Kaddour et al., arxiv 2023. [paper]
  • A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT, Zhou et al., arxiv 2023. [paper]
  • From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape, McIntosh et al., arxiv 2023. [paper][AGI-survey]
  • A Survey of Resource-efficient LLM and Multimodal Foundation Models, Xu et al., arxiv 2024. [paper][code]
  • Large Language Models: A Survey, Minaee et al., arxiv 2024. [paper]
  • Anthropic: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Bai et al., arxiv 2022. [paper][code]
  • Anthropic: Constitutional AI: Harmlessness from AI Feedback, Bai et al., arxiv 2022. [paper][code]
  • Anthropic: Model Card and Evaluations for Claude Models, Anthropic, 2023. [paper]
  • Anthropic: The Claude 3 Model Family: Opus, Sonnet, Haiku, Anthropic, 2024. [paper]
  • BLOOM: A 176B-Parameter Open-Access Multilingual Language Model, BigScience Workshop, arxiv 2022. [paper][code][model]
  • OPT: Open Pre-trained Transformer Language Models, Zhang et al., arxiv 2022. [paper][code]
  • Chinchilla: Training Compute-Optimal Large Language Models, Hoffmann et al., arxiv 2022. [paper]
  • Gopher: Scaling Language Models: Methods, Analysis & Insights from Training Gopher, Rae et al., arxiv 2021. [paper]
  • GPT-NeoX-20B: An Open-Source Autoregressive Language Model, Black et al., arxiv 2022. [paper][code]
  • Gemini: A Family of Highly Capable Multimodal Models, Gemini Team, Google, arxiv 2023. [paper][Gemini 1.0][Gemini 1.5][Unofficial Implementation][MiniGemini]
  • Gemma: Open Models Based on Gemini Research and Technology, Google DeepMind, 2024. [paper][code][google-deepmind/gemma][gemma.cpp][model][paligemma]
  • GPT-4 Technical Report, OpenAI, arxiv 2023. [paper]
  • GPT-4V(ision) System Card, OpenAI, OpenAI blog 2023. [paper]
  • Sparks of Artificial General Intelligence: Early Experiments with GPT-4, Bubeck et al., arxiv 2023. [paper]
  • The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision), Yang et al., arxiv 2023. [paper][guidance]
  • LaMDA: Language Models for Dialog Applications, Thoppilan et al., arxiv 2022. [paper][LaMDA-rlhf-pytorch]
  • LLaMA: Open and Efficient Foundation Language Models, Touvron et al., arxiv 2023. [paper][code][llama.cpp][ollama][llamafile]
  • Llama 2: Open Foundation and Fine-Tuned Chat Models, Touvron et al., arxiv 2023. [paper][code][llama-recipes][llama2.c][lit-llama][litgpt]
  • [llama3][llama3-from-scratch]
  • TinyLlama: An Open-Source Small Language Model, Zhang et al., arxiv 2024. [paper][code][LiteLlama][MobiLlama]
  • Stanford Alpaca: An Instruction-following LLaMA Model, Taori et al., Stanford blog 2023. [paper][code][Alpaca-Lora]
  • Mistral 7B, Jiang et al., arxiv 2023. [paper][code][model][mistral-finetune]
  • OLMo: Accelerating the Science of Language Models, Groeneveld et al., arxiv 2024. [paper][code][Dolma Dataset]
  • Minerva: Solving Quantitative Reasoning Problems with Language Models, Lewkowycz et al., arxiv 2022. [paper]
  • PaLM: Scaling Language Modeling with Pathways, Chowdhery et al., arxiv 2022. [paper][PaLM-pytorch][PaLM-rlhf-pytorch][PaLM]
  • PaLM 2 Technical Report, Anil et al., arxiv 2023. [paper]
  • PaLM-E: An Embodied Multimodal Language Model, Driess et al., arxiv 2023. [paper][code]
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Raffel et al., Journal of Machine Learning Research 2020. [paper][code][t5-pytorch]
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, Lewis et al., ACL 2020. [paper][code]
  • FLAN: Finetuned Language Models Are Zero-Shot Learners, Wei et al., ICLR 2022. [paper][code]
  • Scaling Flan: Scaling Instruction-Finetuned Language Models, Chung et al., arxiv 2022. [paper][model]
  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context, Dai et al., ACL 2019. [paper][code]
  • XLNet: Generalized Autoregressive Pretraining for Language Understanding, Yang et al., NeurIPS 2019. [paper][code]
  • WebGPT: Browser-assisted question-answering with human feedback, Nakano et al., arxiv 2021. [paper][MS-MARCO-Web-Search]
  • Open Release of Grok-1, xAI, 2024. [blog][code][model][modelscope][hpcai-tech/grok-1][dbrx][Command R+][snowflake-arctic]

3.2 LLM Application

  • A Watermark for Large Language Models, Kirchenbauer et al., arxiv 2023. [paper][code][markllm]

  • SeqXGPT: Sentence-Level AI-Generated Text Detection, Wang et al., EMNLP 2023. [paper][code][llm-detect-ai][detect-gpt][fast-detect-gpt]

  • AlpaGasus: Training A Better Alpaca with Fewer Data, Chen et al., arxiv 2023. [paper][code]

  • AutoMix: Automatically Mixing Language Models, Madaan et al., arxiv 2023. [paper][code]

  • ChipNeMo: Domain-Adapted LLMs for Chip Design, Liu et al., arxiv 2023. [paper]

  • GAIA: A Benchmark for General AI Assistants, Mialon et al., ICLR 2024. [paper][code]

  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, Shen et al., NeurIPS 2023. [paper][code]

  • MemGPT: Towards LLMs as Operating Systems, Packer et al., arxiv 2023. [paper][code]

  • UFO: A UI-Focused Agent for Windows OS Interaction, Zhang et al., arxiv 2024. [paper][code]

  • OS-Copilot: Towards Generalist Computer Agents with Self-Improvement, Wu et al., ICLR 2024. [paper][code]

  • AIOS: LLM Agent Operating System, Mei et al., arxiv 2024. [paper][code]

  • DB-GPT: Empowering Database Interactions with Private Large Language Models, Xue et al., arxiv 2023. [paper][code][DocsGPT][privateGPT][localGPT]

  • OpenChat: Advancing Open-source Language Models with Mixed-Quality Data, Wang et al., ICLR 2024. [paper][code]

  • OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement, Zheng et al., arxiv 2024. [paper][code]

  • Orca: Progressive Learning from Complex Explanation Traces of GPT-4, Mukherjee et al., arxiv 2023. [paper]

  • PDFTriage: Question Answering over Long, Structured Documents, Saad-Falcon et al., arxiv 2023. [paper][[code]]

  • Prompt2Model: Generating Deployable Models from Natural Language Instructions, Viswanathan et al., arxiv 2023. [paper][code]

  • Shepherd: A Critic for Language Model Generation, Wang et al., arxiv 2023. [paper][code]

  • Alpaca: A Strong, Replicable Instruction-Following Model, Taori et al., Stanford Blog 2023. [paper][code]

  • Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality, Chiang et al., 2023. [blog]

  • WizardLM: Empowering Large Language Models to Follow Complex Instructions, Xu et al., ICLR 2024. [paper][code]

  • WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences, Liu et al., KDD 2023. [paper][code][AutoWebGLM][AutoCrawler][gpt-crawler][webllama][gpt-researcher][skyvern][Scrapegraph-ai]

  • LLM4Decompile: Decompiling Binary Code with Large Language Models, Tan et al., arxiv 2024. [paper] [code]

  • [ray][dask][TaskingAI][gpt4all][ollama][llama.cpp][dify][bisheng][phidata][guidance]

  • [awesome-llm-apps]

3.2.1 AI Agent
  • LLM Powered Autonomous Agents, Lilian Weng, 2023. [blog][LLMAgentPapers][LLM-Agents-Papers][awesome-language-agents][Awesome-Papers-Autonomous-Agent]

  • A Survey on Large Language Model based Autonomous Agents, Wang et al., arxiv 2023. [paper][code]

  • The Rise and Potential of Large Language Model Based Agents: A Survey, Xi et al., arxiv 2023. [paper][code]

  • Agent AI: Surveying the Horizons of Multimodal Interaction, Durante et al., arxiv 2024. [paper]

  • Position Paper: Agent AI Towards a Holistic Intelligence, Huang et al., arxiv 2024. [paper]

  • AgentBench: Evaluating LLMs as Agents, Liu et al., ICLR 2024. [paper][code][OSWorld]

  • Agents: An Open-source Framework for Autonomous Language Agents, Zhou et al., arxiv 2023. [paper][code]

  • AutoAgents: A Framework for Automatic Agent Generation, Chen et al., arxiv 2023. [paper][code]

  • AgentTuning: Enabling Generalized Agent Abilities for LLMs, Zeng et al., arxiv 2023. [paper][code]

  • AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors, Chen et al., ICLR 2024. [paper][code]

  • AppAgent: Multimodal Agents as Smartphone Users, Zhang et al., arxiv 2023. [paper][code]

  • Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception, Wang et al., arxiv 2024. [paper][code]

  • Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security, Li et al., arxiv 2024. [paper][code]

  • AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation, Wu et al., arxiv 2023. [paper][code]

  • CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society, Li et al., NeurIPS 2023. [paper][code]

  • ChatDev: Communicative Agents for Software Development, Qian et al., ACL 2024. [paper][code][gpt-pilot]

  • MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework, Hong et al., ICLR 2024 Oral. [paper][code]

  • RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation, Luo et al., arxiv 2024. [paper][code]

  • Generative Agents: Interactive Simulacra of Human Behavior, Park et al., arxiv 2023. [paper][code][GPTeam]

  • CogAgent: A Visual Language Model for GUI Agents, Hong et al., CVPR 2024. [paper][code]

  • OpenAgents: An Open Platform for Language Agents in the Wild, Xie et al., arxiv 2023. [paper][code]

  • TaskWeaver: A Code-First Agent Framework, Qiao et al., arxiv 2023. [paper][code]

  • MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge, Fan et al., NeurIPS 2022 Outstanding Paper. [paper][code]

  • Voyager: An Open-Ended Embodied Agent with Large Language Models, Wang et al., arxiv 2023. [paper][code]

  • Eureka: Human-Level Reward Design via Coding Large Language Models, Ma et al., ICLR 2024. [paper][code][DrEureka]

  • Mind2Web: Towards a Generalist Agent for the Web, Deng et al., NeurIPS 2023. [paper][code][AutoWebGLM]

  • SeeAct: GPT-4V(ision) is a Generalist Web Agent, if Grounded, Zheng et al., arxiv 2024. [paper][code]

  • Foundation Models in Robotics: Applications, Challenges, and the Future, Firoozi et al., arxiv 2023. [paper][code]

  • RT-1: Robotics Transformer for Real-World Control at Scale, Brohan et al., arxiv 2022. [paper][code]

  • RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control, Brohan et al., arxiv 2023. [paper][Unofficial Implementation][RT-H: Action Hierarchies Using Language]

  • Open X-Embodiment: Robotic Learning Datasets and RT-X Models, Open X-Embodiment Collaboration, arxiv 2023. [paper][code]

  • Shaping the future of advanced robotics, Google DeepMind 2024. [blog]

  • RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation, Wang et al., ICML 2024. [paper][code]

  • RL-GPT: Integrating Reinforcement Learning and Code-as-policy, Liu et al., arxiv 2024. [paper]

  • Genie: Generative Interactive Environments, Bruce et al., arxiv 2024. [paper]

  • Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation, Fu et al., arxiv 2024. [paper][code][Hardware Code][Learning Code][UMI]

  • Octo: An Open-Source Generalist Robot Policy, Ghosh et al., arxiv 2024. [paper][code]

  • [LeRobot][DORA][awesome-ai-agents][IsaacLab]

  • [AutoGPT][GPT-Engineer][AgentGPT]

  • [BabyAGI][SuperAGI][OpenAGI]

  • [open-interpreter][Homepage][rawdog][OpenCodeInterpreter]

  • XAgent: An Autonomous Agent for Complex Task Solving, [blog][code]

  • [crewAI][phidata][gpt-computer-assistant]
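
Most of the agent frameworks above are built around the same think/act/observe loop. A schematic sketch of that loop; `llm` and the entries of `tools` are hypothetical placeholders, not the API of any framework listed here:

```python
def run_agent(task: str, llm, tools: dict, max_steps: int = 8) -> str:
    """ReAct-style loop: the model alternates free-form reasoning with tool
    calls until it emits a final answer. llm(prompt) -> str and each tool
    (str -> str) stand in for a real model and real tools."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm(transcript)                       # next Thought/Action/Answer
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action:"):               # e.g. "Action: search[query]"
            name, _, arg = step.removeprefix("Action:").strip().partition("[")
            observation = tools[name.strip()](arg.rstrip("]"))
            transcript += f"{step}\nObservation: {observation}\n"
        else:
            transcript += step + "\n"                # plain reasoning step
    return "No answer within the step budget."
```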

3.2.2 Academic
  • Galactica: A Large Language Model for Science, Taylor et al., arxiv 2022. [paper][code]
  • K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization, Deng et al., arxiv 2023. [paper][code][pdf_parser]
  • GeoGalactica: A Scientific Large Language Model in Geoscience, Lin et al., arxiv 2024. [paper][code][sciparser]
  • Scientific Large Language Models: A Survey on Biological & Chemical Domains, Zhang et al., arxiv 2024. [paper][code]
  • SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning, Zhang et al., arxiv 2024. [paper][code]
  • ChemLLM: A Chemical Large Language Model, Zhang et al., arxiv 2024. [paper][model]
  • LangCell: Language-Cell Pre-training for Cell Identity Understanding, Zhao et al., ICML 2024. [paper][code][scFoundation]
  • [Awesome-Scientific-Language-Models][gpt_academic][ChatPaper]
3.2.3 Code
  • Neural code generation, CMU 2024 Spring. [link]

  • Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code, Zhang et al., arxiv 2023. [paper][Awesome-Code-LLM][MFTCoder]

  • Source Code Data Augmentation for Deep Learning: A Survey, Zhuo et al., arxiv 2023. [paper][code]

  • Codex: Evaluating Large Language Models Trained on Code, Chen et al., arxiv 2021. [paper][human-eval] (its pass@k metric is sketched at the end of this section)

  • Code Llama: Open Foundation Models for Code, Rozière et al., arxiv 2023. [paper][code][model]

  • CodeGemma: Open Code Models Based on Gemma, [blog][report]

  • AlphaCode: Competition-Level Code Generation with AlphaCode, Li et al., arxiv 2022. [paper][dataset][AlphaCode2_Tech_Report]

  • CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X, Zheng et al., KDD 2023. [paper][code][CodeGeeX2]

  • CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis, Nijkamp et al., ICLR 2022. [paper][code]

  • CodeGen2: Lessons for Training LLMs on Programming and Natural Languages, Nijkamp et al., ICLR 2023. [paper][code]

  • CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules, Le et al., arxiv 2023. [paper][code]

  • StarCoder: may the source be with you, Li et al., arxiv 2023. [paper][code][bigcode-project][model]

  • StarCoder 2 and The Stack v2: The Next Generation, Lozhkov et al., 2024. [paper][code][starcoder.cpp]

  • WizardCoder: Empowering Code Large Language Models with Evol-Instruct, Luo et al., ICLR 2024. [paper][code]

  • Magicoder: Source Code Is All You Need, Wei et al., arxiv 2023. [paper][code]

  • Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering, Ridnik et al., arxiv 2024. [paper][code][pr-agent][cover-agent]

  • DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence, Guo et al., arxiv 2024. [paper][code]

  • If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents, Yang et al., arxiv 2024. [paper]

  • Design2Code: How Far Are We From Automating Front-End Engineering?, Si et al., arxiv 2024. [paper][code]

  • AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct, Lei et al., arxiv 2024. [paper][code]

  • [CodeQwen1.5][aiXcoder-7B]

  • [OpenDevin][swe-bench-technical-report][devika][SWE-agent][auto-code-rover][developer]

  • [screenshot-to-code][vanna]
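
Most of the code models above report pass@k on HumanEval-style suites. The unbiased estimator from the Codex paper is small enough to reproduce (a sketch of the published formula, not the human-eval package itself):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k (Chen et al., 2021): n samples generated per problem,
    c of them pass the tests. pass@k = 1 - C(n-c, k) / C(n, k), computed as
    a running product for numerical stability."""
    if n - c < k:
        return 1.0  # fewer failures than draws: some sample always passes
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```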

3.2.4 Financial Application
  • DocLLM: A layout-aware generative language model for multimodal document understanding, Wang et al., arxiv 2024. [paper]
  • DocGraphLM: Documental Graph Language Model for Information Extraction, Wang et al., arxiv 2023. [paper]
  • FinBERT: A Pretrained Language Model for Financial Communications, Yang et al., arxiv 2020. [paper][Wiley paper][code][finBERT][valuesimplex/FinBERT]
  • FinGPT: Open-Source Financial Large Language Models, Yang et al., IJCAI 2023. [paper][code]
  • FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models, Yang et al., arxiv 2024. [paper][code]
  • FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets, Wang et al., arxiv 2023. [paper][code]
  • Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models, Zhang et al., arxiv 2023. [paper][code]
  • FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance, Liu et al., arxiv 2020. [paper][code]
  • FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning, Liu et al., NeurIPS 2022. [paper][code]
  • DISC-FinLLM: A Chinese Financial Large Language Model based on Multiple Experts Fine-tuning, Chen et al., arxiv 2023. [paper][code]
  • A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist, Zhang et al., arxiv 2024. [paper]
  • XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters, Zhang et al., arxiv 2023. [paper][code][PIXIU]
  • StructGPT: A General Framework for Large Language Model to Reason over Structured Data, Jiang et al., arxiv 2023. [paper][code]
  • Large Language Model for Table Processing: A Survey, Lu et al., arxiv 2024. [paper][llm-table-survey][table-transformer]
  • A Survey of Large Language Models in Finance (FinLLMs), Lee et al., arxiv 2024. [paper][code]
  • Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow, Zhang et al., arxiv 2023. [paper][code]
  • Data Interpreter: An LLM Agent For Data Science, Hong et al., arxiv 2024. [paper][code]
  • AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework, Li et al., COLING 2024. [paper][code]
  • [gpt-investor][FinGLM]
3.2.5 Information Retrieval
  • ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT, Khattab et al., SIGIR 2020. [paper] (its MaxSim scoring is sketched at the end of this section)

  • ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction, Santhanam et al., NAACL 2022. [paper][code][RAGatouille]

  • ColBERT-XM: A Modular Multi-Vector Representation Model for Zero-Shot Multilingual Information Retrieval, Louis et al., arxiv 2024. [paper][code][model]

  • Large Language Models for Information Retrieval: A Survey, Zhu et al., arxiv 2023. [paper][code]

  • Large Language Models for Generative Information Extraction: A Survey, Xu et al., arxiv 2023. [paper][code][UIE][NERRE]

  • UniGen: A Unified Generative Framework for Retrieval and Question Answering with Large Language Models, Li et al., AAAI 2024. [paper]

  • INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning, Zhu et al., ACL 2024. [paper][code]

  • GenIR: From Matching to Generation: A Survey on Generative Information Retrieval, Li et al., arxiv 2024. [paper][code]

  • SIGIR-AP 2023 Tutorial: Recent Advances in Generative Information Retrieval [link]

  • [search_with_lepton][LLocalSearch][FreeAskInternet][storm][searxng]
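
The ColBERT papers at the top of this section replace single-vector scoring with late interaction: each query token takes its best match over document tokens, and the per-token maxima are summed. A minimal numpy sketch (array shapes are illustrative):

```python
import numpy as np

def maxsim_score(Q: np.ndarray, D: np.ndarray) -> float:
    """ColBERT-style MaxSim: Q is (q_len, dim) query token embeddings,
    D is (d_len, dim) document token embeddings, both L2-normalized.
    Score = sum over query tokens of the max similarity to any doc token."""
    return float((Q @ D.T).max(axis=1).sum())
```

Ranking a candidate set by this score is what distinguishes late interaction from bi-encoder retrieval, which collapses each text to a single vector first.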

3.2.6 Math
  • ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving, Gou et al., ICLR 2024. [paper][code]
  • MathVista: Evaluating Math Reasoning in Visual Contexts with GPT-4V, Bard, and Other Large Multimodal Models, Lu et al., ICLR 2024 Oral. [paper][code][MathBench]
  • DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, Shao et al., arxiv 2024. [paper][code]
  • Common 7B Language Models Already Possess Strong Math Capabilities, Li et al., arxiv 2024. [paper][code]
  • ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline, Xu et al., arxiv 2024. [paper][code]
  • AlphaMath Almost Zero: Process Supervision without Process, Chen et al., arxiv 2024. [paper][code]
3.2.7 Medicine and Law
  • A Survey of Large Language Models in Medicine: Progress, Application, and Challenge, Zhou et al., arxiv 2023. [paper][code][LLM-for-Healthcare]

  • A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law, Chen et al., arxiv 2024. [paper][code]

  • HuatuoGPT, towards Taming Language Model to Be a Doctor, Zhang et al., arxiv 2023. [paper][code][Medical_NLP][Zhongjing][MedicalGPT]

  • ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases, Cui et al., arxiv 2023. [paper][code]

  • DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services, Yue et al., arxiv 2023. [paper][code]

  • DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation, Bao et al., arxiv 2023. [paper][code]

  • MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning, Tang et al., arxiv 2023. [paper][code]

  • MEDITRON-70B: Scaling Medical Pretraining for Large Language Models, Chen et al., arxiv 2023. [paper][meditron]

  • Med-PaLM: Large language models encode clinical knowledge, Singhal et al., Nature 2023. [paper][Unofficial Implementation]

  • Capabilities of Gemini Models in Medicine, Saab et al., arxiv 2024. [paper]

  • AMIE: Towards Conversational Diagnostic AI, Tu et al., arxiv 2024. [paper][AMIE-pytorch]

  • Apollo: Lightweight Multilingual Medical LLMs towards Democratizing Medical AI to 6B People, Wang et al., arxiv 2024. [paper][code][Medical_NLP]

  • Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents, Li et al., arxiv 2024. [paper]

  • [openfold][alphafold3-pytorch][AlphaFold3][LucaOne]

3.2.8 Recommend System
  • DIN: Deep Interest Network for Click-Through Rate Prediction, Zhou et al., KDD 2018. [paper][code][DIEN][x-deeplearning]
  • MMoE: Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts, Ma et al., KDD 2018. [paper][DeepCTR-Torch][pytorch-mmoe]
  • Recommender Systems with Generative Retrieval, Rajput et al., NeurIPS 2022. [paper]
  • Unifying Large Language Models and Knowledge Graphs: A Roadmap, Pan et al., arxiv 2023. [paper]
  • YuLan-Rec: User Behavior Simulation with Large Language Model based Agents, Wang et al., arxiv 2023. [paper][code]
  • SSLRec: A Self-Supervised Learning Framework for Recommendation, Ren et al., WSDM 2024 Oral. [paper][code][Awesome-SSLRec-Papers]
  • RLMRec: Representation Learning with Large Language Models for Recommendation, Ren et al., WWW 2024. [paper][code]
  • LLMRec: Large Language Models with Graph Augmentation for Recommendation, Wei et al., WSDM 2024 Oral. [paper][code]
  • Agent4Rec: On Generative Agents in Recommendation, Zhang et al., arxiv 2023. [paper][code]
  • LLM-KERec: Breaking the Barrier: Utilizing Large Language Models for Industrial Recommendation Systems through an Inferential Knowledge Graph, Zhao et al., arxiv 2024. [paper]
  • Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations, Zhai et al., ICML 2024. [paper][code]
  • Wukong: Towards a Scaling Law for Large-Scale Recommendation, Zhang et al., arxiv 2024. [paper][unofficial code]
  • RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems, Lian et al., arxiv 2024. [paper][code]
  • [recommenders][Source code for Twitter's Recommendation Algorithm][Awesome-RSPapers][RecBole][RecSysDatasets]
3.2.9 Tool Learning
  • Tool Learning with Foundation Models, Qin et al., arxiv 2023. [paper][code]
  • Toolformer: Language Models Can Teach Themselves to Use Tools, Schick et al., arxiv 2023. [paper][toolformer-pytorch][toolformer]
  • ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs, Qin et al., ICLR 2024 Spotlight. [paper][code][StableToolBench]
  • Gorilla: Large Language Model Connected with Massive APIs, Patil et al., arxiv 2023. [paper][code]
  • GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction, Yang et al., arxiv 2023. [paper][code]
  • LLMCompiler: An LLM Compiler for Parallel Function Calling, Kim et al., arxiv 2023. [paper][code]
  • Large Language Models as Tool Makers, Cai et al., arxiv 2023. [paper][code]
  • ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases, Tang et al., arxiv 2023. [paper][code][ToolQA][toolbench]
  • ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search, Zhuang et al., arxiv 2023. [paper][[code]]
  • Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models, Lu et al., NeurIPS 2023. [paper][code]
  • ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios, Ye et al., arxiv 2024. [paper][code]
  • AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls, Du et al., arxiv 2024. [paper][code]
  • LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error, Wang et al., arxiv 2024. [paper][code]
  • What Are Tools Anyway? A Survey from the Language Model Perspective, Wang et al., arxiv 2024. [paper]
  • [ToolLearningPapers][awesome-tool-llm]
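
Concretely, most of the tool-learning setups above reduce to the model emitting a structured call and a harness executing it and feeding the result back. A schematic dispatcher; the JSON call format and the toy tools are assumptions for illustration, not any benchmark's protocol:

```python
import json

# Toy tool registry; real systems register API schemas, not lambdas.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # demo only
    "lookup": lambda key: {"pi": "3.14159"}.get(key, "not found"),
}

def dispatch(model_output: str) -> str:
    """Parse a model-emitted call like {"tool": "calculator", "input": "2+2"}
    and return the tool's observation for the next model turn."""
    call = json.loads(model_output)
    return TOOLS[call["tool"]](call["input"])

print(dispatch('{"tool": "calculator", "input": "2+2"}'))  # -> 4
```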

3.3 LLM Technique

  • How to Train Really Large Models on Many GPUs, Lilian Weng, 2021. [blog]
  • Training great LLMs entirely from ground zero in the wilderness as a startup, Yi Tay, 2024. [blog]
  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, Shoeybi et al., arxiv 2019. [paper][code]
  • ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Rajbhandari et al., arxiv 2019. [paper][DeepSpeed]
  • Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training, Li et al., ICPP 2023. [paper][code]
  • MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs, Jiang et al., arxiv 2024. [paper]
  • A Theory on Adam Instability in Large-Scale Machine Learning, Molybog et al., arxiv 2023. [paper]
  • Loss Spike in Training Neural Networks, Zhang et al., arxiv 2023. [paper]
  • Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling, Biderman et al., arxiv 2023. [paper][code]
  • Continual Pre-Training of Large Language Models: How to (re)warm your model, Gupta et al., arxiv 2023. [paper]
  • FLM-101B: An Open LLM and How to Train It with $100K Budget, Li et al., arxiv 2023. [paper][model]
  • Instruction Tuning with GPT-4, Peng et al., arxiv 2023. [paper][code]
  • DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines, Khattab et al., arxiv 2023. [paper][code]
  • OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning, Ye et al., arxiv 2024. [paper][code]
  • A Survey on Self-Evolution of Large Language Models, Tao et al., arxiv 2024. [paper][code]
3.3.1 Alignment
  • AI Alignment: A Comprehensive Survey, Ji et al., arxiv 2023. [paper][PKU-Alignment]

  • Large Language Model Alignment: A Survey, Shen et al., arxiv 2023. [paper]

  • Aligning Large Language Models with Human: A Survey, Wang et al., arxiv 2023. [paper][code]

  • [alignment-handbook]

  • Self-Instruct: Aligning Language Models with Self-Generated Instructions, Wang et al., ACL 2023. [paper][code]

  • RLHF: [hf blog][OpenAI blog][alignment blog][awesome-RLHF]

  • Secrets of RLHF in Large Language Models [MOSS-RLHF][Part I][Part II]

  • Safe RLHF: Safe Reinforcement Learning from Human Feedback, Dai et al., ICLR 2024 Spotlight. [paper][code]

  • The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization, Huang et al., arxiv 2024. [paper][code][blog][trl]

  • RLHF Workflow: From Reward Modeling to Online RLHF, Dong et al., arxiv 2024. [paper][code]

  • OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework, Hu et al., arxiv 2024. [paper][code]

  • LIMA: Less Is More for Alignment, Zhou et al., NeurIPS 2023. [paper]

  • DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model, Rafailov et al., NeurIPS 2023 Runner-up Award. [paper][Unofficial Implementation][trl][dpo_trainer] (its loss is sketched at the end of this section)

  • BPO: Black-Box Prompt Optimization: Aligning Large Language Models without Model Training, Cheng et al., arxiv 2023. [paper][code]

  • KTO: Model Alignment as Prospect Theoretic Optimization, Ethayarajh et al., arxiv 2024. [paper][code]

  • SimPO: Simple Preference Optimization with a Reference-Free Reward, Meng et al., arxiv 2024. [paper][code]

  • Constitutional AI: Harmlessness from AI Feedback, Bai et al., arxiv 2022. [paper][code]

  • RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback, Lee et al., arxiv 2023. [paper][[code]][awesome-RLAIF]

  • Direct Language Model Alignment from Online AI Feedback, Guo et al., arxiv 2024. [paper]

  • ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models, Li et al., arxiv 2023. [paper][code][policy_optimization]

  • Zephyr: Direct Distillation of LM Alignment, Tunstall et al., arxiv 2023. [paper][code]

  • Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision, Burns et al., arxiv 2023. [paper][code]

  • SPIN: Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models, Chen et al., arxiv 2024. [paper][code][unofficial implementation]

  • SPPO: Self-Play Preference Optimization for Language Model Alignment, Wu et al., arxiv 2024. [paper]

  • CALM: LLM Augmented LLMs: Expanding Capabilities through Composition, Bansal et al., arxiv 2024. [paper][CALM-pytorch]

  • Self-Rewarding Language Models, Yuan et al., arxiv 2024. [paper][unofficial implementation]

  • Anthropic: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training, Hubinger et al., arxiv 2024. [paper]

  • LongAlign: A Recipe for Long Context Alignment of Large Language Models, Bai et al., arxiv 2024. [paper][code]

  • Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction, Ji et al., arxiv 2024. [paper][code]

  • A Survey on Knowledge Distillation of Large Language Models, Xu et al., arxiv 2024. [paper][code]

  • NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment, Shen et al., arxiv 2024. [paper][code]

  • Xwin-LM: Strong and Scalable Alignment Practice for LLMs, Ni et al., arxiv 2024. [paper][code]
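
Of the methods above, DPO has the most compact objective, which makes it a good sketch target: it needs only the summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. A minimal sketch (beta = 0.1 is a common illustrative value, not a recommendation):

```python
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO (Rafailov et al., 2023). Each argument is a tensor of summed
    response log-probs; the loss pushes the policy's margin between chosen
    and rejected responses above the reference model's margin."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()
```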

3.3.2 Context Length
  • ALiBi: Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation, Press et al., ICLR 2022. [paper][code]
  • Positional Interpolation: Extending Context Window of Large Language Models via Positional Interpolation, Chen et al., arxiv 2023. [paper] (see the RoPE-scaling sketch at the end of this section)
  • Scaling Transformer to 1M tokens and beyond with RMT, Bulatov et al., AAAI 2024. [paper][code][LM-RMT]
  • LongNet: Scaling Transformers to 1,000,000,000 Tokens, Ding et al., arxiv 2023. [paper][code][unofficial code]
  • LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models, Chen et al., ICLR 2024 Oral. [paper][code]
  • StreamingLLM: Efficient Streaming Language Models with Attention Sinks, Xiao et al., ICLR 2024. [paper][code][SwiftInfer][SwiftInfer blog]
  • YaRN: Efficient Context Window Extension of Large Language Models, Peng et al., ICLR 2024. [paper][code]
  • LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression, Jiang et al., arxiv 2023. [paper][code]
  • LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens, Ding et al., arxiv 2024. [paper][code]
  • LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning, Jin et al., arxiv 2024. [paper][code]
  • The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey, Pawar et al., arxiv 2024. [paper]
  • Data Engineering for Scaling Language Models to 128K Context, Fu et al., arxiv 2024. [paper][code]
  • CEPE: Long-Context Language Modeling with Parallel Context Encoding, Yen et al., arxiv 2024. [paper][code]
  • Counting-Stars: A Simple, Efficient, and Reasonable Strategy for Evaluating Long-Context Large Language Models, Song et al., arxiv 2024. [paper][code]
  • Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention, Munkhdalai et al., arxiv 2024. [paper][infini-transformer-pytorch][InfiniTransformer][infini-mini-transformer][megalodon]
  • Extending Llama-3's Context Ten-Fold Overnight, Zhang et al., arxiv 2024. [paper][code][activation_beacon]
  • Make Your LLM Fully Utilize the Context, An et al., arxiv 2024. [paper][code]
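
Several of the extension methods above, Positional Interpolation most directly, come down to rescaling RoPE position indices back into the range seen during pretraining. A minimal sketch (head dimension, base, and the 2k-to-8k numbers are illustrative):

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """RoPE rotation angles with linear Position Interpolation: dividing
    positions by `scale` (e.g. 4.0 to stretch a 2k window to 8k) keeps all
    angles inside the pretraining range."""
    inv_freq = base ** (-torch.arange(0, dim, 2).float() / dim)      # (dim/2,)
    return (positions.float() / scale)[:, None] * inv_freq[None, :]  # (T, dim/2)

# An 8k sequence squeezed into a model pretrained on 2k positions:
theta = rope_angles(torch.arange(8192), dim=128, scale=4.0)
```
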
3.3.3 Corpus
  • [datatrove][datasets][doccano]
  • C4: Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus, Dodge et al., arxiv 2021. [paper][dataset]
  • The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset, Laurençon et al., NeurIPS 2023. [paper][code][dataset]
  • The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only, Penedo et al., arxiv 2023. [paper][dataset]
  • Data-Juicer: A One-Stop Data Processing System for Large Language Models, Chen et al., arxiv 2023. [paper][code]
  • UltraFeedback: Boosting Language Models with High-quality Feedback, Cui et al., ICML 2024. [paper][code]
  • What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning, Liu et al., ICLR 2024. [paper][code]
  • WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset, Qiu et al., arxiv 2024. [paper][dataset][LabelLLM][labelU]
  • Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research, Soldaini et al., arxiv 2024. [paper][code][OLMo]
  • Datasets for Large Language Models: A Comprehensive Survey, Liu et al., arxiv 2024. [paper][Awesome-LLMs-Datasets]
  • DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows, Patel et al., arxiv 2024. [paper][code]
  • Large Language Models for Data Annotation: A Survey, Tan et al., arxiv 2024. [paper][code]
  • Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance, Ye et al., arxiv 2024. [paper][code]
  • COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning, Bai et al., arxiv 2024. [paper][dataset]
  • Best Practices and Lessons Learned on Synthetic Data for Language Models, Liu et al., arxiv 2024. [paper]
  • FineWeb: decanting the web for the finest text data at scale, HuggingFace, 2024. [blogpost][fineweb][fineweb-edu]
3.3.4 Evaluation
3.3.5 Hallucination
  • Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models, Zhang et al., arxiv 2023. [paper][code]
  • A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions, Huang et al., arxiv 2023. [paper][code][Awesome-MLLM-Hallucination]
  • The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models, Li et al., arxiv 2024. [paper][code]
  • Chain-of-Verification Reduces Hallucination in Large Language Models, Dhuliawala et al., arxiv 2023. [paper][code]
  • HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models, Guan et al., CVPR 2024. [paper][code]
  • Woodpecker: Hallucination Correction for Multimodal Large Language Models, Yin et al., arxiv 2023. [paper][code]
  • TrustLLM: Trustworthiness in Large Language Models, Sun et al., arxiv 2024. [paper][code]
  • SAFE: Long-form factuality in large language models, Wei et al., arxiv 2024. [paper][code]
3.3.6 Inference
3.3.7 MoE
  • Mixture of Experts Explained, Sanseviero et al., Hugging Face Blog 2023. [blog]

  • Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, Shazeer et al., arxiv 2017. [paper][Re-Implementation] (top-2 routing is sketched at the end of this section)

  • GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, Lepikhin et al., arxiv 2020. [paper][mixture-of-experts]

  • MegaBlocks: Efficient Sparse Training with Mixture-of-Experts, Gale et al., arxiv 2022. [paper][code]

  • Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models, Shen et al., arxiv 2023. [paper][[code]]

  • Fast Inference of Mixture-of-Experts Language Models with Offloading, Eliseev and Mazur, arxiv 2023. [paper][code]

  • Mixtral-8×7B: Mixtral of Experts, Jiang et al., arxiv 2023. [paper][code][megablocks-public][model][blog][Chinese-Mixtral-8x7B][Chinese-Mixtral]

  • DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models, Dai et al., arxiv 2024. [paper][code]

  • DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model, DeepSeek-AI, arxiv 2024. [paper][code]

  • Evolutionary Optimization of Model Merging Recipes, Akiba et al., arxiv 2024. [paper][code]

  • [llama-moe][Aurora][OpenMoE][makeMoE]
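
The shared core of the MoE work above is a learned router that sends each token to a small number of experts. A deliberately plain, readable sketch of top-2 routing in the spirit of Shazeer et al. (2017); sizes are illustrative, and real systems add load-balancing losses and capacity limits:

```python
import torch
import torch.nn as nn

class Top2MoE(nn.Module):
    """Sparsely-gated MoE layer: a softmax router scores experts per token,
    the top-2 run, and their outputs are mixed by renormalized gate weights."""
    def __init__(self, dim: int = 512, n_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(n_experts)])

    def forward(self, x):                            # x: (tokens, dim)
        gates = self.router(x).softmax(-1)           # (tokens, n_experts)
        top_w, top_i = gates.topk(2, dim=-1)         # route each token to 2 experts
        top_w = top_w / top_w.sum(-1, keepdim=True)  # renormalize the pair
        out = torch.zeros_like(x)
        for slot in range(2):                        # plain loops for clarity
            for e, expert in enumerate(self.experts):
                mask = top_i[:, slot] == e
                if mask.any():
                    out[mask] += top_w[mask, slot, None] * expert(x[mask])
        return out
```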

3.3.8 PEFT (Parameter-efficient Fine-tuning)
  • [DeepSpeed][DeepSpeedExamples][blog]

  • [Megatron-LM][NeMo][Megatron-DeepSpeed][Megatron-DeepSpeed]

  • [torchtune][torchtitan]

  • [PEFT][trl][accelerate][LLaMA-Factory][LMFlow][xtuner][MFTCoder][llm-foundry][swift]

  • [mergekit][Model Merging][OpenChatKit]

  • LoRA: Low-Rank Adaptation of Large Language Models, Hu et al., arxiv 2021. [paper][code][LoRA From Scratch][lora][dora][MoRA] (the low-rank update is sketched at the end of this section)

  • QLoRA: Efficient Finetuning of Quantized LLMs, Dettmers et al., NeurIPS 2023 Oral. [paper][code][bitsandbytes][unsloth]

  • S-LoRA: Serving Thousands of Concurrent LoRA Adapters, Sheng et al., arxiv 2023. [paper][code]

  • GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection, Zhao et al., arxiv 2024. [paper][code]

  • Prefix-Tuning: Optimizing Continuous Prompts for Generation, Li et al., ACL 2021. [paper][code]

  • Adapter: Parameter-Efficient Transfer Learning for NLP, Houlsby et al., ICML 2019. [paper][code][unify-parameter-efficient-tuning]

  • Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning, Poth et al., EMNLP 2023. [paper][code]

  • LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models, Hu et al., EMNLP 2023. [paper][code]

  • LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention, Zhang et al., ICLR 2024. [paper][code]

  • LLaMA Pro: Progressive LLaMA with Block Expansion, Wu et al., arxiv 2024. [paper][code]

  • P-Tuning: GPT Understands, Too, Liu et al., arxiv 2021. [paper][code]

  • P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks, Liu et al., ACL 2022. [paper][code]

  • Towards a Unified View of Parameter-Efficient Transfer Learning, He et al., ICLR 2022. [paper][code]

  • Mixed Precision Training, Micikevicius et al., ICLR 2018. [paper]

  • 8-bit Optimizers via Block-wise Quantization, Dettmers et al., ICLR 2022. [paper][code]

  • FP8-LM: Training FP8 Large Language Models, Peng et al., arxiv 2023. [paper][code]

  • Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey, Han et al., arxiv 2024. [paper]

  • LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning, Pan et al., arxiv 2024. [paper][code]

  • LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models, Zheng et al., arxiv 2024. [paper][code]

  • ReFT: Representation Finetuning for Language Models, Wu et al., arxiv 2024. [paper][code]
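
The LoRA family above shares one mechanism worth seeing in code: freeze the pretrained weight and learn a low-rank additive update, so only a tiny fraction of parameters train. A from-scratch sketch (r and alpha are illustrative defaults; the PEFT library above wraps the same idea):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """LoRA (Hu et al., 2021): h = W x + (alpha / r) * B A x, with W frozen,
    A initialized small, and B initialized to zero so training starts at the
    pretrained function."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)
```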

3.3.9 Prompt Learning
  • OpenPrompt: An Open-source Framework for Prompt-learning, Ding et al., arxiv 2021. [paper][code]

  • Learning to Generate Prompts for Dialogue Generation through Reinforcement Learning, Su et al., arxiv 2022. [paper]

  • Large Language Models Are Human-Level Prompt Engineers, Zhou et al., ICLR 2023. [paper][code]

  • Large Language Models as Optimizers, Yang et al., arxiv 2023. [paper][code]

  • Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4, Bsharat et al., arxiv 2023. [paper][code]

  • Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding, Suzgun and Kalai, arxiv 2024. [paper][code]

  • AutoPrompt: Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases, Levi et al., arxiv 2024. [paper][code][automatic_prompt_engineer][appl][sammo]

  • [PromptPapers][ChatGPT Prompt Engineering for Developers][Prompt Engineering Guide][k12promptguide][gpt-prompt-engineer][awesome-chatgpt-prompts][awesome-chatgpt-prompts-zh]

  • The Power of Scale for Parameter-Efficient Prompt Tuning, Lester et al., EMNLP 2021. [paper][code][soft-prompt-tuning][Prompt-Tuning]

  • A Survey on In-context Learning, Dong et al., arxiv 2023. [paper][code]

  • Rethinking the Role of Demonstrations: What Makes In-Context Learning Work, Min et al., EMNLP 2022. [paper][code]

  • Larger language models do in-context learning differently, Wei et al., arxiv 2023. [paper]

  • PAL: Program-aided Language Models, Gao et al., ICML 2023. [paper][code]

  • A Comprehensive Survey on Instruction Following, Lou et al., arxiv 2023. [paper][code]

  • RLHF: Fine-Tuning Language Models from Human Preferences, Ziegler et al., arxiv 2019. [paper][code]

  • RLHF: Learning to summarize from human feedback, Stiennon et al., NeurIPS 2020. [paper][code]

  • RLHF: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Bai et al., arxiv 2022. [paper][code]

  • Finetuned Language Models Are Zero-Shot Learners, Wei et al., ICLR 2022. [paper]

  • Instruction Tuning for Large Language Models: A Survey, Zhang et al., arxiv 2023. [paper][code]

  • What learning algorithm is in-context learning? Investigations with linear models, Akyürek et al., ICLR 2023. [paper]

  • Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers, Dai et al., arxiv 2022. [paper][code]

3.3.10 RAG (Retrieval Augmented Generation)
Text Embedding
  • Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, Reimers et al., EMNLP 2019. [paper][code][model][model][vec2text]
  • SimCSE: Simple Contrastive Learning of Sentence Embeddings, Gao et al., EMNLP 2021. [paper][code]
  • OpenAI: Text and Code Embeddings by Contrastive Pre-Training, Neelakantan et al., arxiv 2022. [paper][blog]
  • MRL: Matryoshka Representation Learning, Kusupati et al., arxiv 2022. [paper][code]
  • BGE: C-Pack: Packaged Resources To Advance General Chinese Embedding, Xiao et al., arxiv 2023. [paper][code][FlagEmbedding]
  • LLM-Embedder: Retrieve Anything To Augment Large Language Models, Zhang et al., arxiv 2023. [paper][code][FlagEmbedding]
  • BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation, Chen et al., arxiv 2024. [paper][code][FlagEmbedding]
  • [m3e-base]
  • Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents, Günther et al., arxiv 2023. [paper][model]
  • GTE: Towards General Text Embeddings with Multi-stage Contrastive Learning, Li et al., arxiv 2023. [paper][model]
  • [BCEmbedding][bce-embedding-base_v1][bce-reranker-base_v1]
  • [CohereV3]
  • One Embedder, Any Task: Instruction-Finetuned Text Embeddings, Su et al., ACL 2023. [paper][code]
  • E5: Improving Text Embeddings with Large Language Models, Wang et al., arxiv 2024. [paper][code][model][llm2vec]
  • Nomic Embed: Training a Reproducible Long Context Text Embedder, Nussbaum et al., Nomic AI 2024. [paper][code]
  • GritLM: Generative Representational Instruction Tuning, Muennighoff et al., arxiv 2024. [paper][code]
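
All the embedders above plug into the same retrieval pattern: embed, normalize, rank by inner product. A minimal sketch using the Sentence-BERT library cited at the top of this list; the checkpoint name is one public model standing in for any embedder here:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed example checkpoint
docs = ["RoPE rotates positions in attention.", "LoRA learns low-rank updates."]
doc_vecs = model.encode(docs, normalize_embeddings=True)        # (n_docs, dim)

query_vec = model.encode(["how does LoRA work?"], normalize_embeddings=True)
scores = doc_vecs @ query_vec.T      # cosine similarity (unit-norm vectors)
print(docs[int(np.argmax(scores))])  # -> the LoRA sentence
```
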
3.3.11 Reasoning and Planning
  • Few-Shot-CoT: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Wei et al., NeurIPS 2022. [paper][chain-of-thought-hub]

  • Self-Consistency Improves Chain of Thought Reasoning in Language Models, Wang et al., ICLR 2023. [paper] (majority voting is sketched at the end of this section)

  • Zero-Shot-CoT: Large Language Models are Zero-Shot Reasoners, Kojima et al., NeurIPS 2022. [paper][code]

  • Auto-CoT: Automatic Chain of Thought Prompting in Large Language Models, Zhang et al., ICLR 2023. [paper][code]

  • Multimodal Chain-of-Thought Reasoning in Language Models, Zhang et al., arxiv 2023. [paper][code]

  • Chain-of-Thought Reasoning Without Prompting, Wang et al., arxiv 2024. [paper]

  • ReAct: Synergizing Reasoning and Acting in Language Models, Yao et al., ICLR 2023. [paper][code]

  • MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action, Yang et al., arxiv 2023. [paper][code]

  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models, Yao et al., NeurIPS 2023. [paper][code][Plug in and Play Implementation][tree-of-thought-prompting]

  • Graph of Thoughts: Solving Elaborate Problems with Large Language Models, Besta et al., arxiv 2023. [paper][code]

  • Cumulative Reasoning with Large Language Models, Zhang et al., arxiv 2023. [paper][code]

  • Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models, Sel et al., arxiv 2023. [paper][unofficial code]

  • Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation, Ding et al., arxiv 2023. [paper][code]

  • Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models, Ye et al., arxiv 2024. [paper][code]

  • Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, Zhou et al., ICLR 2023. [paper]

  • DEPS: Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents, Wang et al., arxiv 2023. [paper][code]

  • RAP: Reasoning with Language Model is Planning with World Model, Hao et al., arxiv 2023. [paper][code]

  • LEMA: Learning From Mistakes Makes LLM Better Reasoner, An et al., arxiv 2023. [paper][code]

  • Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, Chen et al., TMLR 2023. [paper][code]

  • Chain of Code: Reasoning with a Language Model-Augmented Code Emulator, Li et al., arxiv 2023. [paper][[code]]

  • The Impact of Reasoning Step Length on Large Language Models, Jin et al., arxiv 2024. [paper][code]

  • Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models, Wang et al., ACL 2023. [paper][code][maestro]

  • Improving Factuality and Reasoning in Language Models through Multiagent Debate, Du et al., arxiv 2023. [paper][code][Multi-Agents-Debate]

  • Self-Refine: Iterative Refinement with Self-Feedback, Madaan et al., arxiv 2023. [paper][code]

  • Reflexion: Language Agents with Verbal Reinforcement Learning, Shinn et al., NeurIPS 2023. [paper][code]

  • CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing, Gou et al., ICLR 2024. [paper][code]

  • Self-Discover: Large Language Models Self-Compose Reasoning Structures, Zhou et al., arxiv 2024. [paper][unofficial implementation][SELF-DISCOVER]

  • RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation, Wang et al., arxiv 2024. [paper][code]

  • KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents, Zhu et al., arxiv 2024. [paper][code][KnowLM]

  • Advancing LLM Reasoning Generalists with Preference Trees, Yuan et al., arxiv 2024. [paper][code]

  • Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models, Yang et al., arxiv 2024. [paper][code][SymbCoT]

  • ReST-EM: Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models, Singh et al., arxiv 2023. [paper][unofficial code]

  • ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent, Aksitov et al., arxiv 2023. [paper][[code]]

  • Orca 2: Teaching Small Language Models How to Reason, Mitra et al., arxiv 2023. [paper][[code]]

  • Searchformer: Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping, Lehnert et al., arxiv 2024. [paper]

  • How Far Are We from Intelligent Visual Deductive Reasoning?, Zhang et al., arxiv 2024. [paper][code]

  • [llm-reasoners]
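
Of the strategies above, self-consistency is the easiest to pin down in code: sample several reasoning chains at nonzero temperature and majority-vote the final answers. A minimal sketch; `sample_chain` is a hypothetical stand-in for one sampled chain-of-thought run of a real model:

```python
from collections import Counter

def self_consistency(sample_chain, question: str, n: int = 20) -> str:
    """Self-consistency (Wang et al., ICLR 2023): the majority answer over n
    independently sampled reasoning chains. sample_chain(question) -> final
    answer string is a placeholder for a real sampled model call."""
    answers = [sample_chain(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```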

Survey

3.4 LLM Theory

  • Scaling Laws for Neural Language Models, Kaplan et al., arxiv 2020. [paper][unofficial code] (fitting such a power law is sketched at the end of this section)

  • Emergent Abilities of Large Language Models, Wei et al., TMLR 2022. [paper]

  • Chinchilla: Training Compute-Optimal Large Language Models, Hoffmann et al., arxiv 2022. [paper]

  • Scaling Laws for Autoregressive Generative Modeling, Henighan et al., arxiv 2020. [paper]

  • Are Emergent Abilities of Large Language Models a Mirage?, Schaeffer et al., NeurIPS 2023 Outstanding Paper. [paper]

  • Understanding Emergent Abilities of Language Models from the Loss Perspective, Du et al., arxiv 2024. [paper]

  • S2A: System 2 Attention (is something you might need too), Weston et al., arxiv 2023. [paper]

  • Scaling Laws for Downstream Task Performance of Large Language Models, Isik et al., arxiv 2024. [paper]

  • Scalable Pre-training of Large Autoregressive Image Models, El-Nouby et al., arxiv 2024. [paper][code]

  • When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method, Zhang et al., ICLR 2024. [paper]

  • Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws, Allen-Zhu et al., arxiv 2024. [paper]

  • Language Modeling Is Compression, Delétang et al., arxiv 2023. [paper]

  • Language Models Represent Space and Time, Gurnee and Tegmark, ICLR 2024. [paper][code]

  • The Platonic Representation Hypothesis, Huh et al., arxiv 2024. [paper][code]

  • Observational Scaling Laws and the Predictability of Language Model Performance, Ruan et al., arxiv 2024. [paper][code]

  • Language models can explain neurons in language models, OpenAI, 2023. [blog][code][transformer-debugger]

  • Scaling and evaluating sparse autoencoders, Gao et al., arxiv 2024. [OpenAI Blog][paper][code]

  • Towards Monosemanticity: Decomposing Language Models With Dictionary Learning, Anthropic, 2023. [blog]

  • Mapping the Mind of a Large Language Model, Anthropic, 2024. [blog]

  • Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era, Wu et al., arxiv 2024. [paper][code]

  • LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models, Tufanov et al., arxiv 2024. [paper][code]

  • ROME: Locating and Editing Factual Associations in GPT, Meng et al., NeurIPS 2022. [paper][code][FastEdit]

  • Editing Large Language Models: Problems, Methods, and Opportunities, Yao et al., EMNLP 2023. [paper][code]

  • A Comprehensive Study of Knowledge Editing for Large Language Models, Zhang et al., arxiv 2024. [paper][code]
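
The scaling-law papers above fit power laws of the form L(N) ≈ a · N^(-α), and fitting one is a two-liner in log-log space. A sketch with made-up placeholder points, purely to show the procedure:

```python
import numpy as np

# Placeholder (parameter count, loss) pairs; real fits use measured values.
N = np.array([1e7, 1e8, 1e9, 1e10])
L = np.array([4.5, 3.9, 3.4, 3.0])

# log L = log a - alpha * log N, so a straight-line fit recovers both.
slope, log_a = np.polyfit(np.log(N), np.log(L), deg=1)
print(f"alpha = {-slope:.3f}, a = {np.exp(log_a):.2f}")
```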

3.5 Chinese Model


CV

  • CS231n: Deep Learning for Computer Vision [link]

1. Basic for CV

  • AlexNet: ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al., NIPS 2012. [paper]
  • VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan et al., ICLR 2015. [paper]
  • GoogLeNet: Going Deeper with Convolutions, Szegedy et al., CVPR 2015. [paper]
  • ResNet: Deep Residual Learning for Image Recognition, He et al., CVPR 2016 Best Paper. [paper][code]
  • DenseNet: Densely Connected Convolutional Networks, Huang et al., CVPR 2017 Oral. [paper][code]
  • EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, Tan et al., ICML 2019. [paper][code][EfficientNet-PyTorch]
  • BYOL: Bootstrap your own latent: A new approach to self-supervised Learning, Grill et al., arxiv 2020. [paper][code][byol-pytorch]

2. Contrastive Learning

  • MoCo: Momentum Contrast for Unsupervised Visual Representation Learning, He et al., CVPR 2020. [paper][code]

  • SimCLR: A Simple Framework for Contrastive Learning of Visual Representations, Chen et al., PMLR 2020. [paper][code]

  • DINOv2: Learning Robust Visual Features without Supervision, Oquab et al., arxiv 2023. [paper][code]

  • FeatUp: A Model-Agnostic Framework for Features at Any Resolution, Fu et al., ICLR 2024. [paper][code]

  • InfoNCE Loss: Representation Learning with Contrastive Predictive Coding, Oord et al., arxiv 2018. [paper][unofficial code]
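
The InfoNCE objective in the last item above underlies MoCo and SimCLR alike, and it is just cross-entropy over a similarity matrix whose diagonal holds the positive pairs. A minimal sketch (the temperature of 0.1 is illustrative):

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1):
    """InfoNCE: row i of z1 and row i of z2 are two views of the same image
    (the positive pair); every other row serves as a negative. z1, z2: (B, D)."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / tau                             # (B, B)
    labels = torch.arange(z1.size(0), device=z1.device)  # diagonal positives
    return F.cross_entropy(logits, labels)
```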

3. CV Application

4. Foundation Model

  • ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Dosovitskiy et al., ICLR 2021. [paper][code][Pytorch Implementation][efficientvit][EfficientFormer][ViT-Adapter] (patch embedding is sketched at the end of this section)

  • ViT-Adapter: Vision Transformer Adapter for Dense Predictions, Chen et al., ICLR 2023 Spotlight. [paper][code]

  • Vision Transformers Need Registers, Darcet et al., ICLR 2024 Outstanding Paper. [paper]

  • DeiT: Training data-efficient image transformers & distillation through attention, Touvron et al., ICML 2021. [paper][code]

  • ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision, Kim et al., ICML 2021. [paper][code]

  • Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, Liu et al., ICCV 2021. [paper][code]

  • MAE: Masked Autoencoders Are Scalable Vision Learners, He et al., CVPR 2022. [paper][code]

  • LVM: Sequential Modeling Enables Scalable Learning for Large Vision Models, Bai et al., arxiv 2023. [paper][code]

  • GLEE: General Object Foundation Model for Images and Videos at Scale, Wu et al., CVPR 2024. [paper][code]

  • Tokenize Anything via Prompting, Pan et al., arxiv 2023. [paper][code]

  • Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model, Zhu et al., arxiv 2024. [paper][code][VMamba][mambaout]

  • Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data, Yang et al., arxiv 2024. [paper][code]

  • Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models, Guo et al., arxiv 2024. [paper][code]

  • [pytorch-image-models][Pointcept]
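
A recurring ingredient in the ViT-family models above is the patch embedding: the image is cut into non-overlapping patches, each projected into a token, so a transformer can consume it as a sequence. A minimal sketch, assuming a 224x224 input, 16x16 patches, a class token, and learned position embeddings (sizes are illustrative):

```python
# ViT's "16x16 words": a strided conv embeds patches, then tokens are flattened.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=768):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 2
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))

    def forward(self, x):
        tokens = self.proj(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        cls = self.cls.expand(x.size(0), -1, -1)
        return torch.cat([cls, tokens], dim=1) + self.pos  # (B, N+1, dim)

print(PatchEmbed()(torch.randn(2, 3, 224, 224)).shape)  # (2, 197, 768)
```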

5. Generative Model (GAN and VAE)

  • GAN: Generative Adversarial Networks, Goodfellow et al., arxiv 2014. [paper][code][Pytorch-GAN]
  • StyleGAN3: Alias-Free Generative Adversarial Networks, Karras et al., NeurIPS 2021. [paper][code]
  • GigaGAN: Scaling up GANs for Text-to-Image Synthesis, Kang et al., arxiv 2023. [paper][code]
  • [pytorch-CycleGAN-and-pix2pix][img2img-turbo]
  • VAE: Auto-Encoding Variational Bayes, Kingma et al., arxiv 2013. [paper][code][Pytorch-VAE] (a reparameterization sketch follows this list)
  • VQ-VAE: Neural Discrete Representation Learning, Oord et al., NIPS 2017. [paper][code][vector-quantize-pytorch]
  • VQ-VAE-2: Generating Diverse High-Fidelity Images with VQ-VAE-2, Razavi et al., arxiv 2019. [paper][code]
  • VQGAN: Taming Transformers for High-Resolution Image Synthesis, Esser et al., CVPR 2021. [paper][code]
  • Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction, Tian et al., arxiv 2024. [paper][code]
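
The VAE line of work above rests on the reparameterization trick: writing the sample as z = mu + sigma * eps keeps the sampling step differentiable, so the encoder can be trained end to end against the ELBO. A minimal sketch, assuming single-layer encoder/decoder and a Bernoulli reconstruction loss (all sizes are illustrative):

```python
# Tiny VAE: the encoder outputs (mu, log_var); z is sampled via reparameterization.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)  # outputs [mu | log_var]
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=1)
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * log_var) * eps  # reparameterization trick
        recon = torch.sigmoid(self.dec(z))
        kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
        return F.binary_cross_entropy(recon, x, reduction="sum") + kl

x = torch.rand(4, 784)
print(TinyVAE()(x).item())  # scalar negative ELBO
```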

6. Image Editing

  • InstructPix2Pix: Learning to Follow Image Editing Instructions, Brooks et al., CVPR 2023 Highlight. [paper][code]
  • Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold, Pan et al., SIGGRAPH 2023. [paper][code]
  • DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing, Shi et al., arxiv 2023. [paper][code]
  • DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models, Mou et al., ICLR 2024 Spotlight. [paper][code]
  • LEDITS++: Limitless Image Editing using Text-to-Image Models, Brack et al., arxiv 2023. [paper][code][demo]
  • Diffusion Model-Based Image Editing: A Survey, Huang et al., arxiv 2024. [paper][code]

7. Object Detection

  • DETR: End-to-End Object Detection with Transformers, Carion et al., ECCV 2020. [paper][code] (a box-IoU sketch follows this section's list)

  • Focus-DETR: Less is More: Focus Attention for Efficient DETR, Zheng et al., arxiv 2023. [paper][code]

  • U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection, Qin et al., arxiv 2020. [paper][code]

  • YOLO: You Only Look Once: Unified, Real-Time Object Detection, Redmon et al., arxiv 2015. [paper]

  • YOLOX: Exceeding YOLO Series in 2021, Ge et al., arxiv 2021. [paper][code]

  • Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism, Wang et al., arxiv 2023. [paper][code]

  • YOLO-World: Real-Time Open-Vocabulary Object Detection, Cheng et al., arxiv 2024. [paper][code]

  • YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information, Wang et al., arxiv 2024. [paper][code]

  • T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy, Jiang et al., arxiv 2024. [paper][code]

  • YOLOv10: Real-Time End-to-End Object Detection, Wang et al., arxiv 2024. [paper][yolov10]

  • [detectron2][yolov5][mmdetection][detrex]
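
A primitive shared by every detector above, for both training-time matching and evaluation, is box IoU (DETR's Hungarian matching additionally uses a generalized variant). A minimal pairwise-IoU sketch, assuming axis-aligned boxes in [x1, y1, x2, y2] format:

```python
# Pairwise box IoU: intersection area divided by union area.
import torch

def box_iou(a, b):
    # a: (N, 4), b: (M, 4) -> (N, M) pairwise IoU
    lt = torch.max(a[:, None, :2], b[None, :, :2])   # intersection top-left
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])   # intersection bottom-right
    wh = (rb - lt).clamp(min=0)                      # zero if no overlap
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

boxes_a = torch.tensor([[0., 0., 10., 10.]])
boxes_b = torch.tensor([[5., 5., 15., 15.]])
print(box_iou(boxes_a, boxes_b))  # tensor([[0.1429]])
```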

8. Semantic Segmentation

  • U-Net: Convolutional Networks for Biomedical Image Segmentation, Ronneberger et al., MICCAI 2015. [paper][code]

  • Segment Anything, Kirillov et al., ICCV 2023. [paper][code]

  • EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything, Xiong et al., CVPR 2024. [paper][code]

  • Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks, Ren et al., arxiv 2024. [paper][code]

  • [mmsegmentation][mmdeploy][Painter]

9. Video

  • VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training, Tong et al., NeurIPS 2022 Spotlight. [paper][code]
  • MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation, Wang et al., arxiv 2024. [paper]
  • [V-JEPA][I-JEPA]
  • VideoMamba: State Space Model for Efficient Video Understanding, Li et al., arxiv 2024. [paper][code]
  • VideoChat: Chat-Centric Video Understanding, Li et al., CVPR 2024 Highlight. [paper][code]

10. Survey for CV

  • ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy, Vishniakov et al., arxiv 2023. [paper][code]
  • Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey, Xin et al., arxiv 2024. [paper][code]

Multimodal

1. Audio

2. BLIP

  • ALBEF: Align before Fuse: Vision and Language Representation Learning with Momentum Distillation, Li et al., NeurIPS 2021. [paper][code]
  • BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation, Li et al., arxiv 2022. [paper][code]
  • BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models, Li et al., arxiv 2023. [paper][code]
  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning, Dai et al., arxiv 2023. [paper][code]
  • X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning, Panagopoulou et al., arxiv 2023. [paper][code]
  • LAVIS: A Library for Language-Vision Intelligence, Li et al., arxiv 2022. [paper][code]
  • VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts, Bao et al., NeurIPS 2022. [paper][code]
  • BEiT: BERT Pre-Training of Image Transformers, Bao et al., ICLR 2022 Oral. [paper][code]
  • BEiT-3: Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks, Wang et al., CVPR 2023. [paper][code]

3. CLIP

  • CLIP: Learning Transferable Visual Models From Natural Language Supervision, Radford et al., ICML 2021. [paper][code][clip-as-service][open_clip] (the symmetric contrastive objective is sketched after this list)
  • DALL-E2: Hierarchical Text-Conditional Image Generation with CLIP Latents, Ramesh et al., arxiv 2022. [paper][code]
  • HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention, Geng et al., ICLR 2023. [paper][code]
  • Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese, Yang et al., arxiv 2022. [paper][code]
  • MetaCLIP: Demystifying CLIP Data, Xu et al., ICLR 2024 Spotlight. [paper][code]
  • Alpha-CLIP: A CLIP Model Focusing on Wherever You Want, Sun et al., arxiv 2023. [paper][code][Bootstrap3D]
  • MMVP: Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs, Tong et al., arxiv 2024. [paper][code]
  • MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training, Vasu et al., CVPR 2024. [paper][code]
  • Long-CLIP: Unlocking the Long-Text Capability of CLIP, Zhang et al., arxiv 2024. [paper][code]
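
CLIP's training objective, reused by most of the variants above, is a symmetric contrastive loss over the in-batch image-text similarity matrix: matched pairs sit on the diagonal. A minimal sketch with the encoders stubbed out as random features (the logit scale is a learned parameter in the real model; it is fixed here for brevity):

```python
# CLIP-style loss: average of image->text and text->image cross-entropies.
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, logit_scale=100.0):
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    logits = logit_scale * img @ txt.t()   # (N, N), diagonal = matched pairs
    labels = torch.arange(img.size(0))
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

print(clip_loss(torch.randn(8, 512), torch.randn(8, 512)).item())
```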

4. Diffusion Model

  • Tutorial on Diffusion Models for Imaging and Vision, Chan, arxiv 2024. [paper]

  • Denoising Diffusion Probabilistic Models, Ho et al., NeurIPS 2020. [paper][code][Pytorch Implementation][RDDM] (the training objective is sketched at the end of this section)

  • Improved Denoising Diffusion Probabilistic Models, Nichol and Dhariwal, ICML 2021. [paper][code]

  • Diffusion Models Beat GANs on Image Synthesis, Dhariwal and Nichol, NeurIPS 2021. [paper][code]

  • Classifier-Free Diffusion Guidance, Ho and Salimans, NeurIPS 2021. [paper][code]

  • GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models, Nichol et al., arxiv 2021. [paper][code]

  • DALL-E2: Hierarchical Text-Conditional Image Generation with CLIP Latents, Ramesh et al., arxiv 2022. [paper][code][dalle-mini]

  • Stable-Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models, Rombach et al., CVPR 2022. [paper][code][CompVis/stable-diffusion][Stability-AI/stablediffusion][ml-stable-diffusion]

  • SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis, Podell et al., arxiv 2023. [paper][code][SDXL-Lightning]

  • Introducing Stable Cascade, Stability AI, 2024. [link][code][model]

  • SDXL-Turbo: Adversarial Diffusion Distillation, Sauer et al., arxiv 2023. [paper][code]

  • LCM: Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference, Luo et al., arxiv 2023. [paper][code][Hyper-SD]

  • LCM-LoRA: A Universal Stable-Diffusion Acceleration Module, Luo et al., arxiv 2023. [paper][code]

  • Stable Diffusion 3: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis, Esser et al., arxiv 2024. [paper][mmdit]

  • SD3-Turbo: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation, Sauer et al., arxiv 2024. [paper]

  • StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation, Kodaira et al., arxiv 2023. [paper][code]

  • DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models, Marjit et al., arxiv 2024. [paper][code]

  • Video Diffusion Models, Ho et al., arxiv 2022. [paper][code]

  • Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets, Blattmann et al., arxiv 2023. [paper][code]

  • Consistency Models, Song et al., arxiv 2023. [paper][code][Consistency Decoder]

  • A Survey on Video Diffusion Models, Xing et al., arxiv 2023. [paper][code]

  • Diffusion Models: A Comprehensive Survey of Methods and Applications, Yang et al., arxiv 2023. [paper][code]

  • Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation, Yu et al., arxiv 2023. [paper]

  • The Chosen One: Consistent Characters in Text-to-Image Diffusion Models, Avrahami et al., arxiv 2023. [paper][code]

  • U-ViT: All are Worth Words: A ViT Backbone for Diffusion Models, Bao et al., CVPR 2023. [paper][code]

  • UniDiffuser: One Transformer Fits All Distributions in Multi-Modal Diffusion, Bao et al., arxiv 2023. [paper][code]

  • l-DAE: Deconstructing Denoising Diffusion Models for Self-Supervised Learning, Chen et al., arxiv 2024. [paper]

  • DiT: Scalable Diffusion Models with Transformers, Peebles and Xie, ICCV 2023 Oral. [paper][code][OpenDiT][MDT]

  • SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers, Ma et al., arxiv 2024. [paper][code]

  • Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis, Ren et al., arxiv 2024. [paper][model]

  • GitHub Repositories

  • [Awesome-Diffusion-Models][Awesome-Video-Diffusion]

  • [stable-diffusion-webui][stable-diffusion-webui-colab][sd-webui-controlnet][stable-diffusion-webui-forge][automatic]

  • [Fooocus][Omost]

  • [ComfyUI][streamlit][gradio][ComfyUI-Workflows-ZHO]

  • [diffusers]
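
The DDPM objective cited at the top of this section reduces to a simple training recipe: sample a timestep, corrupt x0 with the closed-form forward process q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I), and regress the injected noise. A minimal sketch with a placeholder denoiser (the schedule constants follow Ho et al.; in practice the model is a time-conditioned U-Net or DiT):

```python
# DDPM training step: add noise per the forward process, predict that noise.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def ddpm_loss(model, x0):
    t = torch.randint(0, T, (x0.size(0),))
    a_bar = alphas_bar[t].view(-1, 1)                    # per-sample schedule
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps   # q(x_t | x_0)
    return F.mse_loss(model(x_t, t), eps)                # regress the noise

toy_model = lambda x, t: x * 0.0                         # placeholder denoiser
print(ddpm_loss(toy_model, torch.randn(4, 8)).item())
```

Sampling reverses this corruption step by step; classifier-free guidance (Ho and Salimans above) mixes conditional and unconditional noise predictions at each step.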

5. Multimodal LLM

  • LLaVA: Visual Instruction Tuning, Liu et al., NeurIPS 2023 Oral. [paper][code][vip-llava][LLaVA-pp][TinyLLaVA_Factory][LLaVA-RLHF] (the vision-to-LLM projector is sketched at the end of this section)

  • LLaVA-1.5: Improved Baselines with Visual Instruction Tuning, Liu et al., arxiv 2023. [paper][code]

  • LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day, Li et al., arxiv 2023. [paper][code]

  • Video-LLaVA: Learning United Visual Representation by Alignment Before Projection, Lin et al., arxiv 2023. [paper][code]

  • MoE-LLaVA: Mixture of Experts for Large Vision-Language Models, Lin et al., arxiv 2024. [paper][code]

  • MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models, Zhu et al., arxiv 2023. [paper][code]

  • MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning, Chen et al., arxiv 2023. [paper][code]

  • MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens, Ataallah et al., arxiv 2024. [paper][code]

  • MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens, Zheng et al., arxiv 2023. [paper][code]

  • Flamingo: a Visual Language Model for Few-Shot Learning, Alayrac et al., NeurIPS 2022. [paper][open-flamingo][flamingo-pytorch]

  • Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding, Zhang et al., EMNLP 2023. [paper][code]

  • BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs, Zhao et al., arxiv 2023. [paper][code][AnyGPT]

  • Emu: Generative Pretraining in Multimodality, Sun et al., ICLR 2024. [paper][code]

  • CogVLM: Visual Expert for Pretrained Language Models, Wang et al., arxiv 2023. [paper][code][CogVLM2][VisualGLM-6B][CogCoM]

  • DreamLLM: Synergistic Multimodal Comprehension and Creation, Dong et al., ICLR 2024 Spotlight. [paper][code]

  • NExT-GPT: Any-to-Any Multimodal LLM, Wu et al., arxiv 2023. [paper][code]

  • Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models, Wu et al., arxiv 2023. [paper][code]

  • SoM: Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V, Yang et al., arxiv 2023. [paper][code]

  • Ferret: Refer and Ground Anything Anywhere at Any Granularity, You et al., arxiv 2023. [paper][code][Ferret-UI]

  • Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond, Bai et al., arxiv 2023. [paper][code]

  • InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition, Zhang et al., arxiv 2023. [paper][code]

  • InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks, Chen et al., CVPR 2024. [paper][code][InternVideo][InternVid]

  • DeepSeek-VL: Towards Real-World Vision-Language Understanding, Lu et al., arxiv 2024. [paper][code]

  • ShareGPT4V: Improving Large Multi-Modal Models with Better Captions, Chen et al., arxiv 2023. [paper][code][ShareGPT4Video]

  • TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones, Yuan et al., arxiv 2023. [paper][code]

  • Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models, Li et al., CVPR 2024. [paper][code]

  • Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models, Wei et al., arxiv 2023. [paper][code]

  • Vary-toy: Small Language Model Meets with Reinforced Vision Vocabulary, Wei et al., arxiv 2024. [paper][code]

  • LWM: World Model on Million-Length Video And Language With RingAttention, Liu et al., arxiv 2024. [paper][code]

  • Chameleon: Mixed-Modal Early-Fusion Foundation Models, Chameleon Team, arxiv 2024. [paper]

  • Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts, Li et al., arxiv 2024. [paper][code]

  • RL4VLM: Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning, Zhai et al., arxiv 2024. [paper][code][RLHF-V][RLAIF-V]

  • [MiniCPM-V][moondream][MobileVLM][OmniFusion][Bunny]
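
A common thread in the LLaVA-style models above is the connector: frozen vision-encoder features are mapped into the LLM's token-embedding space and prepended to the text tokens. A minimal sketch of a 2-layer MLP projector in the spirit of LLaVA-1.5 (dimensions and token counts are illustrative, not the released configuration):

```python
# LLaVA-style connector: project vision features, concatenate with text embeddings.
import torch
import torch.nn as nn

vision_dim, llm_dim = 1024, 4096
projector = nn.Sequential(
    nn.Linear(vision_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
)

patch_feats = torch.randn(1, 576, vision_dim)   # e.g. CLIP ViT patch features
text_embeds = torch.randn(1, 32, llm_dim)       # embedded prompt tokens
visual_tokens = projector(patch_feats)          # (1, 576, llm_dim)
llm_input = torch.cat([visual_tokens, text_embeds], dim=1)
print(llm_input.shape)                          # (1, 608, 4096), fed to the LLM
```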

6. Text2Image

  • DALL-E: Zero-Shot Text-to-Image Generation, Ramesh et al., arxiv 2021. [paper][code]

  • DALL-E3: Improving Image Generation with Better Captions, Betker et al., OpenAI 2023. [paper][code][blog]

  • ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models, Zhang et al., ICCV 2023 Marr Prize. [paper][code]

  • T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models, Mou et al., AAAI 2024. [paper][code]

  • AnyText: Multilingual Visual Text Generation And Editing, Tuo et al., arxiv 2023. [paper][code]

  • RPG: Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs, Yang et al., ICML 2024. [paper][code]

  • LAION-5B: An open large-scale dataset for training next generation image-text models, Schuhmann et al., NeurIPS 2022. [paper][code][blog]

  • DeepFloyd IF: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, Saharia et al., arxiv 2022. [paper][code]

  • Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, Saharia et al., NeurIPS 2022. [paper][unofficial code]

  • Instruct-Imagen: Image Generation with Multi-modal Instruction, Hu et al., arxiv 2024. [paper]

  • TextDiffuser: Diffusion Models as Text Painters, Chen et al., arxiv 2023. [paper][code]

  • TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering, Chen et al., arxiv 2023. [paper][code]

  • PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis, Chen et al., arxiv 2023. [paper][code]

  • PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models, Chen et al., arxiv 2024. [paper][code]

  • PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation, Chen et al., arxiv 2024. [paper][code]

  • IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models, Ye et al., arxiv 2023. [paper][code][ID-Animator]

  • Controllable Generation with Text-to-Image Diffusion Models: A Survey, Cao et al., arxiv 2024. [paper][code]

  • StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation, Zhou et al., arxiv 2024. [paper][code]

  • Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding, Li et al., arxiv 2024. [paper][code]

7. Text2Video

  • Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation, Hu et al., arxiv 2023. [paper][code][Open-AnimateAnyone][Moore-AnimateAnyone][AnimateAnyone]

  • EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions, Tian et al., arxiv 2024. [paper][code]

  • AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation, Wei et al., arxiv 2024. [paper][code]

  • DreaMoving: A Human Video Generation Framework based on Diffusion Models, Feng et al., arxiv 2023. [paper][code]

  • MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model, Xu et al., arxiv 2023. [paper][code][champ]

  • DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors, Xing et al., arxiv 2023. [paper][code]

  • FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis, Liang et al., arxiv 2023. [paper][code]

  • [Awesome-Video-Diffusion]

  • Video Diffusion Models, Ho et al., arxiv 2022. [paper][video-diffusion-pytorch]

  • Make-A-Video: Text-to-Video Generation without Text-Video Data, Singer et al., arxiv 2022. [paper][make-a-video-pytorch]

  • Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation, Wu et al., ICCV 2023. [paper][code]

  • Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators, Khachatryan et al., ICCV 2023 Oral. [paper][code][StreamingT2V]

  • CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers, Hong et al., ICLR 2023. [paper][code]

  • Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos, Ma et al., AAAI 2024. [paper][code][Follow-Your-Pose v2][Follow-Your-Emoji]

  • Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts, Ma et al., arxiv 2024. [paper][code]

  • AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning, Guo et al., arxiv 2023. [paper][code][AnimateDiff-Lightning]

  • StableVideo: Text-driven Consistency-aware Diffusion Video Editing, Chai et al., ICCV 2023. [paper][code]

  • I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models, Zhang et al., arxiv 2023. [paper][code]

  • TF-T2V: A Recipe for Scaling up Text-to-Video Generation with Text-free Videos, Wang et al., arxiv 2023. [paper][code]

  • Lumiere: A Space-Time Diffusion Model for Video Generation, Bar-Tal et al., arxiv 2024. [paper][lumiere-pytorch]

  • Sora: Creating video from text, OpenAI, 2024. [blog][Open-Sora][Open-Sora-Plan][minisora][SoraWebui][MuseV][PhysDreamer][easyanimate]

  • Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models, Liu et al., arxiv 2024. [paper][code]

  • Mora: Enabling Generalist Video Generation via A Multi-Agent Framework, Yuan et al., arxiv 2024. [paper][code]

  • Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution, Dehghani et al., NeurIPS 2023. [paper][unofficial code]

  • VideoPoet: A Large Language Model for Zero-Shot Video Generation, Kondratyuk et al., arxiv 2023. [paper]

  • Latte: Latent Diffusion Transformer for Video Generation, Ma et al., arxiv 2024. [paper][code][LaVIT]

  • Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis, Menapace et al., arxiv 2024. [paper][articulated-animation]

  • [MoneyPrinterTurbo][videos]

8. Survey for Multimodal

  • A Survey on Multimodal Large Language Models, Yin et al., arxiv 2023. [paper][code]
  • Multimodal Foundation Models: From Specialists to General-Purpose Assistants, Li et al., arxiv 2023. [paper][cvinw_readings]
  • From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities, Lu et al., arxiv 2024. [paper][Leaderboards]
  • Efficient Multimodal Large Language Models: A Survey, Jin et al., arxiv 2024. [paper][code]
  • An Introduction to Vision-Language Modeling, Bordes et al., arxiv 2024. [paper]

9. Other

  • Fuyu-8B: A Multimodal Architecture for AI Agents, Bavishi et al., Adept blog 2023. [blog][model]
  • Otter: A Multi-Modal Model with In-Context Instruction Tuning, Li et al., arxiv 2023. [paper][code]
  • OtterHD: A High-Resolution Multi-modality Model, Li et al., arxiv 2023. [paper][code][model]
  • CM3leon: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning, Yu et al., arxiv 2023. [paper][Unofficial Implementation]
  • MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer, Tian et al., arxiv 2024. [paper][code]
  • CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations, Qi et al., arxiv 2024. [paper][code]
  • SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models, Gao et al., arxiv 2024. [paper][code][Lumina-T2X]

Reinforcement Learning

1. Basic for RL

2. LLM for Decision Making

  • Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen et al., NeurIPS 2021. [paper][code] (the trajectory tokenization is sketched after this list)
  • Trajectory Transformer: Offline Reinforcement Learning as One Big Sequence Modeling Problem, Janner et al., NeurIPS 2021. [paper][code]
  • Guiding Pretraining in Reinforcement Learning with Large Language Models, Du et al., ICML 2023. [paper][code]
  • Introspective Tips: Large Language Model for In-Context Decision Making, Chen et al., arxiv 2023. [paper]
  • Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, Chebotar et al., CoRL 2023. [paper][Unofficial Implementation]
  • Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods, Cao et al., arxiv 2024. [paper]
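
Decision Transformer, cited first in this list, recasts offline RL as sequence modeling: each timestep contributes a (return-to-go, state, action) triple, and a causal transformer predicts actions conditioned on a target return. A minimal sketch of the return-to-go computation and token interleaving, with the transformer itself omitted (shapes are illustrative):

```python
# Decision Transformer inputs: returns-to-go plus interleaved trajectory tokens.
import torch

def returns_to_go(rewards):
    # R_t = sum of rewards from step t to the end of the episode
    return torch.flip(torch.cumsum(torch.flip(rewards, [0]), 0), [0])

rewards = torch.tensor([1.0, 0.0, 2.0, 1.0])
print(returns_to_go(rewards))  # tensor([4., 3., 3., 1.])

# Interleave per-timestep embeddings into one sequence: (R_1, s_1, a_1, R_2, ...)
T, d = 4, 8
rtg_e, s_e, a_e = (torch.randn(T, d) for _ in range(3))
tokens = torch.stack([rtg_e, s_e, a_e], dim=1).reshape(T * 3, d)
print(tokens.shape)  # (12, 8), consumed by a causal transformer
```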

GNN

  • [GNNPapers][dgl]

  • A Gentle Introduction to Graph Neural Networks, Sanchez-Lengeling et al., Distill 2021. [paper]

  • CS224W: Machine Learning with Graphs, Stanford. [link]

  • GCN: Semi-Supervised Classification with Graph Convolutional Networks, Kipf and Welling, ICLR 2017. [paper][code][pygcn] (the propagation rule is sketched after this list)

  • GAE: Variational Graph Auto-Encoders, Kipf and Welling, arxiv 2016. [paper][code][gae-pytorch]

  • GAT: Graph Attention Networks, Veličković et al., ICLR 2018. [paper][code][pyGAT][pytorch-GAT]

  • GIN: How Powerful are Graph Neural Networks?, Xu et al., ICLR 2019. [paper][code]

  • Graphormer: Do Transformers Really Perform Badly for Graph Representation?, Ying et al., NeurIPS 2021. [paper][code]

  • GraphGPT: Graph Instruction Tuning for Large Language Models, Tang et al., SIGIR 2024. [paper][code]

  • OpenGraph: Towards Open Graph Foundation Models, Xia et al., arxiv 2024. [paper][code]

  • [pytorch_geometric]
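
The GCN layer cited above propagates features with the symmetrically normalized adjacency: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W), where D is the degree matrix of A + I. A minimal dense sketch (real implementations use sparse ops; sizes are illustrative):

```python
# One GCN layer over a 3-node path graph, using dense matrices for clarity.
import torch

def gcn_layer(A, H, W):
    A_hat = A + torch.eye(A.size(0))       # add self-loops
    d = A_hat.sum(dim=1)
    D_inv_sqrt = torch.diag(d.pow(-0.5))   # symmetric normalization
    return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

A = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
H = torch.randn(3, 4)                      # 3 nodes, 4 input features
W = torch.randn(4, 8)                      # project to 8 hidden features
print(gcn_layer(A, H, W).shape)            # torch.Size([3, 8])
```

Stacking such layers mixes information from k-hop neighborhoods, which is why shallow GCNs already work well on citation-graph benchmarks.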

Survey for GNN


Transformer Architecture