📢 Proposal: Train RLHF using CarperAI Trlx 🤖
We propose to train a Reinforcement Learning from Human Feedback (RLHF) model using CarperAI Trlx, a distributed training framework designed to fine-tune large language models with reinforcement learning. Our goal is to improve the conversational abilities of language models and create a chatbot that can better engage with humans.
🚀 Methods:
We will train an initial model using supervised fine-tuning, where human AI trainers will provide conversations in which they play both sides: the user and an AI assistant. We will give the trainers access to model-written suggestions to help them compose their responses. We will mix this new dialogue dataset with the InstructGPT dataset, which we will transform into a dialogue format.
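A minimal sketch of what this supervised fine-tuning step could look like with Hugging Face `transformers`; the base model (`gpt2`), the dialogue format, and all hyperparameters below are illustrative assumptions, not fixed choices of this proposal:

```python
# Supervised fine-tuning sketch: causal LM training on trainer-written dialogues.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical trainer-written dialogues, flattened into single training strings.
dialogues = [
    "User: How do I sort a list in Python?\nAssistant: Use sorted(my_list) ...",
]

def tokenize(example):
    out = tokenizer(example["text"], truncation=True,
                    max_length=512, padding="max_length")
    out["labels"] = out["input_ids"].copy()  # causal LM: predict the next token
    return out

dataset = Dataset.from_dict({"text": dialogues}).map(
    tokenize, remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-model", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
).train()
```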
To create a reward model for reinforcement learning, we will collect comparison data consisting of two or more model responses ranked by quality. To collect this data, we will take conversations that AI trainers had with the chatbot: we will randomly select a model-written message, sample several alternative completions, and have AI trainers rank them. Using this reward model, we will fine-tune the model with Proximal Policy Optimization, and we will perform several iterations of this process.
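A sketch of the PPO step with trlx. The high-level `trlx.train(...)` call with a `reward_fn` follows the pattern in trlx's own examples, but exact signatures may differ between versions; the reward model path (`reward-model`) and the SFT checkpoint path (`sft-model`) are placeholders standing in for artifacts from the earlier steps:

```python
# PPO fine-tuning sketch with CarperAI trlx.
import torch
import trlx
from transformers import AutoModelForSequenceClassification, AutoTokenizer

rm_tokenizer = AutoTokenizer.from_pretrained("reward-model")   # hypothetical path
reward_model = AutoModelForSequenceClassification.from_pretrained("reward-model")

def reward_fn(samples, **kwargs):
    # Score each full dialogue with the trained reward model; higher = preferred.
    inputs = rm_tokenizer(samples, return_tensors="pt",
                          padding=True, truncation=True)
    with torch.no_grad():
        scores = reward_model(**inputs).logits.squeeze(-1)
    return scores.tolist()

trainer = trlx.train(
    "sft-model",   # start PPO from the supervised fine-tuned checkpoint
    reward_fn=reward_fn,
    prompts=["User: How do I sort a list in Python?\nAssistant:"],
)
```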
💽 Datasets:
https://huggingface.co/datasets/Anthropic/hh-rlhf
https://huggingface.co/datasets/HuggingFaceH4/helpful-anthropic-raw
https://www.surgehq.ai/datasets/instructgpt-style-dataset
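The Anthropic hh-rlhf dataset already ships as comparison pairs (`chosen` / `rejected` columns), so it plugs directly into reward-model training. Below is a sketch of the standard InstructGPT-style pairwise ranking loss, `-log sigmoid(r_chosen - r_rejected)`, on one such pair; the `gpt2`-based scoring model is a placeholder assumption:

```python
# Pairwise ranking loss sketch for the reward model on hh-rlhf comparison data.
import torch
import torch.nn.functional as F
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

data = load_dataset("Anthropic/hh-rlhf", split="train")

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id

def score(texts):
    # Scalar reward per sequence from the classification head.
    inputs = tokenizer(texts, return_tensors="pt", padding=True,
                       truncation=True, max_length=512)
    return model(**inputs).logits.squeeze(-1)

example = data[0]
r_chosen = score([example["chosen"]])
r_rejected = score([example["rejected"]])

# The reward model learns to rank the preferred response higher.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
```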