Group evolution,next generation LLM training paradigm

0 LLMS can mutually compete with and enlighten each other to form a stronger group

1 Multi agents can be SFTed by other agents' positive samples and guided by self experiences with rl. This shows how human evolved

2 Agents can have lifetime,top Agents clone themselves,bottom Agents vanish.

3 All the mechanism realize a ecosystem obeying Theory of Evolution.

4 This training paradigm will work in all multiagents scene,Ads\Recommendation\Unmanned call center\Unmanned Vehicle\Human kind robots\Drones

5 LLMs can be deployed in different data center,which means this fit power/data distribution well and can scale up to the whole world just like what we human being do

Experiment result: After TRL demo reaches a reward of 1.6, it begins to vibrate; RL&SFT interleaved training gets a reward of 2.65; Dynamic sample_keep threshhold pushes the score to 2.85; Evolution mechanism lifts the reward at evolution4 by 0.15,reward can keep rising in 45 epochs; To avoid the sft fail from no training data caused by a too high dynamic threshhold,we always keep the max reward sample no matter above or blew the threshhold; Finally, all llms in the group beat reward model upper bound at 2.9x with a max reward of 2.99

These parameters will balance the training speed and final score: dynamic threshhold coefficient 1.1,threshhold 2.96,epoch 45,ppo steps/epoch 4,num_llms 3,evla_batchs 1

Name		Name	Last commit message	Last commit date
Latest commit History 79 Commits
README.md		README.md
llms_group_evolution.py		llms_group_evolution.py
llms_independent_evolution.py		llms_independent_evolution.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

llms_group_evolution.py

llms_group_evolution.py

llms_independent_evolution.py

llms_independent_evolution.py

Repository files navigation

Group evolution,next generation LLM training paradigm

About

Releases

Packages

Languages

lun2hang/llms_group_evolution

Folders and files

Latest commit

History

Repository files navigation

Group evolution,next generation LLM training paradigm

About

Topics

Resources

Stars

Watchers

Forks

Languages