Rate the projects to track “best” #20

wolfpixels · 2023-03-31T01:22:06Z

Hey,

Would be useful to include some sort of rating to track the best or most humanlike interactions.

If you need someone to help manage, I’m down to help

nichtdax · 2023-04-01T03:32:17Z

It's kinda a good idea. But how do you define "most humanlike interactions"? Like what is the benchmark and methodology you would use to rate projects?

You can make a PR to propose your rating thingy

wolfpixels · 2023-04-01T18:26:23Z

OpenAI track

Reasoning
Speed
Conciseness

A few people trying each and then taking the average of a scale from 1-5 should work.
I'll make a PR when I'm back tonight.
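A rough sketch of what that averaging could look like, in case it helps the PR. The three criteria come from the list above; the project names, the scores, and the function names `average_ratings` and `overall` are made up for illustration, and weighting all criteria equally is an assumption rather than anything agreed in this thread.

```python
from collections import defaultdict
from statistics import mean

# Criteria from the comment above; everything else here is illustrative.
CRITERIA = ("reasoning", "speed", "conciseness")


def average_ratings(ratings):
    """Average 1-5 scores per project and per criterion.

    `ratings` is a list of (project, criterion, score) tuples,
    one tuple per rater judgment.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for project, criterion, score in ratings:
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        buckets[project][criterion].append(score)
    return {
        project: {c: mean(scores) for c, scores in by_criterion.items()}
        for project, by_criterion in buckets.items()
    }


def overall(averages):
    """Unweighted mean of each project's per-criterion averages."""
    return {p: mean(list(per_c.values())) for p, per_c in averages.items()}


# Hypothetical ratings from a few people trying two hypothetical projects.
sample = [
    ("project-a", "reasoning", 4), ("project-a", "reasoning", 5),
    ("project-a", "speed", 3), ("project-a", "conciseness", 4),
    ("project-b", "reasoning", 2), ("project-b", "speed", 5),
    ("project-b", "conciseness", 3),
]
averages = average_ratings(sample)
ranking = sorted(overall(averages).items(), key=lambda kv: kv[1], reverse=True)
```

With only a few raters per project the averages will be noisy, so it is probably worth showing the number of ratings next to each score.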

nicognaW · 2023-04-01T19:39:32Z

This is so hard to do that I don't think I'm qualified, but I did find this useful repo as a reference: https://github.com/manyoso/haltt4llm. This is a fantastic idea, and I greatly appreciate it.

nichtdax · 2023-04-02T08:53:17Z

Okay just take your time

wolfpixels · 2023-04-04T21:31:25Z

Hey, just jumping back in here to say: I looked at how to rank these, and it seems to be a common problem for many people right now, judging from this video about Vicuna (apparently 90% of ChatGPT quality): https://www.youtube.com/watch?v=4VByC2NpV30

With the rate of development in this field, I think it's better to let the best projects propagate by the collective word of mouth of the AI community, to save us time.

I'm happy to add projects which seem promising. Do you have any way I can send you a msg?

nicognaW · 2023-04-06T18:16:35Z

I did a little bit of research on this. Apparently Vicuna works best so far, while for Chinese users ChatGLM seems to be the best.

nichtdax · 2023-04-08T08:46:50Z

Sorry, I don't have a way to receive a msg. This GitHub thread is the only way to communicate.

wolfpixels · 2023-04-15T23:46:27Z

Interesting update: using GPT-4 to rate other LLMs on performance.
AIs rating other AIs haha, wild.

I think doing this is easy(ish) as well, so it's a potential option for this repo. I'd like to get updates on this project, as in hear about the development of open source LLMs. Have you considered making a newsletter or something? Checking GitHub is quite tiresome, and I'd like to get an email with the updates. This repository would also be a great place to market it. I'm confident other people would be interested in such a thing too.

nichtdax · 2023-04-18T01:50:30Z

I think using GPT-4 to rate other models is a good way to show people how other models differ from OpenAI's models.

nicognaW · 2023-05-05T19:37:21Z

I would recommend that anyone who wants to find the best open source LLM with a limited or commercial-friendly license, but lacks the time and energy to stay up to date with the latest AI news, periodically check https://chat.lmsys.org/. They have deployed multiple SOTA models that you can not only try but also evaluate. Additionally, they provide a leaderboard with convincing statistics and a comprehensive list of open-source models ranked by score.

endolith · 2023-05-05T19:48:07Z

@nicognaW Ooh it has a battle mode and a leaderboard, too! https://chat.lmsys.org/?leaderboard

alreadydone · 2023-05-06T06:17:19Z

lmsys's Elo rating approach is interesting. See also
https://gpt4all.io/index.html, "Performance Benchmarks" (displayed only on desktop, not on mobile)
https://www.mosaicml.com/blog/mpt-7b, Table 1
for some comprehensive benchmarking results.
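For anyone unfamiliar with Elo, here is a minimal sketch of the textbook update applied to pairwise model "battles". This is the standard Elo formula, not necessarily the exact procedure lmsys uses; the starting rating of 1000, the K-factor of 32, the model names, and the function names are all assumptions chosen for illustration.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a, rating_b, score_a, k=32):
    """Return new (rating_a, rating_b) after one battle.

    score_a is 1.0 if A won, 0.0 if B won, 0.5 for a tie.
    k=32 is an assumed K-factor, not a value taken from lmsys.
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - e_a))
    return new_a, new_b


def rate_battles(battles, initial=1000, k=32):
    """Process (model_a, model_b, score_a) battles in order."""
    ratings = {}
    for a, b, score_a in battles:
        ra = ratings.setdefault(a, initial)
        rb = ratings.setdefault(b, initial)
        ratings[a], ratings[b] = update_elo(ra, rb, score_a, k)
    return ratings


# Hypothetical battle log: model-x wins twice, then ties with model-z.
battles = [
    ("model-x", "model-y", 1.0),
    ("model-x", "model-z", 1.0),
    ("model-x", "model-z", 0.5),
]
ratings = rate_battles(battles)
```

One property worth knowing: the result depends on the order in which battles are processed, so two runs over the same votes in a different order can give slightly different ratings.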

nicognaW · 2023-05-29T05:32:18Z

Another evaluation with a leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard.

BTW, with open-source language models flourishing right now, the list in this repository may not be up to date, so it's also worth referring to the leaderboards mentioned in this issue for the latest information.
