Excellent Job! Well, no offense, it seems LLM-Bench rather than AgentBench in essence. #130

Konisberg · 2024-03-26T02:54:58Z

Sorry to raise this problem without offering a systematic analysis.
A more complete investigation of the "compression" ability of LLMs would take me more time, since many people may support the view that "compression is intelligence".
In my view, today's agents can hardly be called "autonomous". Prompting just guides the LLM to tell humans what it has compressed, which is more properly described as an ability of the LLM.
Intelligence, in my opinion, is strongly connected to the idea, revealed by physics, evolution, and perhaps complex networks, that "more is different".
To be brief, "intelligence" == "more is different": given a massive amount of data and other ingredients, structure and even "free will" emerge, and this is what we, the individuals of an isomorphic network, may call intelligence.
Agents should be like us and, as the Turing test suggests, the eight tasks may NOT represent the core abilities of agents.
Of course, the discussion above is NOT solid at all.
If you take agents as tools, it's quite a different thing lol.
Back to the main topic: agents nowadays mostly show the abilities of the underlying LLM, and it's hard to distinguish agent benchmarks from LLM benchmarks.
I'd welcome discussion of this, and I hope you can open the Discussions section of the repo.
Good luck. And Paper++

zhc7 · 2024-03-26T09:41:27Z

Hi @Konisberg, thank you for your comment! It's an interesting idea. I think one of the purposes of this benchmark is to offer some truly challenging, real-world problems. As you may know, traditional QA or multiple-choice benchmarks sometimes fail to reflect a model's true performance.

As for autonomy, intelligence, and even free will, I believe we are still quite far from there right now. No one can define exactly what true intelligence is. AgentBench can be a milestone but not a destination; there's still a long way to go.

We've opened a discussion section at https://github.com/THUDM/AgentBench/discussions as suggested. Feel free to share more thoughts!

Konisberg added the enhancement New feature or request label Mar 26, 2024
