
Excellent Job! Well, no offense, it seems LLM-Bench rather than AgentBench in essence. #130

Open
Konisberg opened this issue Mar 26, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@Konisberg

Sorry to raise this issue without offering a systematic analysis.
A more complete investigation of the "compression" ability of LLMs will take me more time, since many people support the view that "compression is intelligence".
In my view, the abilities of today's agents can hardly be called "autonomous". Prompting just guides the LLM to tell humans what it has compressed, which is more properly termed an ability of the LLM.
Intelligence, in my opinion, is strongly connected to the idea from physics, evolution, and perhaps complex networks that "more is different".
In brief, "intelligence" == "more is different": given a massive amount of data and other ingredients, structure and even "free will" emerge, and that may be what we, the individuals of an isomorphic network, call intelligence.
Agents should be like us, as the Turing test suggests, so the eight tasks may NOT represent the core abilities of agents.
Of course, the discussion above is NOT solid at all.
If you take agents as tools, it's quite a different thing lol.
Back to the main topic: agents today mostly show the abilities of the underlying LLM, and it's hard to distinguish agent benchmarks from LLM benchmarks.
I'd welcome a discussion about this, and I hope you can open the Discussions section of the repo.
Good luck. And Paper++

@Konisberg Konisberg added the enhancement New feature or request label Mar 26, 2024
@zhc7
Collaborator

zhc7 commented Mar 26, 2024

Hi @Konisberg, thank you for your comment! It's an interesting idea. I think one of the purposes of this benchmark is to offer some truly challenging, real-world problems. As you may know, traditional QA or multiple-choice benchmarks sometimes fail to reflect a model's true performance.

As for autonomy, intelligence, and even free will, I believe we are still quite far from there right now. No one can define exactly what true intelligence is. AgentBench can be a milestone but not a destination; there's still a long way to go.

We've opened a Discussions section at https://github.com/THUDM/AgentBench/discussions as suggested. Feel free to share more thoughts!
