How can I solve this problem? I'm running mind2web and am not sure how to operate this task. Could you give some specific guidance? Thanks #119

Open
Ethan-2004 opened this issue Feb 20, 2024 · 17 comments


@Ethan-2004

This is the error I'm getting:

image

These are my config files:

  • configs\start_task.yaml
    image
  • configs\assignments\default.yaml
    image
  • docker log
    image
@zhc7
Collaborator

zhc7 commented Feb 20, 2024

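Hi, @Ethan-2004. "Heartbeat failed" means the task worker cannot connect to the task controller. Judging from the first screenshot, your assigner can connect to the controller, so I suggest checking whether communication between the worker and the controller is working.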
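One rough way to test the worker-to-controller link is to attempt a plain TCP connection from the machine (or container) running the worker. This is only a sketch: the host `localhost` and port `5000` are assumptions, so substitute the controller address from your own configuration. A successful connection shows the port is reachable, not that the controller API is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Assumed controller address; replace with the one your worker is given.
    host, port = "localhost", 5000
    print(f"controller reachable at {host}:{port}: {can_connect(host, port)}")
```

Running this from inside the worker's environment matters, because `localhost` inside a container is not the host machine.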

@Joe-2002

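Hi, @zhc7, I hope you can help.
How do I check whether communication between the worker and the controller is working? I've run into a similar problem.
Also, when running the os task, I hit a task error.
Below is the output from runs.jsonl.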
{"index": "std-007-bootstrap-00009", "error": null, "info": null, "output": {"index": "std-007-bootstrap-00009", "status": "task error", "result": "Traceback (most recent call last):\n File "G:\\u674e\u67ef\u8fb0\\u6c5f\u82cf\u9716\u627f\u79d1\u6280\u6709\u9650\u516c\u53f8\\u5f00\u6e90\AgentBench-1\src\server\task_worker.py", line 108, in task_start_sample_wrapper\n result = await self.task.start_sample(index, session)\n File "G:\\u674e\u67ef\u8fb0\\u6c5f\u82cf\u9716\u627f\u79d1\u6280\u6709\u9650\u516c\u53f8\\u5f00\u6e90\AgentBench-1\src\server\tasks\os_interaction\task.py", line 358, in start_sample\n container = Container(config.image)\n File "G:\\u674e\u67ef\u8fb0\\u6c5f\u82cf\u9716\u627f\u79d1\u6280\u6709\u9650\u516c\u53f8\\u5f00\u6e90\AgentBench-1\src\server\tasks\os_interaction\task.py", line 37, in init\n self.sock = self.client.api.exec_start(self.exec_id, socket=True)._sock\nAttributeError: 'NpipeSocket' object has no attribute '_sock'\n", "history": []}, "time": {"timestamp": 1708599378498, "str": "2024-02-22 18:56:18"}}

@Taishi-N324
Contributor

Taishi-N324 commented Feb 22, 2024

I am encountering a similar issue with the following tasks:

ltp-std
alfworld-std
m2w-std
The system returns {"detail":"Error: Task does not exist"} with a 400 error for the tasks ltp-std, alfworld-std, and m2w-std.

It appears that the server cannot be established using the command
python -m src.start_task -a

@zhc7
Collaborator

zhc7 commented Feb 27, 2024

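Hi, @Joe-2002. Are you running on Windows? On Windows, the os environment can indeed hit this problem occasionally, and we recommend running in a Linux environment. This looks like a separate problem, so it could be discussed in its own issue.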
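For context, the traceback reads `._sock` from the object returned by `exec_start(..., socket=True)`, and on Windows that object is an `NpipeSocket`, which has no such attribute. A defensive access pattern, sketched below with hypothetical stand-in classes, falls back to the object itself when `_sock` is missing. This is an untested assumption rather than a confirmed fix: the fallback object may not support every method the task later calls, so running on Linux remains the safer route.

```python
def raw_socket(stream):
    """Return stream._sock when present, otherwise the stream object itself."""
    return getattr(stream, "_sock", stream)


class FakeSocketIO:
    """Hypothetical stand-in for a wrapper that exposes ._sock (as on Linux)."""
    _sock = "underlying-socket"


class FakeNpipeSocket:
    """Hypothetical stand-in for an object without ._sock (as on Windows)."""


wrapped = FakeSocketIO()
npipe = FakeNpipeSocket()
print(raw_socket(wrapped))         # the wrapped socket
print(raw_socket(npipe) is npipe)  # falls back to the object itself
```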

@zhc7
Collaborator

zhc7 commented Feb 27, 2024

Hi, @Taishi-N324. Can #120 solve your problem? We can't reproduce your problem currently.

@Joe-2002

Joe-2002 commented Feb 27, 2024 via email

@Taishi-N324
Contributor

Hi, @zhc7, this does not solve my problem.

Configure start_task.yaml with the following content:

definition:
  import: tasks/task_assembly.yaml

start:
  ltp-std: 5

Configure default.yaml in the configs/assignments directory as follows:

import: definition.yaml

concurrency:
  task:
    ltp-std: 5
  agent:
    vicuna-7b: 5

assignments: # List[Assignment] | Assignment
  - agent: # "task": List[str] | str ,  "agent": List[str] | str
      - vicuna-7b
    task:
      - ltp-std

output: "outputs/{TIMESTAMP}"

This is the setup I'm using.
I don't get Heartbeat failed on my end, though. Instead, I get:

src.typings.exception.AgentBenchException: ('{"detail": "Error: Task does not exist"}', 400, 'ltp-std')

@zhc7
Collaborator

zhc7 commented Mar 1, 2024

Hi, @Taishi-N324 Are you running on Windows or Linux? Is there any error log after executing start_task?

@Taishi-N324
Contributor

Taishi-N324 commented Mar 1, 2024

Hi, @zhc7

Thank you for your response. I am running this on an AWS Linux instance. Previously, I was using Docker in rootless mode, but after switching to running it as root, tasks like m2w-std, alfworld-std and webshop-std started working.

However, tasks like cg-std, and ltp-std are still not functioning. As reported in #63, it seems that the workers are not responding or are not being assigned to inference. https://github.com/THUDM/AgentBench/blob/main/src/assigner.py#L161-L236

@zhc7
Collaborator

zhc7 commented Mar 1, 2024

Hi, @Taishi-N324. Please note that, unlike the other tasks, ltp does not use a prebuilt Docker environment, so some dependencies may fail to meet its requirements. As for the other tasks you mentioned, please make sure you have downloaded the corresponding Docker images. In any case, any error messages would be very helpful.

In addition, there's a difference between "Task does not exist" and "Worker not responding". The former implies that the task worker was not started at all, while the latter implies that the worker started but is stuck somewhere.

Also, please note that some environments may take a while to start.

As for switching to root, I have no idea. This project does not require root. This might not be directly related to this problem.
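The distinction between the two error messages in this comment can be turned into a small triage helper. The sketch below assumes the error detail arrives as a JSON string with a `detail` field, as in the messages quoted in this thread. The function name and the hint table are illustrative and not part of AgentBench.

```python
import json

# Interpretations taken from the explanation in this thread.
HINTS = {
    "Task does not exist": "the task worker was not started at all",
    "Worker not responding": "the worker started but is stuck somewhere",
}


def diagnose(detail_json: str) -> str:
    """Map a controller error payload to the likely cause described above."""
    detail = json.loads(detail_json).get("detail", "")
    for marker, hint in HINTS.items():
        if marker in detail:
            return hint
    return "unrecognized error"


print(diagnose('{"detail": "Error: Task does not exist"}'))
```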

@Taishi-N324
Contributor

Hi, @zhc7, thank you for your assistance.

For the LTP task, the issue was related to the rounds not being properly loaded from https://github.com/THUDM/AgentBench/blob/main/configs/tasks/ltp.yaml#L6, where it's specified as 25 rounds, but instead, 50 rounds were being used as per https://github.com/THUDM/AgentBench/blob/main/src/server/tasks/ltp/task.py#L372. This discrepancy led to errors due to longer sequence lengths. By adjusting it to 25 rounds, the evaluation was able to proceed correctly.
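The kind of mismatch described here, where a code default silently overrides a configured value, can be illustrated with a toy example. The class and the `rounds` parameter name below are hypothetical and do not reproduce the AgentBench code. Only the numbers 25 and 50 come from this thread.

```python
class SketchTask:
    """Hypothetical task class used only to illustrate the default-value pitfall."""

    def __init__(self, rounds: int = 50, **kwargs):
        # If the caller never forwards the configured value, the default wins.
        self.rounds = rounds


# Stand-in for parameters parsed from a YAML config file.
config_params = {"rounds": 25}

default_task = SketchTask()                    # config not forwarded
configured_task = SketchTask(**config_params)  # config forwarded

print(default_task.rounds, configured_task.rounds)
```

Logging the effective value at startup is a cheap way to catch this sort of discrepancy early.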

I assume the paper probably conducts evaluations using 25 rounds, right?
While on the subject, I was also wondering if the paper conducts its evaluations with max_new_tokens: 512 and temperature: 0? Could you please share the config used during evaluation, including settings like top_p, top_k, do_sample, etc., if possible?

Regarding the cg-std task, the error encountered is as follows:

{"index": 19, "error": "START_FAILED", "info": "{"detail":"Error: Worker not responding\n"}", "output": null, "time": {"timestamp": 1709288724712, "str": "2024-03-01 19:25:24"}}
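When many samples fail, tallying the records in runs.jsonl by failure type gives a quicker overview than reading them one by one. The sketch below assumes one valid JSON object per line with the `error` and `output.status` fields seen in the records quoted in this thread. The function name is illustrative.

```python
import json
from collections import Counter


def summarize(lines):
    """Count records by their error code, falling back to output.status."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        output = record.get("output") or {}
        label = record.get("error") or output.get("status") or "unknown"
        counts[label] += 1
    return counts


sample = [
    '{"index": 19, "error": "START_FAILED", "info": null, "output": null}',
    '{"index": "a", "error": null, "info": null, "output": {"status": "task error"}}',
]
print(summarize(sample))
```

For a real run, pass an open file handle instead of the sample list, for example `summarize(open(path, encoding="utf-8"))`.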

@Taishi-N324
Contributor

@zhc7
The issue seems to be that the program is hanging at this part: https://github.com/THUDM/AgentBench/blob/main/src/server/tasks/card_game/task.py#L127

@zhc7
Collaborator

zhc7 commented Mar 2, 2024

> For the LTP task, the issue was related to the rounds not being properly loaded from main/configs/tasks/ltp.yaml#L6, where it's specified as 25 rounds, but instead, 50 rounds were being used as per main/src/server/tasks/ltp/task.py#L372. This discrepancy led to errors due to longer sequence lengths. By adjusting it to 25 rounds, the evaluation was able to proceed correctly.

Thank you for pointing that out. I'm glad you solved the problem. I will update the repo.

> I assume the paper probably conducts evaluations using 25 rounds, right? While on the subject, I was also wondering if the paper conducts its evaluations with max_new_tokens: 512 and temperature: 0? Could you please share the config used during evaluation, including settings like top_p, top_k, do_sample, etc., if possible?

The detailed evaluation procedure and settings can be found in the paper. We used do_sample=False, which means the next token is always chosen by argmax. (Some API models might not strictly follow this, though.)
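To make "chosen by argmax" concrete, here is a minimal sketch of greedy token selection over a list of logits. It is an illustration in plain Python rather than the decoding code of any particular model library, and the logit values are made up.

```python
def greedy_next_token(logits):
    """Return the index of the largest logit (lowest index wins on ties)."""
    return max(range(len(logits)), key=logits.__getitem__)


# Made-up scores for a four-token vocabulary.
logits = [0.1, 2.5, -1.0, 2.4]
print(greedy_next_token(logits))  # → 1
```

Because no sampling is involved, repeated runs on the same logits always select the same token, which is why this setting is commonly used for reproducible evaluation.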

> Regarding the cg-std task, the error encountered is as follows:
>
> {"index": 19, "error": "START_FAILED", "info": "{"detail":"Error: Worker not responding\n"}", "output": null, "time": {"timestamp": 1709288724712, "str": "2024-03-01 19:25:24"}}
>
> The issue seems to be that the program is hanging at this part: main/src/server/tasks/card_game/task.py#L127

Thank you for your detailed investigation. We know that this task might occasionally get stuck, but we can't reproduce it or figure out when or why it happens. If you manage to solve it, we'd be happy to merge your PR!

@Taishi-N324
Contributor

Hi @zhc7,

I will look into the cg-std task when I have some time. Once the issue is resolved, I will send a pull request.

Regarding m2w-std, it seems that there are only 100 tasks available for evaluation, but according to the paper, there should be 177 tasks. Why is it that there are only 100?

@Taishi-N324
Contributor

Taishi-N324 commented Mar 2, 2024

Hi @zhc7,

Regarding the debugging of the card_game task, it appears that the hanging issue was due to the lack of execution permissions for src/server/tasks/card_game/logic/bin/main. Consequently, this prevented the execution of the try block in https://github.com/THUDM/AgentBench/blob/main/src/server/tasks/card_game/judger/judger.py#L36-L41.

@zhc7
Collaborator

zhc7 commented Mar 3, 2024

> Regarding m2w-std, it seems that there are only 100 tasks available for evaluation, but according to the paper, there should be 177 tasks. Why is it that there are only 100?

Our apologies. There was an update to the mind2web task, and for various reasons the statistics in the paper are slightly behind. The actual number should be 100.

> Regarding the debugging of the card_game task, it appears that the hanging issue was due to the lack of execution permissions for src/server/tasks/card_game/logic/bin/main. Consequently, this prevented the execution of the try block in main/src/server/tasks/card_game/judger/judger.py#L36-L41.

Great! Thanks. I'm glad you solved the problem. Would you like to make a pull request? If I understood correctly, git update-index --chmod=+x the_file should do the trick.
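Before launching the task, a local check can confirm whether a binary such as the one named above is executable, and add the execute bits in the working copy if it is not. The helper names below are illustrative. Note that a local `chmod` only changes the file on disk, whereas `git update-index --chmod=+x` records the mode in the repository so other checkouts get it too.

```python
import os
import stat


def is_executable(path: str) -> bool:
    """Return True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def make_executable(path: str) -> None:
    """Add execute permission for user, group, and others (like chmod +x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    # Path taken from this thread, relative to the repository root.
    binary = "src/server/tasks/card_game/logic/bin/main"
    if os.path.isfile(binary) and not is_executable(binary):
        make_executable(binary)
    print(f"{binary} executable: {is_executable(binary)}")
```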

@Taishi-N324
Contributor

Hi @zhc7

I've submitted a pull request. You can find the pull request here: #123.

Thank you!
