How do I solve this problem? I'm running mind2web and am not sure how to run this task. Could you give some specific guidance? Thanks #119

Ethan-2004 · 2024-02-20T07:59:47Z

Here is my error:

Here are my configuration files:

configs\start_task.yaml
configs\assignments\default.yaml
docker log

zhc7 · 2024-02-20T09:33:05Z

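Hi, @Ethan-2004. "Heartbeat failed" means the Task worker cannot connect to the Task controller. Judging from the first screenshot, your assigner can reach the controller, so I suggest checking whether communication between the worker and the controller is working properly.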
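
One rough way to test reachability is to attempt a plain TCP connection to the controller from the machine or container where the worker runs. The sketch below is only an illustration and is not part of AgentBench. The function name `can_connect` is hypothetical, and the host and port must be taken from your own controller settings.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example (hypothetical address; substitute your controller's host and port):
# print(can_connect("localhost", 5000))
```

If this returns False when run inside the worker's container but True on the host, the problem is likely in the container-to-host network path rather than in the controller itself.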

Joe-2002 · 2024-02-22T11:00:14Z

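Hi, @zhc7, I'm hoping you can help.
How do I check whether communication between the worker and the controller is working? I've run into a similar problem.
Also, when I run the os task, I get a task error.
Below is the output from runs.jsonl.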
{"index": "std-007-bootstrap-00009", "error": null, "info": null, "output": {"index": "std-007-bootstrap-00009", "status": "task error", "result": "Traceback (most recent call last):\n File "G:\\u674e\u67ef\u8fb0\\u6c5f\u82cf\u9716\u627f\u79d1\u6280\u6709\u9650\u516c\u53f8\\u5f00\u6e90\AgentBench-1\src\server\task_worker.py", line 108, in task_start_sample_wrapper\n result = await self.task.start_sample(index, session)\n File "G:\\u674e\u67ef\u8fb0\\u6c5f\u82cf\u9716\u627f\u79d1\u6280\u6709\u9650\u516c\u53f8\\u5f00\u6e90\AgentBench-1\src\server\tasks\os_interaction\task.py", line 358, in start_sample\n container = Container(config.image)\n File "G:\\u674e\u67ef\u8fb0\\u6c5f\u82cf\u9716\u627f\u79d1\u6280\u6709\u9650\u516c\u53f8\\u5f00\u6e90\AgentBench-1\src\server\tasks\os_interaction\task.py", line 37, in init\n self.sock = self.client.api.exec_start(self.exec_id, socket=True)._sock\nAttributeError: 'NpipeSocket' object has no attribute '_sock'\n", "history": []}, "time": {"timestamp": 1708599378498, "str": "2024-02-22 18:56:18"}}

Taishi-N324 · 2024-02-22T16:22:11Z

I am encountering a similar issue with the following tasks:

ltp-std
alfworld-std
m2w-std
The system returns {"detail":"Error: Task does not exist"} with a 400 error for the tasks ltp-std, alfworld-std, and m2w-std.

It appears that the server cannot be established using the command
python -m src.start_task -a

zhc7 · 2024-02-27T09:40:14Z

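Hi, @Joe-2002. Are you running on Windows? On Windows the os environment does occasionally hit this problem, so we recommend running in a Linux environment. This appears to be a separate problem, so feel free to open a separate issue to discuss it.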
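
For context, the traceback above fails on `exec_start(..., socket=True)._sock` because the object returned on Windows is an `NpipeSocket`, which has no `_sock` attribute. A defensive accessor could look like the sketch below. The helper name `raw_socket` is hypothetical, and whether the rest of the os task works with the returned object on Windows is an untested assumption. Running on Linux remains the recommended route.

```python
def raw_socket(stream):
    """Return the wrapped socket if `stream` exposes `_sock`, otherwise `stream` itself."""
    # Some wrappers keep the real socket in `_sock`; others (such as the
    # NpipeSocket in the traceback) do not have that attribute at all.
    return getattr(stream, "_sock", stream)
```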

zhc7 · 2024-02-27T09:49:17Z

Hi, @Taishi-N324. Can #120 solve your problem? We can't reproduce your problem currently.

Joe-2002 · 2024-02-27T11:03:35Z

This code does solve my problem.


Taishi-N324 · 2024-03-01T02:00:44Z

Hi, @zhc7 this does not solve my problem

Configure start_task.yaml with the following content:

definition:
  import: tasks/task_assembly.yaml

start:
  ltp-std: 5

Configure default.yaml in the configs/assignments directory as follows:

import: definition.yaml

concurrency:
  task:
    ltp-std: 5
  agent:
    vicuna-7b: 5

assignments: # List[Assignment] | Assignment
  - agent: # "task": List[str] | str ,  "agent": List[str] | str
      - vicuna-7b
    task:
      - ltp-std

output: "outputs/{TIMESTAMP}"

This is the setup I'm using.
I don't get "Heartbeat failed" on my end, though. Instead I get:

src.typings.exception.AgentBenchException: ('{"detail": "Error: Task does not exist"}', 400, 'ltp-std')

zhc7 · 2024-03-01T09:20:47Z

Hi, @Taishi-N324 Are you running on Windows or Linux? Is there any error log after executing start_task?

Taishi-N324 · 2024-03-01T09:40:13Z

Hi, @zhc7

Thank you for your response. I am running this on an AWS Linux instance. Previously, I was using Docker in rootless mode, but after switching to running it as root, tasks like m2w-std, alfworld-std and webshop-std started working.

However, tasks like cg-std, and ltp-std are still not functioning. As reported in #63, it seems that the workers are not responding or are not being assigned to inference. https://github.com/THUDM/AgentBench/blob/main/src/assigner.py#L161-L236

zhc7 · 2024-03-01T10:00:51Z

Hi, @Taishi-N324. Please note that, unlike other tasks, ltp does not use a prebuilt docker environment, so some dependencies may fail to meet the requirements. As for the other tasks you mentioned, please make sure you have downloaded the corresponding docker images. In any case, it would be very helpful if you could share any error messages.

In addition, there's a difference between "task does not exist" and "worker not responding". The former implies that the task worker was not started at all, while the latter implies that the worker was started but is stuck somewhere.
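
As a small illustration of that distinction, the two `detail` strings that appear in this thread can be mapped to the two situations. This is only a sketch: the function name `diagnose` and the returned labels are made up for this example and are not part of AgentBench.

```python
import json


def diagnose(detail_json: str) -> str:
    """Map an error payload such as '{"detail": "Error: ..."}' to a rough diagnosis."""
    detail = json.loads(detail_json).get("detail", "")
    if "Task does not exist" in detail:
        # The controller has no worker registered for this task name.
        return "worker not started"
    if "Worker not responding" in detail:
        # A worker is registered but did not answer in time.
        return "worker started but stuck"
    return "unknown"
```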

Also, please notice that some environments may take a while to start.

As for switching to root, I have no idea. This project does not require root. This might not be directly related to this problem.

Taishi-N324 · 2024-03-01T11:59:11Z

hi, @zhc7 thank you for your assistance

For the LTP task, the issue was related to the rounds not being properly loaded from https://github.com/THUDM/AgentBench/blob/main/configs/tasks/ltp.yaml#L6, where it's specified as 25 rounds, but instead, 50 rounds were being used as per https://github.com/THUDM/AgentBench/blob/main/src/server/tasks/ltp/task.py#L372. This discrepancy led to errors due to longer sequence lengths. By adjusting it to 25 rounds, the evaluation was able to proceed correctly.
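
To illustrate the kind of mismatch described here (not the actual code in task.py), a task can read the round limit from its config and fall back to a default only when the key is missing. The key name `round` and the helper `resolve_rounds` are assumptions made for this sketch.

```python
def resolve_rounds(config: dict, default: int = 25) -> int:
    """Return the configured round limit, or `default` if the config omits it."""
    # Hypothetical key name; check the real task config for the actual one.
    value = config.get("round", default)
    if not isinstance(value, int) or value <= 0:
        raise ValueError(f"round limit must be a positive integer, got {value!r}")
    return value
```

Keeping the fallback equal to the documented value means a config that fails to load does not silently double the number of rounds.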

I assume the paper probably conducts evaluations using 25 rounds, right?
While on the subject, I was also wondering if the paper conducts its evaluations with max_new_tokens: 512 and temperature: 0? Could you please share the config used during evaluation, including settings like top_p, top_k, do_sample, etc., if possible?

Regarding the cg-std task, the error encountered is as follows:

{"index": 19, "error": "START_FAILED", "info": "{"detail":"Error: Worker not responding\n"}", "output": null, "time": {"timestamp": 1709288724712, "str": "2024-03-01 19:25:24"}}

Taishi-N324 · 2024-03-01T16:37:13Z

@zhc7
The issue seems to be that the program is hanging at this part: https://github.com/THUDM/AgentBench/blob/main/src/server/tasks/card_game/task.py#L127

zhc7 · 2024-03-02T01:29:22Z

> For the LTP task, the issue was related to the rounds not being properly loaded from main/configs/tasks/ltp.yaml#L6, where it's specified as 25 rounds, but instead, 50 rounds were being used as per main/src/server/tasks/ltp/task.py#L372. This discrepancy led to errors due to longer sequence lengths. By adjusting it to 25 rounds, the evaluation was able to proceed correctly.

Thank you for pointing that out. I'm glad you solved the problem. I will update the repo.

> I assume the paper probably conducts evaluations using 25 rounds, right? While on the subject, I was also wondering if the paper conducts its evaluations with max_new_tokens: 512 and temperature: 0? Could you please share the config used during evaluation, including settings like top_p, top_k, do_sample, etc., if possible?

The detailed evaluation procedure and settings can be found in the paper. We used do_sample=False, which means the next token is always chosen by argmax. (Some API models might not strictly follow this, though.)
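
As a minimal sketch of what choosing the next token by argmax means, the function below picks the index of the largest logit, so the same input always gives the same token. The name `greedy_next_token` is illustrative and is not taken from any library.

```python
from typing import Sequence


def greedy_next_token(logits: Sequence[float]) -> int:
    """Return the index of the highest logit (greedy / argmax decoding)."""
    if not logits:
        raise ValueError("logits must not be empty")
    # max() over indices keyed by logit value; ties resolve to the lowest index.
    return max(range(len(logits)), key=lambda i: logits[i])
```

Because no random sampling is involved, settings such as temperature, top_p, and top_k do not affect which token this procedure selects.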

> Regarding the cg-std task, the error encountered is as follows:
>
> {"index": 19, "error": "START_FAILED", "info": "{"detail":"Error: Worker not responding\n"}", "output": null, "time": {"timestamp": 1709288724712, "str": "2024-03-01 19:25:24"}}
>
> The issue seems to be that the program is hanging at this part: main/src/server/tasks/card_game/task.py#L127

Thank you for your detailed investigation. We know that this task might occasionally get stuck, but we can't reproduce it or figure out when or why it happens. If you manage to solve it, we'd be happy to merge your PR!

Taishi-N324 · 2024-03-02T16:15:23Z

Hi @zhc7,

I will look into the cg-std task when I have some time. Once the issue is resolved, I will send a pull request.

Regarding m2w-std, it seems that there are only 100 tasks available for evaluation, but according to the paper, there should be 177 tasks. Why is it that there are only 100?

Taishi-N324 · 2024-03-02T20:22:45Z

Hi @zhc7,

Regarding the debugging of the card_game task, it appears that the hanging issue was due to the lack of execution permissions for src/server/tasks/card_game/logic/bin/main. Consequently, this prevented the execution of the try block in https://github.com/THUDM/AgentBench/blob/main/src/server/tasks/card_game/judger/judger.py#L36-L41.

zhc7 · 2024-03-03T03:41:12Z

> Regarding m2w-std, it seems that there are only 100 tasks available for evaluation, but according to the paper, there should be 177 tasks. Why is it that there are only 100?

Our apologies. There was an update to the mind2web task and, for various reasons, the statistics in the paper are slightly out of date. The actual number should be 100.

> Regarding the debugging of the card_game task, it appears that the hanging issue was due to the lack of execution permissions for src/server/tasks/card_game/logic/bin/main. Consequently, this prevented the execution of the try block in main/src/server/tasks/card_game/judger/judger.py#L36-L41.

Great! Thanks. I'm glad you solved the problem. Would you like to make a pull request? If I understood correctly, git update-index --chmod=+x the_file should do the trick.
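
For anyone who hits the same hang before a fix lands, a local check along these lines can confirm whether the binary is executable and add the execute bits if not. This is a sketch using only the standard library. The helper name `ensure_executable` is made up here, and it changes only the local working copy, not the file mode recorded in git.

```python
import os
import stat


def ensure_executable(path: str) -> bool:
    """Add execute bits to `path` if missing. Return True if the mode was changed."""
    if os.access(path, os.X_OK):
        return False  # Already executable for the current user.
    mode = os.stat(path).st_mode
    # Add execute permission for user, group, and others (like `chmod +x`).
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return True


# Example, using the path mentioned in this thread:
# ensure_executable("src/server/tasks/card_game/logic/bin/main")
```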

Taishi-N324 · 2024-03-03T16:11:03Z

Hi @zhc7

I've submitted a pull request. You can find the pull request here: #123.

Thank you!

Tangent-90C mentioned this issue Feb 27, 2024

Fix "Task does not exist" caused by connection problems between the container and the host controller #120

Merged
