More explicit prompting to help smaller models #562

Open · wants to merge 4 commits into main
Conversation

diversity-co-uk

Hi there. Prompts are at the core of crewAI's ability to orchestrate models to use tools correctly.

When testing with smaller models (in my case, variants of mistral, llama3 and phi3), tool parameters were often missing curly braces, or were merged into the tool name, resulting in multiple error sequences.

These minor changes to the phrasing should increase reliability.

They have NOT been tested on large models, though there is a good chance that large models can follow the instructions without this level of explicitness.

@joaomdmoura
Owner

I'll need to run the benchmarks on this one, so it might take a little longer to merge.

@dkoontz

dkoontz commented May 8, 2024

I tested this with mixtral-8x7b and the Action Input is now consistently correct, where before it was more like a 60% success rate. llama3-8b still failed to close the dictionary:

Action: navigate_to
Action Input: {"url": "http://google.com
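
A truncated dictionary like the one above can often be salvaged before parsing. A minimal sketch of a best-effort repair (hypothetical helper, not part of crewAI) that closes an unterminated string and any unbalanced braces before handing the text to `json.loads()`:

```python
import json

def repair_truncated_json(text: str):
    """Best-effort repair of a truncated JSON object emitted by a model.

    Counts unescaped double quotes to detect an unterminated string,
    and tracks brace depth outside strings to close unbalanced braces.
    """
    in_string = False
    depth = 0
    prev = ""
    for ch in text:
        if ch == '"' and prev != "\\":
            in_string = not in_string
        elif not in_string:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
        prev = ch
    repaired = text
    if in_string:
        repaired += '"'   # close the dangling string literal
    repaired += "}" * max(depth, 0)  # close any unbalanced braces
    return json.loads(repaired)
```

This is only a sketch: it does not handle truncated arrays or escaped backslashes before a quote, but it covers the failure mode shown above.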

@dkoontz

dkoontz commented May 8, 2024

I spoke too soon; with further testing I am still seeing tool-use failures with mixtral:8x7b. The first tool use is now consistently correct, but subsequent steps fail in a very consistent way, adding extra \ characters to the tool name. Here's an example:

 Thought: I have successfully navigated to the website. The next step is to find and click on the 'Sign In' button.

Action: click\_on\_element
Action Input: {"text": "Sign In"}

Action 'click\_on\_element' don't exist, these are the only available Actions: navigate_to: navigate_to(url: str) - Loads the web page specified by the 'url' argument. When this tool completes the web page will
    be loaded and it can now be searched and interacted with using other tools.
    Example: navigate_to({"url": "http://url"})
click_on_element: click_on_element(text: str) - Search for an element on the current page using the 'text' argument, then click on the element.
    Example: click_on_element({"text": "Next"})
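
A pragmatic workaround for the escaped-underscore failure above is to normalize the model's tool name before lookup. A minimal sketch (the helper name is hypothetical; this is not crewAI's actual code):

```python
import re

def sanitize_tool_name(raw: str, available: list[str]):
    """Map a model-emitted tool name onto a known tool.

    Small models sometimes escape underscores (click\\_on\\_element)
    or append stray punctuation; strip those before matching.
    Returns the matched tool name, or None if no match.
    """
    cleaned = re.sub(r"\\+", "", raw).strip().strip(".,:;'\"")
    return cleaned if cleaned in available else None
```

Normalizing on the framework side avoids burning an error-correction round trip every time the model re-emits the escaped name.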

@diversity-co-uk
Author

diversity-co-uk commented May 9, 2024

Thanks David.

The prompt style from phidata works quite well on small models. It would move away from crewAI's more conversational style, so I didn't suggest it.

'''
Provide your output as a JSON containing the following fields:
<json_fields>
["listName", "steps"]
</json_fields>
Here are the properties for each field:
<json_field_properties>
{
  "listName": {
    "description": "The title of the list",
    "type": "string"
  },
  "steps": {
    "description": "Steps",
    "items": {
      "type": "string"
    },
    "type": "array"
  }
}
</json_field_properties>
Start your response with { and end it with }.
Your output will be passed to json.loads() to convert it to a Python object.
Make sure it only contains valid JSON.
'''
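
For illustration, a prompt block in that style could be generated from a field-schema mapping along these lines (a sketch; the function name and structure are my assumptions, not phidata's actual API):

```python
import json

def build_json_prompt(fields: dict) -> str:
    """Render a phidata-style JSON output instruction block from a
    mapping of field name -> JSON-schema-like properties."""
    return (
        "Provide your output as a JSON containing the following fields:\n"
        "<json_fields>\n"
        f"{json.dumps(list(fields))}\n"
        "</json_fields>\n"
        "Here are the properties for each field:\n"
        "<json_field_properties>\n"
        f"{json.dumps(fields, indent=2)}\n"
        "</json_field_properties>\n"
        "Start your response with { and end it with }.\n"
        "Your output will be passed to json.loads() to convert it to a Python object.\n"
        "Make sure it only contains valid JSON."
    )
```

Generating the block from the schema keeps the field list and the property descriptions from drifting apart as the requested object changes.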
The results are almost always valid JSON, which may be enough for crewAI, but the requested object definition is often 'enhanced':
asking for List[str] can yield list[dict[str:str, str:list[str]]], list[dict[str:str]], or dict[str:str, str:list[dict[str:str, str:list[str]]], str:str].

The model is being clever, adding meaningful sublists or dicts in valid JSON style: not what was asked for, but generally useful and creatively coercible.

@diversity-co-uk
Author

Some further tests when using phidata-style prompts.
Here, a hacky squish method reduces arbitrarily nested lists and dicts into the desired list[str] before trying to parse the result into the requested object.
Given that the object definition includes 'type' and 'description' fields, many models want to reply with a similar dict rather than the 'string' specified by the 'type' field. I'll run these tests again with a List[dict[str:str]] return type, and I'd expect better parsing results without having to squish.

When using small models, we may be condemned to barbary.

model                    elapsed (s)  numTests  valid_json  valid_obj  valid_obj_after_squish
dolphin-llama3:8b              54.29        10         1.0       0.00                    1.00
dolphin-mistral:latest          9.66        10         1.0       0.90                    1.00
llama3:instruct                25.07         9         1.0       0.11                    0.89
mistral:latest                  8.44        10         1.0       0.00                    1.00
phi3:instruct                   9.94        10         1.0       0.00                    1.00
wizardlm2:7b                    6.53        10         1.0       1.00                    1.00
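
The squish step described above could look something like this (a guess at the hacky method; the real implementation may differ). It collects every leaf value from an arbitrarily nested structure into a flat list of strings:

```python
def squish(value):
    """Flatten arbitrarily nested lists/dicts into a flat list[str].

    Dict keys are dropped; only leaf values are kept, stringified
    if they are not already strings.
    """
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        return [s for v in value.values() for s in squish(v)]
    if isinstance(value, (list, tuple)):
        return [s for v in value for s in squish(v)]
    return [str(value)]
```

Dropping the keys is lossy, which is why parsing directly into the requested type is preferable when the model cooperates.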

@joaomdmoura
Owner

Great news: in the new version we will add the ability for people to override all the inner prompts. Not saying we shouldn't still benchmark this, but it's something that will help with individual models.
