More explicit prompting to help smaller models #562

Open · wants to merge 4 commits into main
Conversation

diversity-co-uk

Hi there. Prompts are at the core of crewAI's ability to orchestrate models to use tools correctly.

When testing with smaller models (in my case, variants of mistral, llama3 and phi3), tool parameters were often missing curly braces, or were merged into the tool name, resulting in multiple error sequences.

These minor changes to the phrasing should increase reliability.

They have NOT been tested on large models, though there is a good chance that large models can follow the instructions without this level of explicitness.

@joaomdmoura
Owner

I'll need to run the benchmarks on this one, so it might take a little longer to merge.

@dkoontz

dkoontz commented May 8, 2024

I tested this with mixtral-8x7b and the Action Input is now consistently correct, where before it was more like a 60% success rate. llama3-8b still failed to close the dictionary:

Action: navigate_to
Action Input: {"url": "http://google.com
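
A truncated dictionary like the one above can often be salvaged before parsing. A minimal sketch of a best-effort repair (hypothetical helper, not part of crewAI) that closes an unterminated string and any unbalanced braces before handing the text to `json.loads()`:

```python
import json

def repair_truncated_json(text: str):
    """Best-effort repair of a truncated JSON object emitted by a model.

    Counts unescaped double quotes to detect an unterminated string,
    and tracks brace depth outside strings to close unbalanced braces.
    """
    in_string = False
    depth = 0
    prev = ""
    for ch in text:
        if ch == '"' and prev != "\\":
            in_string = not in_string
        elif not in_string:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
        prev = ch
    repaired = text
    if in_string:
        repaired += '"'   # close the dangling string literal
    repaired += "}" * max(depth, 0)  # close any unbalanced braces
    return json.loads(repaired)
```

This is only a sketch: it does not handle truncated arrays or escaped backslashes before a quote, but it covers the failure mode shown above.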

@dkoontz

dkoontz commented May 8, 2024

I spoke too soon; with further testing I am still seeing tool-use failures with mixtral:8x7b. The first tool use is now consistently correct, but subsequent steps fail in a very consistent way, adding extra \ characters to the tool name. Here's an example:

 Thought: I have successfully navigated to the website. The next step is to find and click on the 'Sign In' button.

Action: click\_on\_element
Action Input: {"text": "Sign In"}

Action 'click\_on\_element' don't exist, these are the only available Actions: navigate_to: navigate_to(url: str) - Loads the web page specified by the 'url' argument. When this tool completes the web page will
    be loaded and it can now be searched and interacted with using other tools.
    Example: navigate_to({"url": "http://url"})
click_on_element: click_on_element(text: str) - Search for an element on the current page using the 'text' argument, then click on the element.
    Example: click_on_element({"text": "Next"})
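
A pragmatic workaround for the escaped-underscore failure above is to normalize the model's tool name before lookup. A minimal sketch (the helper name is hypothetical; this is not crewAI's actual code):

```python
import re

def sanitize_tool_name(raw: str, available: list[str]):
    """Map a model-emitted tool name onto a known tool.

    Small models sometimes escape underscores (click\\_on\\_element)
    or append stray punctuation; strip those before matching.
    Returns the matched tool name, or None if no match.
    """
    cleaned = re.sub(r"\\+", "", raw).strip().strip(".,:;'\"")
    return cleaned if cleaned in available else None
```

Normalizing on the framework side avoids burning an error-correction round trip every time the model re-emits the escaped name.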

@diversity-co-uk
Author

diversity-co-uk commented May 9, 2024

Thanks David.

The prompt style from phidata works quite well on small models. It would move away from crewAI's more conversational style, so I didn't suggest it.

'''
Provide your output as a JSON containing the following fields:
<json_fields>
["listName", "steps"]
</json_fields>
Here are the properties for each field:
<json_field_properties>
{
  "listName": {
    "description": "The title of the list",
    "type": "string"
  },
  "steps": {
    "description": "Steps",
    "items": {
      "type": "string"
    },
    "type": "array"
  }
}
</json_field_properties>
Start your response with { and end it with }.
Your output will be passed to json.loads() to convert it to a Python object.
Make sure it only contains valid JSON.
'''
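
For illustration, a prompt block in that style could be generated from a field-schema mapping along these lines (a sketch; the function name and structure are my assumptions, not phidata's actual API):

```python
import json

def build_json_prompt(fields: dict) -> str:
    """Render a phidata-style JSON output instruction block from a
    mapping of field name -> JSON-schema-like properties."""
    return (
        "Provide your output as a JSON containing the following fields:\n"
        "<json_fields>\n"
        f"{json.dumps(list(fields))}\n"
        "</json_fields>\n"
        "Here are the properties for each field:\n"
        "<json_field_properties>\n"
        f"{json.dumps(fields, indent=2)}\n"
        "</json_field_properties>\n"
        "Start your response with { and end it with }.\n"
        "Your output will be passed to json.loads() to convert it to a Python object.\n"
        "Make sure it only contains valid JSON."
    )
```

Generating the block from the schema keeps the field list and the property descriptions from drifting apart as the requested object changes.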
The results are almost always valid JSON, which may be enough for crewAI, but the requested object definition is often 'enhanced':
asking for List[str] can yield list[dict[str:str, str:list[str]]], list[dict[str:str]], or dict[str:str, str:list[dict[str:str, str:list[str]]], str:str].

The model is being clever, adding meaningful sublists or dicts in valid JSON style: not what was asked for, but generally useful and creatively coercible.

@diversity-co-uk
Author

Some further tests when using phidata-style prompts.
Here, a hacky squish method reduces arbitrarily nested lists and dicts into the desired list[str] before trying to parse the result into the requested object.
Given that the object definition includes 'type' and 'description' fields, many models want to reply with a similar dict rather than the 'string' specified by the 'type' field. I'll run these tests again with a List[dict[str:str]] return type, and I'd expect better parsing results without having to squish.

When using small models, we may be condemned to barbary.

model                    elapsed (s)  numTests  valid_json  valid_obj  valid_obj_after_squish
dolphin-llama3:8b              54.29        10         1.0       0.00                    1.00
dolphin-mistral:latest          9.66        10         1.0       0.90                    1.00
llama3:instruct                25.07         9         1.0       0.11                    0.89
mistral:latest                  8.44        10         1.0       0.00                    1.00
phi3:instruct                   9.94        10         1.0       0.00                    1.00
wizardlm2:7b                    6.53        10         1.0       1.00                    1.00
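
The squish step described above could look something like this (a guess at the hacky method; the real implementation may differ). It collects every leaf value from an arbitrarily nested structure into a flat list of strings:

```python
def squish(value):
    """Flatten arbitrarily nested lists/dicts into a flat list[str].

    Dict keys are dropped; only leaf values are kept, stringified
    if they are not already strings.
    """
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        return [s for v in value.values() for s in squish(v)]
    if isinstance(value, (list, tuple)):
        return [s for v in value for s in squish(v)]
    return [str(value)]
```

Dropping the keys is lossy, which is why parsing directly into the requested type is preferable when the model cooperates.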

@joaomdmoura
Owner

Great news: in the new version we will add the ability for people to override all the inner prompts. Not saying we shouldn't still benchmark this, but it's something that will help with individual models.
