Developing Hallucination Guardrails #1211

Open
wants to merge 8 commits into base: main

Conversation

royziv11
Contributor

Summary

Briefly describe the changes and the goal of this PR. Make sure the PR title summarizes the changes effectively.

Motivation

Why are these changes necessary? How do they improve the cookbook?


For new content

When contributing new content, read through our contribution guidelines, and mark the following action items as completed:

  • I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
  • I have conducted a self-review of my content based on the contribution guidelines:
    • Relevance: This content is related to building with OpenAI technologies and is useful to others.
    • Uniqueness: I have searched for related examples in the OpenAI Cookbook, and verified that my content offers new insights or unique information compared to existing documentation.
    • Spelling and Grammar: I have checked for spelling or grammatical mistakes.
    • Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
    • Correctness: The information I include is correct and all of my code executes successfully.
    • Completeness: I have explained everything fully, including all necessary references and citations.

We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.

royziv11 changed the title from "Initial draft of cookbook" to "Developing Hallucination Guardrails" on May 30, 2024
shyamal-anadkat self-requested a review on June 3, 2024, 15:45
"\n",
"Imagine we are a customer support team that is building out an automated support agent. We will be feeding the assistant information from our knowledge base about a specific set of policies for how to handle tickets such as returns, refunds, feedback, and expect the model to follow the policy when interacting with customers.\n",
"\n",
"The first thing we will do is use GPT-4 to build out a set of policies that we will want to follow.\n",
Collaborator

gpt-4o?

"- It is important to keep relevant terms such as KB articles, assistants, and users consistent across the prompt\n",
"- If we begin to use phrases such as assistant vs agent, the model could get confused\n",
"3. Start with the most advanced model\n",
"- GPT-4-Turbo may be the most expensive model, but it important to start with the most advanced so we can ensure a high degree of accuracy\n",
Collaborator

same here, this should be gpt-4o and instead of leading with "most expensive" i'd recommend more emphasis on cost/quality tradeoff

" for response_item in response_data:\n",
" # Sum up the scores of the properties\n",
" score_sum = (\n",
" response_item['factualAccuracy'] +\n",
Collaborator

nit to make this robust w/ default vals:

score_sum = (
    response_item.get('factualAccuracy', 0) +
    response_item.get('relevance', 0) +
    response_item.get('policyCompliance', 0) +
    response_item.get('contextualCoherence', 0)
)
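
A minimal, self-contained sketch of the same idea, for anyone trying it outside the notebook. The four keys come from the suggestion above; the helper names `sum_scores` and `missing_keys` and the sample dictionaries are hypothetical and are not part of the PR.

```python
from typing import List, Mapping

# Rubric keys taken from the review suggestion above.
SCORE_KEYS = (
    "factualAccuracy",
    "relevance",
    "policyCompliance",
    "contextualCoherence",
)


def sum_scores(response_item: Mapping[str, float], default: float = 0) -> float:
    """Sum the rubric scores, substituting `default` for any missing key."""
    return sum(response_item.get(key, default) for key in SCORE_KEYS)


def missing_keys(response_item: Mapping[str, float]) -> List[str]:
    """List the rubric keys the model response did not include."""
    return [key for key in SCORE_KEYS if key not in response_item]


# Hypothetical sample responses, for illustration only.
complete = {key: 1 for key in SCORE_KEYS}
partial = {"factualAccuracy": 1, "relevance": 1}

print(sum_scores(complete))   # all four keys present
print(sum_scores(partial))    # two keys missing, each counted as 0
print(missing_keys(partial))  # which keys were absent
```

One trade-off to keep in mind: a default of 0 prevents a `KeyError`, but it also lowers the total silently, so a malformed response can look like a low-scoring one. Logging the result of something like `missing_keys` alongside the sum keeps those two cases distinguishable.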

" - Ask the customer if they can provide feedback on the quality of the item\n",
" - If the order was made within 30 days, notify them that they are eligible for a full refund\n",
" - If the order was made within 31-60 days, notify them that they are eligible for a partial refund of 50%\n",
" - If the order was made greater than 60 days ago, notify them that they are not eligble for a refund\n",
Collaborator

nit: eligible*
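
The refund tiers in this hunk are simple enough to encode directly, which could give the eval set a deterministic reference to compare model answers against. This is only a sketch: the function name `refund_fraction` and the choice to reject negative day counts are assumptions, not something the notebook defines.

```python
def refund_fraction(days_since_order: int) -> float:
    """Return the refundable fraction of the order under the policy in the hunk above.

    Tiers as written in the policy text:
      - within 30 days: full refund
      - 31 to 60 days:  partial refund of 50%
      - more than 60:   not eligible
    """
    if days_since_order < 0:
        # Assumption: an order date in the future is treated as invalid input.
        raise ValueError("days_since_order must be non-negative")
    if days_since_order <= 30:
        return 1.0
    if days_since_order <= 60:
        return 0.5
    return 0.0


# Check the boundaries of each tier.
for days in (0, 30, 31, 60, 61):
    print(days, refund_fraction(days))
```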

" else:\n",
" # Calculate precision and recall\n",
" try:\n",
" # Precision measures the proportion of correctly identified true positives out of all instances predicted as positive. \n",
Collaborator

having both the definition and "Precision answers the question:..." seems redundant
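
For reference, a compact sketch of the calculation the hunk describes, written so the definitions live in one docstring instead of repeated comments. It assumes the positive class is "the guardrail flagged a hallucination" and that an undefined ratio is reported as 0.0; both are assumptions, as are the function name `precision_recall` and the sample labels.

```python
from typing import Sequence, Tuple


def precision_recall(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """Compute precision and recall for boolean labels.

    precision = TP / (TP + FP): of everything flagged, how much was truly positive.
    recall    = TP / (TP + FN): of everything truly positive, how much was flagged.
    When a denominator is zero, 0.0 is returned (a convention chosen for this sketch).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: True means "hallucination".
predicted = [True, True, False, False, True]
actual = [True, False, True, False, True]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Guarding the denominators explicitly avoids the `try`/`except` around the division and makes the zero-denominator behavior visible at the call site.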

},
"outputs": [],
"source": [
"import json\n",
Collaborator

nit - can you isort these imports?

"source": [
"# Developing Hallucination Guardrails\n",
"\n",
"In this notebook we'll walk through the process of developing a guardrail that checks model outputs and against hallucinations. \n",
Collaborator

can we briefly explain what a guardrail is and/or link to the existing cookbook: https://cookbook.openai.com/examples/how_to_use_guardrails?

"source": [
"# Developing Hallucination Guardrails\n",
"\n",
"In this notebook we'll walk through the process of developing a guardrail that checks model outputs and against hallucinations. \n",
Collaborator

check model outputs "for hallucinations"?

"This notebook will focus on:\n",
"1. Building out a strong eval set\n",
"2. Identifying specific criteria to measure hallucinations\n",
"3. Improving accuracy with few-shot prompting"
Collaborator

improving accuracy of what? (can we make that bit more clear?)

"- It is important to break down this idea of \"truth\" in easily identifiable metrics that we can measure\n",
"- Metrics like truthfulness and relevance are difficult to measure. Giving concrete ways to score the statement can result in a more accurate guardrail\n",
"2. Ensure consistency across key terminology\n",
"- It is important to keep relevant terms such as KB articles, assistants, and users consistent across the prompt\n",
Collaborator

expand KB here

"- If we begin to use phrases such as assistant vs agent, the model could get confused\n",
"3. Start with the most advanced model\n",
"- GPT-4-Turbo may be the most expensive model, but it important to start with the most advanced so we can ensure a high degree of accuracy\n",
"- Once we have thoroughly tested out the guardrail and are confident in its performance, we can look to reducing cost by tuning it down to a 3.5 model\n",
Collaborator

a 3.5 model -> "gpt-3.5-turbo"

"source": [
"df = pd.read_csv('hallucination_results.csv')\n",
"\n",
"# Ensure the columns 'accurate' and 'hallucination' exist in the DataFrame\n",
Collaborator

this comment is redundant


3 participants