Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluators can access all columns #1606

Merged
merged 76 commits into from
May 31, 2024
Merged

Evaluators can access all columns #1606

merged 76 commits into from
May 31, 2024

Conversation

aakrem
Copy link
Collaborator

@aakrem aakrem commented May 2, 2024

  • Now evaluators have access to whatever correct answer column from the testset.

Copy link

vercel bot commented May 2, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
agenta ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 31, 2024 7:46am

Copy link
Member

@mmabrouk mmabrouk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t understand why you specify correct_answer_key as a special argument for all evaluators. This makes the assumption that the evaluator would need access to one column in the test set that has the correct_answer. However, it could be that the evaluator requires multiple columns (which do not even need to be semantically correct answers), or it might not require anything other than the output of the LLM app (as for instance the case of a toxicity evaluator).

Instead, what I propose is to have the correct_answer_key as part of the configuration of the evaluators (i.e. in settings_values). This would simplify the code (less arguments in the calls), and make the logic more general.

The second comment is about the UI, I think that per default, the evaluators we have should use the correct_answer column. We can have this option hidden inside a collapse.

agenta-backend/agenta_backend/tasks/evaluations.py Outdated Show resolved Hide resolved
Copy link
Member

@mmabrouk mmabrouk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @aakrem

I reviewed the migration. It makes sense to me from logical perspective. For the syntax itself (using .dict() and deleting __dict__) I trust you have tested it.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label May 31, 2024
@mmabrouk
Copy link
Member

Thanks for the PR, @aakrem! Great job. I have a few concerns (unrelated to the code changes).

I think the title of the PR and its description are a bit misleading. I have two questions:

* If an evaluator has access to all columns, why can we only access a single column in the test set?

This has been fixed. Now you configure in the evaluator which columns it has access to and it would be able to access them

* If the evaluators has access to any correct answer column from the test set, why are similarity match, JSON field match, webhook test, and Levenshtein distance the only evaluators that do?

It depends on the evaluator. Some of them, like JSON check, which checks whether the llm output is in json format does not require access to any ground truth obviously.

@mmabrouk mmabrouk dismissed aybruhm’s stale review May 31, 2024 08:15

Responded to review and clarified issues in #1606 (comment)

@aakrem aakrem merged commit 3f12a1f into main May 31, 2024
12 checks passed
@aakrem aakrem deleted the access-to-all-columns branch May 31, 2024 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Backend Frontend lgtm This PR has been approved by a maintainer size:XXL This PR changes 1000+ lines, ignoring generated files. typescript
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants