Run Langchain Evaluations on data in Langfuse: Why is the prompt not considered, and could this lead to evaluation flaws? #1649
Unanswered
pengpengIlove
asked this question in Support
Replies: 1 comment 1 reply
-
hi @pengpengIlove, happy to help.
-
Text Description:
In reviewing the introduction section of the tool, we noticed that only the input and output parameters are mentioned, with no information about the prompt content. Why is this the case, and could this omission lead to flaws in the evaluation process? Resolving this is urgent for our company.
Code Section:
The provided code defines a function execute_eval_and_score() that iterates through a collection named generations, performing an evaluation for each element. The evaluation criteria are determined by the EVAL_TYPES dictionary, excluding the type named "hallucination". For each criterion, it uses the get_evaluator_for_key function to obtain the corresponding evaluator and calls its evaluate_strings method to evaluate the output of each generation. The evaluation results are printed, and the scores along with the reasoning are logged via the langfuse.score method, as shown in the sketch below.
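For reference, here is a minimal sketch of what the described function could look like. It assumes the Langfuse v2 Python SDK (which exposes langfuse.score) and LangChain's built-in criteria evaluators; the contents of EVAL_TYPES, the body of get_evaluator_for_key, and the fields on each generation object (input, output, trace_id, id) are assumptions for illustration, not the exact cookbook code.

```python
from langfuse import Langfuse
from langchain.evaluation import load_evaluator

# Assumes LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY are set in the environment.
langfuse = Langfuse()

# Hypothetical criteria toggles; "hallucination" is excluded from this loop.
EVAL_TYPES = {"conciseness": True, "relevance": True, "hallucination": True}


def get_evaluator_for_key(key: str):
    # Assumption: a LangChain criteria evaluator is loaded per key
    # (uses the default LLM, so an OPENAI_API_KEY would also be needed).
    return load_evaluator("criteria", criteria=key)


def execute_eval_and_score(generations):
    for generation in generations:
        # Every enabled criterion except "hallucination".
        criteria = [
            k for k, enabled in EVAL_TYPES.items()
            if enabled and k != "hallucination"
        ]
        for criterion in criteria:
            evaluator = get_evaluator_for_key(criterion)
            # Only the generation's input and output are passed to the
            # evaluator; the prompt template itself never reaches it,
            # which is the omission the question asks about.
            result = evaluator.evaluate_strings(
                prediction=str(generation.output),
                input=str(generation.input),
            )
            print(result)
            # Log the score and reasoning back to Langfuse.
            langfuse.score(
                name=criterion,
                trace_id=generation.trace_id,
                observation_id=generation.id,
                value=result["score"],
                comment=result["reasoning"],
            )
```

Note that evaluate_strings receives only the generation's input and output; since the prompt template is not part of the call, a criterion that depends on the prompt's instructions cannot take them into account.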