
v0.3.1.5: Multi-Eval Node

Released by @ianarawjo on 25 Apr 18:01 · commit 6fa3092

This is the first release adding the Multi-Eval node to ChainForge proper, alongside:

  • improvements to the response inspector's table view, which now displays multi-criteria scores in columns
  • the table view is now the default when multiple evaluators are detected

Voilà:

[Screenshot: the Multi-Eval node, with multi-criteria scores shown in the response inspector's table view]

As you can see, Multi-Eval lets you define multiple per-response evaluators inside the same node, so you can score responses across multiple criteria. Evaluators can be a mix of code and LLM evaluators, as you see fit, and you can change the LLM scorer model on a per-evaluator basis.
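For instance, one criterion in the list might be a small code evaluator. Here is a minimal sketch, assuming the same Python `def evaluate(response)` signature used by ChainForge's standalone Code Evaluator nodes, where `response.text` holds the LLM's response string; the "conciseness" criterion itself is just a hypothetical example:

```python
def evaluate(response):
    # Score a single LLM response against one criterion.
    # Hypothetical criterion: conciseness — pass if the
    # response stays under 100 words.
    return len(response.text.split()) <= 100
```

Another criterion in the same node could instead be an LLM evaluator, with its own prompt and its own choice of scorer model.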

This is a "beta" version of the MultiEval node, for two reasons:

  • The output handle of Multi-Eval is disabled, since it doesn't yet work with VisNodes to plot data across multiple criteria. That is a separate issue that I didn't want to hold up this push; it is coming.
  • There are no genAI features in Multi-Eval yet, like there are in Code Evaluator nodes. I want to do this right (beyond EvalGen, which is another matter). The idea is that you describe a criterion in a prompt, and the AI adds the evaluator it thinks fits best, on a per-criterion basis. For now, as a workaround, you can use the genAI feature to generate code inside a single Code Evaluator and port that code over.

The EvalGen wizard is also coming, to help users automatically generate evaluation metrics with human supervision. We have a version of this on the multi-eval branch (which, due to the TypeScript front-end rewrite, we cannot merge directly into main), but it doesn't integrate Shreya's fixes.