Token classification bootstrap crashing with custom dataset #546

Open
darebfh opened this issue Feb 1, 2024 · 0 comments
darebfh commented Feb 1, 2024

OS: macOS 14.1
Python: 3.11.6
PyTorch: 2.0.1

Description: Standard evaluation of a custom dataset works:

from datasets import Dataset
from evaluate import evaluator

dataset = Dataset.from_list(dictlist)

task_evaluator = evaluator("token-classification")

eval_results = task_evaluator.compute(
    model_or_pipeline=<model_path>,
    data=<custom_dataset["validation"]>,
    metric="seqeval",
    label_column="tags"
)

However, when adding bootstrapping, I get a crash:

eval_results = task_evaluator.compute(
    model_or_pipeline=<model_path>,
    data=<custom_dataset["validation"]>,
    metric="seqeval",
    label_column="tags",
    strategy="bootstrap",
    n_resamples=30,
)

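For context, strategy="bootstrap" resamples the evaluation set with replacement n_resamples times, recomputes the metric on each resample, and turns the spread of scores into a confidence interval (the traceback below shows evaluate delegating this to SciPy's bootstrap). The following is a minimal stdlib sketch of that idea, not evaluate's implementation. It assumes a hypothetical token_accuracy metric as a stand-in for seqeval and a hypothetical bootstrap_ci helper, uses the simple percentile method (SciPy's default is BCa), and resamples whole sentences by index so that each sentence's tag list keeps its own length.

```python
import random

# Hypothetical stand-in metric (not seqeval): fraction of tokens tagged correctly.
def token_accuracy(refs, preds):
    total = sum(len(r) for r in refs)
    correct = sum(
        r_tag == p_tag
        for r, p in zip(refs, preds)
        for r_tag, p_tag in zip(r, p)
    )
    return correct / total if total else 0.0

# Hypothetical helper: percentile bootstrap that resamples sentences, not tokens.
def bootstrap_ci(refs, preds, metric, n_resamples=30, confidence_level=0.95, seed=0):
    rng = random.Random(seed)
    n = len(refs)
    scores = []
    for _ in range(n_resamples):
        # Draw n sentence indices with replacement; each sentence keeps its length.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([refs[i] for i in idx], [preds[i] for i in idx]))
    scores.sort()
    alpha = (1 - confidence_level) / 2
    # Approximate (floor-rank) percentiles of the sorted resampled scores.
    low = scores[int(alpha * (n_resamples - 1))]
    high = scores[int((1 - alpha) * (n_resamples - 1))]
    return low, high

refs = [["B-PER", "O"], ["O", "O", "B-LOC"], ["O"]]
preds = [["B-PER", "O"], ["O", "B-LOC", "B-LOC"], ["O"]]
print(round(token_accuracy(refs, preds), 3))  # 0.833
print(bootstrap_ci(refs, preds, token_accuracy))
```

Drawing integer indices keeps the resampled unit one-dimensional even when sentences have different lengths.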
Potential solution: add a "zero_division" parameter, as suggested by the warnings below.
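The warnings come from per-label precision tp / (tp + fp) and recall tp / (tp + fn): when a label has no predicted samples, or no true samples, the denominator is zero and the score is undefined. Bootstrap resamples make this more likely, since a resample can leave out every sentence containing a rare entity type. zero_division chooses the value substituted in that case. The sketch below is a simplified illustration only: extract_entities and per_type_scores are hypothetical helpers with a lenient IOB2 chunking rule, not seqeval's actual code, and it does not assume the evaluator forwards a zero_division argument to seqeval.

```python
# Hypothetical helper (simplified, not seqeval's exact chunking): collect IOB2
# entity spans as (type, start, end) tuples, with end exclusive.
def extract_entities(tags):
    entities, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes the last span
        # A span ends at "O", at a new "B-", or at an "I-" of a different type.
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            if etype is not None:
                entities.add((etype, start, i))
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
    return entities

# Hypothetical helper: per-type precision and recall, substituting zero_division
# when a denominator is zero.
def per_type_scores(refs, preds, zero_division=0.0):
    true_ents, pred_ents = set(), set()
    for s, (r, p) in enumerate(zip(refs, preds)):
        # Prefix each span with its sentence index so spans stay distinct.
        true_ents |= {(s, *e) for e in extract_entities(r)}
        pred_ents |= {(s, *e) for e in extract_entities(p)}
    scores = {}
    for t in sorted({e[1] for e in true_ents | pred_ents}):
        tp = sum(1 for e in true_ents & pred_ents if e[1] == t)
        n_pred = sum(1 for e in pred_ents if e[1] == t)
        n_true = sum(1 for e in true_ents if e[1] == t)
        precision = tp / n_pred if n_pred else zero_division  # no predicted samples
        recall = tp / n_true if n_true else zero_division      # no true samples
        scores[t] = (precision, recall)
    return scores

refs = [["B-PER", "I-PER", "O"], ["O", "B-LOC"]]
preds = [["B-PER", "I-PER", "O"], ["O", "O"]]
print(per_type_scores(refs, preds, zero_division=1.0))
# {'LOC': (1.0, 0.0), 'PER': (1.0, 1.0)}
```

Here LOC is never predicted, so its precision falls back to zero_division, while its recall is a well-defined 0.0.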

Stacktrace:
UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use zero_division parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use zero_division parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Traceback (most recent call last):
, line 32, in
eval_results = task_evaluator.compute(
^^^^^^^^^^^^^^^^^^^^^^^
line 266, in compute
metric_results = self.compute_metric(
^^^^^^^^^^^^^^^^^^^^
line 531, in compute_metric
bootstrap_dict = self._compute_confidence_interval(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
line 147, in _compute_confidence_interval
bs = bootstrap(
^^^^^^^^^^
line 450, in bootstrap
args = _bootstrap_iv(data, statistic, vectorized, paired, axis,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
line 155, in _bootstrap_iv
sample = np.atleast_1d(sample)
^^^^^^^^^^^^^^^^^^^^^
line 65, in atleast_1d
ary = asanyarray(ary)
^^^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (416,) + inhomogeneous part.
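The final frames show where the crash itself happens: SciPy's bootstrap passes each input sample through np.atleast_1d, and a list of per-sentence tag lists with different lengths cannot be converted into a rectangular array, so NumPy 1.24 and later raise this ValueError. The reported shape (416,) is the length of the outer list, presumably the number of validation examples. The stdlib sketch below mimics how NumPy describes such a list; describe_shape is a hypothetical helper that only inspects two nesting levels, intended as a quick check that a dataset's labels are ragged before enabling bootstrapping.

```python
# Hypothetical helper: describe the shape NumPy would infer for a list of
# per-sentence sequences (only two nesting levels are inspected).
def describe_shape(samples):
    if not samples:
        return "(0,)"
    lengths = {len(s) for s in samples}
    if len(lengths) == 1:
        # Every sentence has the same length: a rectangular 2-D array is possible.
        return f"({len(samples)}, {lengths.pop()})"
    # Lengths differ: NumPy cannot build a 2-D array from this list.
    return f"({len(samples)},) + inhomogeneous part"

tags = [["B-PER", "O"], ["O", "O", "B-LOC"], ["O"]]
print(describe_shape(tags))                          # (3,) + inhomogeneous part
print(describe_shape([["O", "O"], ["B-PER", "O"]]))  # (2, 2)
```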
