
[BUG] opaque Pipeline error messages due to Python multiprocessing.pool error callback #526

Open
rasdani opened this issue Apr 12, 2024 · 3 comments · Fixed by #532
Assignees: gabrielmbmb
Labels: enhancement (New feature or request)

Comments

@rasdani
Contributor

rasdani commented Apr 12, 2024

Describe the bug
I had trouble figuring out why my pipeline was failing because the error messages were not informative.
I managed to obtain a much more useful error message by dropping into the Python debugger inside Pipeline's _run_steps_in_loop() and calling process_wrapper.run() from inside the debugger.
The fix proposed in the comment there, step.pipeline = None, is not working for me.

To Reproduce
Set up any buggy task that will cause your pipeline to fail silently or cryptically, e.g. specify a wrong file name in your task's load() method.

from abc import ABC
from typing import List, Optional

import importlib_resources
from jinja2 import Template
from pydantic import PrivateAttr
from distilabel.steps.tasks import Task  # import path may differ across distilabel versions


class QueryFromDocBase(Task, ABC):

    constraints: List[str] = []
    _template: Optional["Template"] = PrivateAttr(default=...)

    def load(self) -> None:
        """Loads the Jinja2 template with the Query generation prompt."""
        super().load()
        # Deliberately wrong file name to trigger the failure.
        _path = str(importlib_resources.files("ella") / "tasks" / "templates" / "THIS_FILE_DOES_NOT_EXIST.jinja2")

        self._template = Template(open(_path).read())

Then use the task in some Pipeline and run it.

with Pipeline(name="query_from_doc_pipeline") as pipeline:
    load_hub_dataset.connect(query_from_doc_step)
    output = pipeline.run(
        parameters={
            "load_dataset": {"repo_id": dataset_name}
        },
        use_cache=use_cache,
    )

Expected behaviour
An informative error message pointing to the real cause of the failure. Instead, the run fails with

[04/12/24 10:38:56] ERROR    ['distilabel.pipeline.local'] ❌ Failed with an unhandled exception:      local.py:461
                             Error sending result: '<multiprocessing.pool.ExceptionWithTraceback
                             object at 0x1505a4dc0>'. Reason: 'TypeError("cannot pickle
                             '_thread.RLock' object")'
 

Screenshots
To debug and get a much more informative error message, drop into pdb here:
[screenshot: breakpoint location inside Pipeline's _run_steps_in_loop()]
And call process_wrapper.run():
[screenshot: output of calling process_wrapper.run() in the debugger]
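Since the screenshots are not reproduced here, this is a rough sketch of the debugging approach (a sketch only; the exact spot inside _run_steps_in_loop() and the process_wrapper variable name may differ between distilabel versions):

# Inside Pipeline._run_steps_in_loop(), just before the step is submitted to
# the worker pool, drop into the debugger:
import pdb; pdb.set_trace()

# Then, at the (Pdb) prompt, run the wrapper synchronously in the current
# process so the original exception surfaces with its full traceback:
#   (Pdb) process_wrapper.run()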

Desktop (please complete the following information):

  • Package version: poetry run pip install git+https://github.com/argilla-io/distilabel.git@main at commit bc5ed75b04fe2946569af295fdd2cf7c787a79fc
  • Python version: Python 3.10.13

Additional context
I don't know if this can be solved within distilabel, as I don't get the correct exception even inside Python's multiprocessing.pool.ApplyResult.

[screenshot: inspecting multiprocessing.pool.ApplyResult in the debugger]
This passes the exception that is currently shown to the user on to your error_callback, so the error_callback itself is working correctly. It tries to catch _ProcessWrapperException but can't, because multiprocessing has already swapped in the cryptic cannot-pickle exception, which it passes on as self._value to your error_callback:
[screenshot: error_callback receiving the pickling exception as self._value]
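To make the mechanism concrete, here is a self-contained sketch (plain multiprocessing, no distilabel code) of why the error_callback never sees the original exception when that exception carries an unpicklable attribute such as an RLock:

import multiprocessing
import threading


class InformativeError(Exception):
    # Stands in for the real exception; the RLock attribute makes it
    # unpicklable, similar to an exception dragging along an object that
    # holds a lock.
    def __init__(self):
        super().__init__("the real, informative error message")
        self.lock = threading.RLock()


def worker(_):
    raise InformativeError()


def error_callback(exc):
    # The worker process fails to pickle InformativeError when sending it
    # back, so the parent receives a MaybeEncodingError describing the
    # pickling failure ("cannot pickle '_thread.RLock' object") instead.
    print(f"error_callback received: {exc!r}")


if __name__ == "__main__":
    with multiprocessing.Pool(1) as pool:
        pool.apply_async(worker, args=(None,), error_callback=error_callback)
        pool.close()
        pool.join()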

On a side note: I have to kill the terminal, because _STOP_LOCK somewhere catches the terminal signal and then waits for some batch job to finish, which never happens.

@gabrielmbmb gabrielmbmb self-assigned this Apr 15, 2024
@gabrielmbmb gabrielmbmb added the enhancement New feature or request label Apr 15, 2024
@gabrielmbmb gabrielmbmb added this to the 1.0.0 milestone Apr 15, 2024
@gabrielmbmb
Member

Hi @rasdani, I just tried with this pipeline:

import importlib_resources
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadHubDataset, Step, StepInput


class ThisWillFail(Step):
    def load(self) -> None:
        super().load()
        _path = str(
            importlib_resources.files("distilabel")
            / "tasks"
            / "templates"
            / "THIS_FILE_DOES_NOT_EXIST.jinja2"
        )

        from jinja2 import Template

        Template(open(_path).read())

    def process(self, input: StepInput) -> None:  # type: ignore
        raise Exception


with Pipeline("pipe-name", description="My first pipe") as pipeline:
    load_dataset = LoadHubDataset(
        name="load_dataset",
        output_mappings={"prompt": "instruction"},
    )

    this_will_fail = ThisWillFail(name="this_will_fail")

    load_dataset.connect(this_will_fail)


if __name__ == "__main__":
    distiset = pipeline.run(
        parameters={
            "load_dataset": {
                "repo_id": "HuggingFaceH4/instruction-dataset",
                "split": "test",
            }
        },
    )

but I'm not able to reproduce your error; the original exception message is displayed for me:

[screenshot: original exception message shown in the pipeline logs]

We have seen some cannot pickle '_thread.RLock' object exceptions too, and they usually happened when the call to pipeline.run was not inside an if __name__ == "__main__": block.
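For reference, the problematic pattern looks roughly like this (a minimal sketch, not code from this issue):

# Anti-pattern: pipeline.run() at module top level. With the spawn start
# method, worker processes re-import the module, so the run can be triggered
# again; in practice this tends to show up as pickling errors such as the
# "cannot pickle '_thread.RLock' object" one above.
with Pipeline("pipe-name") as pipeline:
    ...  # define and connect steps

distiset = pipeline.run()  # should live under: if __name__ == "__main__":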

@gabrielmbmb
Member

That said, it's true that we can improve the traceback to provide more information and show the original point where the exception was raised. I will try to improve this before the 1.0.0 release.
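For context, one common pattern for this (an assumption about the general technique, not necessarily what the linked PR implements) is to format the traceback inside the worker process and pass it along as a plain string, so nothing unpicklable ever has to cross the process boundary:

import traceback


def run_step_load_safely(step):
    # Hypothetical wrapper that would run in the worker process.
    try:
        step.load()
    except Exception:
        # Capture the full traceback as text before anything has to be
        # pickled and sent back to the parent process.
        raise RuntimeError(
            f"Step {getattr(step, 'name', step)!r} failed to load:\n"
            f"{traceback.format_exc()}"
        ) from None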

@gabrielmbmb gabrielmbmb linked a pull request Apr 15, 2024 that will close this issue
@gabrielmbmb gabrielmbmb reopened this Apr 15, 2024
@gabrielmbmb
Member

Hi @rasdani, we have merged a PR into main that gives a much better traceback when a step's load fails. Could you give it a try?

@alvarobartt alvarobartt removed this from the 1.0.0 milestone Apr 18, 2024