Trio + asks + instrumentation as progress bar help needed #187

Open
rvencu opened this issue Jun 21, 2021 · 0 comments
rvencu commented Jun 21, 2021

Hi, first of all, I am not sure this is the right place to ask, but it feels like the most appropriate one.

I am running a classic mass-download job with the trio and asks libraries. As usual, I call trio.run from the main thread. In the main function I open a nursery and call its .start_soon method once for every URL, and the actual download is performed in a second function.

Now I want to use tqdm to monitor the progress and I am using this trio instrument:

```python
import trio


class TrioProgress(trio.abc.Instrument):
    """Trio instrument that advances a tqdm bar whenever a task exits."""

    def __init__(self, total, notebook_mode=False, **kwargs):
        if notebook_mode:
            from tqdm.notebook import tqdm
        else:
            from tqdm import tqdm

        self.tqdm = tqdm(total=total, desc="Downloaded: [ 0 ] / Links ", **kwargs)

    def task_exited(self, task):
        # custom_sleep_data is used as a per-task status flag set inside the download task
        if task.custom_sleep_data == 0:
            self.tqdm.update(7)
        if task.custom_sleep_data == 1:
            self.tqdm.update(7)
            # Parse the current count out of the description and increment it
            self.tqdm.desc = self.tqdm.desc.split(":")[0] + ": [ " + str( int(self.tqdm.desc.split(":")[1].split(" ")[2]) + 1 ) + " ] / Links "
            self.tqdm.refresh()
```
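As an aside on the `desc` update in `task_exited`: parsing the count back out of the description string on every tick is fragile. A minimal sketch of an alternative, assuming the same label format, is to keep the count as an integer and rebuild the label from it. `DownloadLabel` is a hypothetical helper, not part of tqdm.

```python
class DownloadLabel:
    """Hypothetical helper: keeps the success count as an int and renders the label."""

    def __init__(self, prefix="Downloaded"):
        self.prefix = prefix
        self.downloaded = 0

    def increment(self):
        # Bump the counter and return the updated label text.
        self.downloaded += 1
        return self.text()

    def text(self):
        # Same format as the original description: "Downloaded: [ N ] / Links "
        return f"{self.prefix}: [ {self.downloaded} ] / Links "


if __name__ == "__main__":
    label = DownloadLabel()
    print(label.increment())
```

Inside the instrument this could become `self.tqdm.set_description_str(self.label.increment())`. tqdm's `set_description_str` sets the text verbatim and refreshes by default, whereas `set_description` appends `": "` to whatever it is given.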

Let's ignore the details and focus on the main job of the progress bar, i.e. ticking once for every processed URL. I thought the second function was the place to add such lines:

```python
import asks
import trio


async def request_image(datas, start_sampleid):
    tmp_data = []

    asks.init("trio")

    session = asks.Session(connections=64)
    session.headers = {
        "User-Agent": "Googlebot-Image",
        "Accept-Language": "en-US",
        "Accept-Encoding": "gzip, deflate",
        "Referer": "https://www.google.com/",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    }

    async def _request(data, sample_id):
        url, alt_text, license = data
        task = trio.lowlevel.current_task()  # added for the progress bar
        task.custom_sleep_data = None  # added for the progress bar: default status
        try:
            # process_img_content() is defined elsewhere (not shown)
            proces = process_img_content(
                await session.get(url, timeout=5, connection_timeout=40), alt_text, license, sample_id
            )
            if proces is not None:
                tmp_data.append(proces)
                task.custom_sleep_data = 1  # added for the progress bar: mark success
        except Exception:
            # Any failure (timeout, HTTP error, ...) is silently dropped here
            return
```

Except that when I count the ticks, they do not match the size of my URL list, so the progress bar does not answer the basic question: "how long until it finishes?"

Experimenting with one tick at every exit from the second function (the intuitive approach), I noticed that there are about 2.5 to 3 times more ticks than expected. Depending on the actual URL list, the ratio can go as high as 7, as in the example above.

I would like to understand what is happening, and ideally find a way to correctly count finished download tasks (successful or unsuccessful). I was able to count the successful ones correctly by confirming the actual download, but all the others are lost in the mist...
