Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(client): Make the throttle skip a snapshot if one is already running #1273

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

davidmartos96
Copy link
Contributor

@davidmartos96 davidmartos96 commented May 16, 2024

This is a behavior we've seen when creating the test for the last PR (#1251). If you add a log to the mock snapshot function, you will see that when stopping, the process waits for a large amount of snapshots to finish before stopping completely.
This is because when polling, they are added to a queue due to the snapshot mutex.

Is this intentional? Or should the throttle just skip the call if the lock is taken?

cc @kevin-dp @msfstef

@kevin-dp
Copy link
Contributor

This seems unintentional and i'm certainly in favor of skipping the periodic snapshot if a snapshot is already in progress.
cc @balegas

@balegas
Copy link
Contributor

balegas commented May 16, 2024

It sounds good to skip... in the correct conditions :)...

but I think this code might miss changes: by the time you check the lock is taken, the mutex area is already beyond the change-capturing phase, so you could miss the last new changes that were produced that triggered the snapshot you're evaluating.

Did I miss something? I haven't looked deeply.

@balegas
Copy link
Contributor

balegas commented May 16, 2024

Looking at this again. If polling misses a tick because of a race, maybe it's okay, but do you know why you're observing a large amount of snapshots being queued? Is the polling time not configured to be large enough? or is the apply function taking the mutex too agressively?

@msfstef
Copy link
Contributor

msfstef commented May 16, 2024

I agree with @balegas the current change makes it possible for explicit calls to _throttledSnapshot to not result in a snapshot - I think it makes sense to add this check for whether the lock is busy in the polling interval alone since missing a polling tick is the same as just having a slightly longer poll interval which is arbitrary either way.

When _throttledSnapshot is called explicitly, in the start method and as a result of _handleClientOutboundStarted, it should be guaranteed that it is queued IMO

@davidmartos96
Copy link
Contributor Author

@balegas The large amount of snapshots is mostly because of how some tests are configured. The polling interval is 200ms and some tests fake a snapshot taking longer than usual, 500ms-1000ms. So the polling schedules a few snapshots, which need to be resolved before the process stops.
I'm not sure if it could end up happening in a real scenario, but we wanted to open the discussion.

Since there are places that want to ensure that a snapshot task gets queued, there could be an optional parameter when running the snapshot. The polling for instance doesn't really care about the skip.

@balegas
Copy link
Contributor

balegas commented May 16, 2024

thanks for clarifying. In the case of a snpashot taking longer than the polling period they would get queued, but they wouldn't run after calling stop because the throttled function is cancelled. Is the behaviour really that they are executing before returning?

@@ -196,18 +203,55 @@ export class SatelliteProcess implements Satellite {
this._connectRetryHandler = connectRetryHandler
}

_onSnapshotThrottleTick(
{ skipIfRunning } = { skipIfRunning: false }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the default be true here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever you guys prefer. I used "dont skip as a default" and opt in to the skip with the polling interval.
We can instead opt out in the places you consider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned the other day there could be places where we don't want to miss the snapshot

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that for periodic snapshots the default of skipping if a snapshot is already in progress is fine. Let's see what @balegas thinks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevin-dp I agree too to the skip by default on periodic snapshots (polling interval), but the snippet you point out is for the throttled snapshot function, which is used across the process.ts file and tests.
The polling already skips here: https://github.com/davidmartos96/electric/blob/b06b941b9c8f701e34e3f24effaffc0c393d5520/clients/typescript/src/satellite/process.ts#L332

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe _onSnapshotThrottleTick is not the best name because of the "Tick". I wasn't sure how to call it

Copy link
Contributor

@balegas balegas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like us to give a step back

The idea of the long polling function was to pick changes from anyone that was interacting with the database directly.

Looking at the code, I see that potentialDataChangeHandler is making use of throttledSnapshot. That is wrong, because this code should take a snapshot regardless of that because new changes were just applied. Any debouncing is introducing delay in capturing changes from the database.

Also, I don't think the queuing up of snapshots by the throttling function is the behaviour we intend (I'm aware I did that code :D). If a snapshot takes longer than the polling period, we don't want to queue multiple requests, we just want to call a next one immediately and get back into schedule.

wrt this PR, I think the idea of skipping the long polling call if a snapshot is already ongoing is valid, but I think we need to fix the parts above first and (that will lead to simpler code for the skipping part).

@davidmartos96, I understand that we're stepping up the complexity of this PR. So, I'm happy we take over. At the same time, I'm happy we continue discussing with you if you're motivated to do the change.

/**
* Only acquire the mutex if it's not already locked.
*/
private async acquireIfFree(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we think this is a really good abstraction to have, we should provide it as a wrapper to the mutex and not as a function of process.ts. Maintainers of the lib didn't seem to think this was necessary :).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, i would be highly in favor of adding this method as it captures the intent and doesn't make it possible for someone to accidentally slip in an async call in the gap between the check and the acquire.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw I think this would be clearer if rather than being marked as async it is left as synchronous an returns Promise.resolve(null) if the mutex is locked - this way there's no ambiguity with regards to any async gaps introduced by making the method itself async.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msfstef @kevin-dp I updated the signature to this so that it is sync. Is Promise.resolve(null) necessary? Wouldn't that be the same as the previous code? async fun + return null/undefined
New signature: private acquireIfFree(mutex: Mutex): Promise<MutexInterface.Releaser> | null

@balegas Given there are other things that were discovered, I think it would be easier for communication purposes if you guys take it over then. Feel free to reuse this PR or the unit test provided.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidmartos96 you're right the Promise.resolve is unnecessary - this signature and behaviour is clearer IMO!

@davidmartos96
Copy link
Contributor Author

Another way to think about this issue would be to always skip the throttled snapshot if the mutex is taken and flag the process with a boolean property if someone requested a snapshot while one was running.
Then, before the mutex is released, run the snapshot again if needed. This would work as long as the caller of the throttled snapshot doesn't need the output result, which looking at the source code, no one does.
So the worst case scenario would be waiting an extra snapshot instead of an arbitrary queue from the mutex.

@balegas
Copy link
Contributor

balegas commented May 20, 2024

yeah, that makes sense for me too. The polling a bit of a corner case, so I prefer to have simpler code even if we lose a tick in some situations (against my original comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants