-
Notifications
You must be signed in to change notification settings - Fork 24.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Associate restore snapshot task to parent mount task #108705
Associate restore snapshot task to parent mount task #108705
Conversation
...g/elasticsearch/xpack/searchablesnapshots/action/TransportMountSearchableSnapshotAction.java
Show resolved
Hide resolved
...sticsearch/xpack/searchablesnapshots/action/TransportMountSearchableSnapshotActionTests.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah mocking this much stuff is pretty bad IMO, it will get in the way of future changes/refactorings to these APIs I'm sure.
I would suggest doing this as a org.elasticsearch.xpack.searchablesnapshots.BaseSearchableSnapshotsIntegTestCase
which gives you real implementations of all this stuff. Submit a task to the master which blocks, something like this:
elasticsearch/server/src/test/java/org/elasticsearch/cluster/routing/BatchedRerouteServiceTests.java
Lines 89 to 101 in 69ff7df
clusterService.submitUnbatchedStateUpdateTask("block master service", new ClusterStateUpdateTask() { | |
@Override | |
public ClusterState execute(ClusterState currentState) throws Exception { | |
safeAwait(cyclicBarrier); // notify test that we are blocked | |
safeAwait(cyclicBarrier); // wait to be unblocked by test | |
return currentState; | |
} | |
@Override | |
public void onFailure(Exception e) { | |
throw new AssertionError("block master service", e); | |
} | |
}); |
(get the clusterService
using internalCluster().getCurrentMasterNodeInstance(ClusterService.class)
)
That will give you time to call the list-tasks API and verify that the mount task is the parent of the restore task.
...g/elasticsearch/xpack/searchablesnapshots/action/TransportMountSearchableSnapshotAction.java
Show resolved
Hide resolved
...sticsearch/xpack/searchablesnapshots/action/TransportMountSearchableSnapshotActionTests.java
Outdated
Show resolved
Hide resolved
...sterTest/java/org/elasticsearch/xpack/searchablesnapshots/SearchableSnapshotsIntegTests.java
Show resolved
Hide resolved
...sterTest/java/org/elasticsearch/xpack/searchablesnapshots/SearchableSnapshotsIntegTests.java
Show resolved
Hide resolved
Hi @nicktindall, I've created a changelog YAML for you. |
a34015d
to
c5bdd8b
Compare
safeAwait(cyclicBarrier); // Unblock the master thread | ||
if (response != null) { | ||
response.actionGet(); | ||
assertAcked(indicesAdmin().prepareDelete(indexName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actionGet
and delete index here were an attempt to fix a failure that occurred in CI in a teardown, one or both of these may be necessary. Happy to remove the delete if it's not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, yeah, it seems we have to wait for the mount to complete before finishing the test or else there's a race between cleaning up the test cluster and the mount process creating the internal .snapshot-blob-cache
index.
I expect there's no need to manually delete the test index here tho but it doesn't hurt either. Would you add a code comment about why we need these nonobvious cleanup steps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test looks great. I left a few pointers but nothing substantial.
...sterTest/java/org/elasticsearch/xpack/searchablesnapshots/SearchableSnapshotsIntegTests.java
Show resolved
Hide resolved
safeAwait(cyclicBarrier); // Unblock the master thread | ||
if (response != null) { | ||
response.actionGet(); | ||
assertAcked(indicesAdmin().prepareDelete(indexName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, yeah, it seems we have to wait for the mount to complete before finishing the test or else there's a race between cleaning up the test cluster and the mount process creating the internal .snapshot-blob-cache
index.
I expect there's no need to manually delete the test index here tho but it doesn't hurt either. Would you add a code comment about why we need these nonobvious cleanup steps?
...sterTest/java/org/elasticsearch/xpack/searchablesnapshots/SearchableSnapshotsIntegTests.java
Show resolved
Hide resolved
} finally { | ||
safeAwait(cyclicBarrier); // Unblock the master thread | ||
if (response != null) { | ||
response.actionGet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This'd be as good a time as any to stop using these infinite waits in tests, doing something like this and then using safeGet()
instead:
diff --git a/test/framework/src/main/java/org/elasticsearch/test/ESTestCase.java b/test/framework/src/main/java/org/elasticsearch/test/ESTestCase.java
index 80f9f2abea1..94c89318718 100644
--- a/test/framework/src/main/java/org/elasticsearch/test/ESTestCase.java
+++ b/test/framework/src/main/java/org/elasticsearch/test/ESTestCase.java
@@ -2154,6 +2154,10 @@ public abstract class ESTestCase extends LuceneTestCase {
public static <T> T safeAwait(SubscribableListener<T> listener) {
final var future = new PlainActionFuture<T>();
listener.addListener(future);
+ return safeGet(future);
+ }
+
+ private static <T> T safeGet(Future<T> future) {
try {
return future.get(SAFE_AWAIT_TIMEOUT.millis(), TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
- use assertBusy to reduce task-get complexity - don't wait forever for mount to complete - document additional required cleanup - add ESTestCase#safeGet and use it
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a few tiny style points.
} | ||
|
||
private TaskInfo getTaskForActionFromMaster(String action) { | ||
var ltr = new ListTasksRequest().setDetailed(true).setNodes(internalCluster().getMasterName()).setActions(action); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would rather not use abbrs. for var. names wherever poss. :)
Here I think I'd just inline the variable, it's only used in one place - saves having to think up a name at all.
var ltr = new ListTasksRequest().setDetailed(true).setNodes(internalCluster().getMasterName()).setActions(action); | ||
ListTasksResponse response = client().execute(TransportListTasksAction.TYPE, ltr).actionGet(); | ||
int matchingTasks = response.getTasks().size(); | ||
assertEquals(String.format(Locale.ROOT, "Expected a single task for action %s, got %d", action, matchingTasks), 1, matchingTasks); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use assertThat(..., hasSize(1))
here - that should give the same information on failure as what you've got here but without needing a hand-crafted message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Associate restore snapshot task to parent mount task Closes elastic#105830
Implemented the test as an integration test that polls the tasks API to confirm the association of parent/child tasks. Thanks for the feedback @DaveCTurner.
Closes #105830