Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Prevent dropping column which is referenced by active partition… #10352

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

amogh-jahagirdar
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar commented May 18, 2024

Fixes #10234

@github-actions github-actions bot added the core label May 18, 2024
@@ -533,6 +537,34 @@ private static Schema applyChanges(
}
}

Map<Integer, List<Integer>> specToDeletes = Maps.newHashMap();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a mapping from a spec to the requested fields to delete, where the fields are referenced as part of that spec.
It's a reverse mapping which makes surfacing the error details for the case where it's an active partition spec easier. Probably needs a better name though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe put this in a separate checkNotDeletingColumnsInActiveSpecs method.

@amogh-jahagirdar
Copy link
Contributor Author

Screenshot from 2024-05-18 11-11-56
Needs tests but did some local testing on spark

@amogh-jahagirdar amogh-jahagirdar force-pushed the prevent-dropping-column-active-spec branch from e0e0223 to 184d434 Compare May 18, 2024 17:14
Comment on lines 542 to 551
for (int fieldIdToDelete : deletes) {
for (PartitionSpec spec : base.specs()) {
if (spec.schema().findField(fieldIdToDelete) != null) {
List<Integer> deletesForSpec =
specToDeletes.computeIfAbsent(spec.specId(), k -> Lists.newArrayList());
deletesForSpec.add(fieldIdToDelete);
specToDeletes.put(spec.specId(), deletesForSpec);
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only need to do this logic if the current snapshot is not null, so probably want to re-organize the logic around that.

.findAny();
Preconditions.checkArgument(
!manifestReferencingActivePartition.isPresent(),
"Cannot delete field %s as it is used by an active partition spec %s",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably use the field name in the error message instead of the ID so it's more useful to a user

@amogh-jahagirdar amogh-jahagirdar force-pushed the prevent-dropping-column-active-spec branch from 184d434 to 3e454d2 Compare May 18, 2024 19:25
@amogh-jahagirdar amogh-jahagirdar force-pushed the prevent-dropping-column-active-spec branch from 3e454d2 to c62b649 Compare May 18, 2024 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spark: Dropping partition column from old partition table corrupts entire table
1 participant