HIVE-28266: Iceberg: select count(*) from data_files metadata tables … #5253
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
+1 LGTM. Pending tests.
I also think there are other similar places that query Iceberg metadata tables but wrongly use the data table's statistics. We can fix them incrementally, as in #5215, which I am working on.
@@ -1512,7 +1512,8 @@ private String collectColumnAndReplaceDummyValues(ExprNodeDesc node, String foun
   private void fallbackToNonVectorizedModeBasedOnProperties(Properties tableProps) {
     Schema tableSchema = SchemaParser.fromJson(tableProps.getProperty(InputFormatConfig.TABLE_SCHEMA));
     if (FileFormat.AVRO.name().equalsIgnoreCase(tableProps.getProperty(TableProperties.DEFAULT_FILE_FORMAT)) ||
-        (tableProps.containsKey("metaTable") && isValidMetadataTable(tableProps.getProperty("metaTable"))) ||
+        (tableProps.containsKey(IcebergAcidUtil.META_TABLE_PROPERTY) &&
+            isValidMetadataTable(tableProps.getProperty(IcebergAcidUtil.META_TABLE_PROPERTY))) ||
Can't we simplify this to isValidMetadataTable(tableProps.getProperty(IcebergAcidUtil.META_TABLE_PROPERTY)) and check for null?
Done
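For readers following the thread, here is a minimal sketch of what the suggested simplification could look like. It is not the code merged in this PR: the class name, the helper isMetadataTableQuery, the set of table names, and the case-insensitive lookup are all assumptions made for illustration. The property value "metaTable" is taken from the string literal in the line the diff replaces. The idea is that Properties.getProperty already returns null for a missing key, so a null check inside the validation method makes the separate containsKey call redundant.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

public class MetaTableCheckSketch {
  // Assumed value of IcebergAcidUtil.META_TABLE_PROPERTY, based on the literal
  // "metaTable" that the diff replaces.
  static final String META_TABLE_PROPERTY = "metaTable";

  // Hypothetical subset of metadata table names, for illustration only.
  static final Set<String> KNOWN_META_TABLES =
      new HashSet<>(Arrays.asList("data_files", "snapshots", "history"));

  // Null-safe validation: a missing property (null) is simply "not a metadata table".
  // The case-insensitive match is an assumption of this sketch.
  static boolean isValidMetadataTable(String metaTableName) {
    return metaTableName != null
        && KNOWN_META_TABLES.contains(metaTableName.toLowerCase(Locale.ROOT));
  }

  // Hypothetical helper: getProperty returns null when the key is absent,
  // so no separate containsKey check is needed.
  static boolean isMetadataTableQuery(Properties tableProps) {
    return isValidMetadataTable(tableProps.getProperty(META_TABLE_PROPERTY));
  }

  public static void main(String[] args) {
    Properties dataTable = new Properties();
    Properties metaTable = new Properties();
    metaTable.setProperty(META_TABLE_PROPERTY, "data_files");

    System.out.println(isMetadataTableQuery(dataTable)); // false
    System.out.println(isMetadataTableQuery(metaTable)); // true
  }
}
```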
…gives wrong result
What changes were proposed in this pull request?
Modified the Iceberg method "canComputeQueryUsingStats" to return false for queries over metadata tables, so that Hive executes the query against the metadata table instead of answering it from statistics.
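The effect of this change can be illustrated with a small, self-contained sketch. It does not reproduce the real Hive or Iceberg signatures: the TableRef class, the countStar helper, and the row counts (1000 records, 3 data files) are hypothetical, and the real "canComputeQueryUsingStats" takes Hive's own table object and may apply further conditions. The sketch only shows the decision being described: when a metadata table is queried, the statistics shortcut is declined and the count comes from scanning the metadata table.

```java
public class StatsShortcutSketch {
  /** Hypothetical, simplified view of a queried table; not a Hive or Iceberg class. */
  static final class TableRef {
    final String metaTable;     // null when the data table itself is queried
    final long statsRowCount;   // row count recorded in the data table's statistics
    final long scannedRowCount; // rows an actual scan of the queried table would return

    TableRef(String metaTable, long statsRowCount, long scannedRowCount) {
      this.metaTable = metaTable;
      this.statsRowCount = statsRowCount;
      this.scannedRowCount = scannedRowCount;
    }
  }

  // Sketch of the fixed behaviour: statistics describe the data table only,
  // so they must not be used to answer a query over a metadata table.
  static boolean canComputeQueryUsingStats(TableRef table) {
    return table.metaTable == null;
  }

  // Hypothetical stand-in for the planner: use statistics when allowed, otherwise scan.
  static long countStar(TableRef table) {
    return canComputeQueryUsingStats(table) ? table.statsRowCount : table.scannedRowCount;
  }

  public static void main(String[] args) {
    // Data table X: 1000 records stored in 3 data files (made-up numbers).
    TableRef x = new TableRef(null, 1000L, 1000L);
    // X.data_files: shares X's statistics, but a scan returns one row per data file.
    TableRef xDataFiles = new TableRef("data_files", 1000L, 3L);

    System.out.println(countStar(x));          // 1000
    System.out.println(countStar(xDataFiles)); // 3
  }
}
```

Without the metadata-table check, countStar(xDataFiles) in this sketch would return 1000, which mirrors the wrong result this PR addresses.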
Why are the changes needed?
Currently, a SELECT COUNT(*) query over the Iceberg metadata table X.data_files, where X is a data table, returns the number of records in X rather than the number of records in X.data_files.
Does this PR introduce any user-facing change?
No
Is the change a dependency upgrade?
No
How was this patch tested?
Added a new query test.