Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SUPPORT] Reliable ingestion from AWS S3 using Hudi is failing with software.amazon.awssdk.services.sqs.model.EmptyBatchRequestException #11168

Open
SuneethaYamani opened this issue May 7, 2024 · 4 comments

Comments

@SuneethaYamani
Copy link

Hi ,

I am following the blog https://hudi.apache.org/blog/2021/08/23/s3-events-source/
Used below command
spark-submit
--jars "s3://codebucket/jars/hudi-spark3.4-bundle_2.12-0.14.0.jar,s3://codebucket/jars/aws-msk-iam-auth-1.1.1-all.jar,s3://codebucket/jars/spark-avro_2.12-3.5.0.jar,s3://codebucket/jars/aws-java-sdk-sqs-1.12.715.jar,s3://codebucket/jars/hudi-utilities-slim-bundle_2.12-0.14.0.jar"
--master yarn
--deploy-mode client
--conf spark.dynamicAllocation.enabled=true
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
s3://codebucket/jars/hudi-utilities-slim-bundle_2.12-0.14.0.jar
--table-type COPY_ON_WRITE
--source-ordering-field eventTime
--target-base-path s3://stage-data/iot-raw-s3/firehose_meta_table_3
--target-table firehose_xxxx_meta_table_3
--min-sync-interval-seconds 10
--hoodie-conf hoodie.datasource.write.recordkey.field="s3.object.key,eventName"
--hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
--hoodie-conf hoodie.datasource.write.partitionpath.field=s3.bucket.name
--enable-hive-sync
--hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor
--hoodie-conf hoodie.database.name=iot_raw
--hoodie-conf hoodie.datasource.hive_sync.enable=true
--hoodie-conf hoodie.datasource.hive_sync.database=iot_raw
--hoodie-conf hoodie.metadata.record.index.enable=true
--hoodie-conf hoodie.index.type=RECORD_INDEX
--hoodie-conf hoodie.datasource.write.hive_style_partitioning=true
--hoodie-conf hoodie.datasource.hive_sync.table=firehose_meta_table_3
--hoodie-conf hoodie.datasource.hive_sync.partition_fields=bucket
--source-class org.apache.hudi.utilities.sources.S3EventsSource
--hoodie-conf hoodie.streamer.s3.source.queue.url=https://sqs.us-east-1.amazonaws.com/XXXX/xxx-parquet
--hoodie-conf hoodie.deltastreamer.s3.source.queue.region=us-east-1

Meta table got created ,but I am getting
Exception in thread "main" software.amazon.awssdk.services.sqs.model.EmptyBatchRequestException: There should be at least one DeleteMessageBatchRequestEntry in the request. (Service: Sqs, Status Code: 400, Request ID: 7a37b3c4-5ff5-5e40-b5d4-ed70519d6d1a)
at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125)
at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:82)
at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:60)
at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:41)
at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:50)
at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:38)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:72)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:55)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:39)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:198)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76)
at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56)
at software.amazon.awssdk.services.sqs.DefaultSqsClient.deleteMessageBatch(DefaultSqsClient.java:723)
at org.apache.hudi.utilities.sources.helpers.CloudObjectsSelector.deleteBatchOfMessages(CloudObjectsSelector.java:213)
at org.apache.hudi.utilities.sources.helpers.CloudObjectsSelector.deleteProcessedMessages(CloudObjectsSelector.java:238)
at org.apache.hudi.utilities.sources.S3EventsSource.onCommit(S3EventsSource.java:104)
at org.apache.hudi.utilities.streamer.StreamSync.writeToSink(StreamSync.java:844)
at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:446)
at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:840)
at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:205)
at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:584)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1066)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1158)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1167)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

@SuneethaYamani SuneethaYamani changed the title Reliable ingestion from AWS S3 using Hudi is failing with software.amazon.awssdk.services.sqs.model.EmptyBatchRequestException [SUPPORT] Reliable ingestion from AWS S3 using Hudi is failing with software.amazon.awssdk.services.sqs.model.EmptyBatchRequestException May 7, 2024
@danny0405
Copy link
Contributor

cc @umehrot2 for aws issues.

@ad1happy2go
Copy link
Contributor

@SuneethaYamani Which Hudi version you are using?

This issue was fixed in this PR - #10117

@SuneethaYamani
Copy link
Author

@ad1happy2go
I am using 2.12-0.14.0 hudi jar

@ad1happy2go
Copy link
Contributor

@SuneethaYamani Yeah this was was there in 0.14.1 only. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: 🏁 Triaged
Development

No branches or pull requests

3 participants