
[SUPPORT] --hoodie-conf not overriding value in --props file - deployment with kubernetes operator - org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer #11085

Open
mattssll opened this issue Apr 24, 2024 · 0 comments
Labels
hudistreamer (issues related to Hudi streamer, formerly deltastreamer), priority:critical (production down; pipelines stalled; need help asap)

Comments

mattssll commented Apr 24, 2024

To Reproduce

Steps to reproduce the behavior:

  1. Launch the Hudi Multi Table Streamer using the Spark Operator
  2. Use --hoodie-conf to override one property
  3. Pass --props pointing to the props.properties file

Expected behavior
We're having difficulty keeping hardcoded secrets out of the props.properties file in Kubernetes. The file comes from a ConfigMap, where environment variables are not expanded. We can use env vars from Secrets in the "arguments" of the SparkApplication deployed through the Spark Operator, but the --hoodie-conf parameter does not take effect, so the issue persists.

According to the code and docs, "--hoodie-conf" is supposed to override configurations in the properties file passed via the "--props" argument.
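For clarity, here is a minimal illustration of the precedence we expect (all values below are placeholders):

    # props.properties (from the ConfigMap)
    sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="dummy" password="dummy";

    # CLI override, which according to the docs should take precedence
    --hoodie-conf 'sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="myuser" password="mypass";'

    # expected effective value: username="myuser" password="mypass"
    # observed: the override does not take effect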
Environment Description

  • Hudi version: 0.13.1

  • Spark version: 2.1.3

  • Running on Docker? (yes/no): Yes, deployed in Kubernetes

Additional context

In the Spark Operator SparkApplication, this is the part where arguments are passed to the spark-submit job:

  arguments:
      - "--props"
      - "file:///table_configs/props.properties"
      - "--hoodie-conf"
      - "sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username=\"myuser\" password=\"mypass\";"
      - "--schemaprovider-class"
      - "org.apache.hudi.utilities.schema.SchemaRegistryProvider"
      - "--op"
      - "UPSERT"
      - "--table-type"
      - "COPY_ON_WRITE"
      - "--base-path-prefix"
      - "$(ENV1)"
      - "--source-class"
      - "org.apache.hudi.utilities.sources.AvroKafkaSource"
      - "--enable-sync"
      - "--sync-tool-classes"
      - "org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool"
      - "--source-ordering-field"
      - "__kafka_ingestion_ts_ms"
      - "--config-folder"
      - "file:///table_configs"
      - "--source-limit"
      - "400000"

As you can see, the idea is to substitute the Kafka user and password via --hoodie-conf.
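For reference, the $(ENV1) substitution above already works for --base-path-prefix, so a sketch of wiring the JAAS config the same way (the Secret name and key below are hypothetical) would be:

      driver:
        env:
          - name: KAFKA_JAAS_CONF
            valueFrom:
              secretKeyRef:
                name: kafka-sasl-credentials   # hypothetical Secret
                key: jaas-config               # hypothetical key
      arguments:
        - "--hoodie-conf"
        - "sasl.jaas.config=$(KAFKA_JAAS_CONF)"

Even with that substitution resolved at the argument level, though, the value still has to be honored by --hoodie-conf, which is exactly what is failing here.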
Stacktrace
The issue is that the value is not being substituted. I tried it both ways: with the property set to a dummy value in props.properties, and with the property absent entirely; neither works.

Here is the spark-submit configuration:

/opt/spark/bin/spark-submit --conf spark.driver.bindAddress= --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer local:///app/hudi-utilities-bundle_2.12-0.13.1.jar --hoodie-conf 'sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="myuser" password="mypass";' --props file:///table_configs/props.properties --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider --op UPSERT --table-type COPY_ON_WRITE --base-path-prefix s3a://xxxxxt/hudi_ingestion_data/hudi/data/ --source-class org.apache.hudi.utilities.sources.AvroKafkaSource --enable-sync --sync-tool-classes org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool --source-ordering-field __kafka_ingestion_ts_ms --config-folder file:///table_configs --source-limit 400000

As you can see above, the spark-submit invocation is correct, yet the property passed with --hoodie-conf is not taking effect.

The props.properties file at file:///table_configs/props.properties is mounted from a ConfigMap into both the Spark driver and executors, like this:

      configMaps:
        - name: airflow-metastore-config
          path: /table_configs

The config map contains:

apiVersion: v1
kind: ConfigMap
metadata:
  name: airflow-metastore-config
  namespace: spark
data:
  props.properties: |-
    hoodie.deltastreamer.ingestion.tablesToBeIngested=abc.celery_taskmeta,abc.dag,abc.dag_run,abc.job,abc.log,abc.sla_miss,abc.slot_pool,abc.task_fail,abc.task_instance

    hoodie.deltastreamer.ingestion.abc.celery_taskmeta.configFile=file:///table_configs/celery_taskmeta.properties
    hoodie.deltastreamer.ingestion.abc.dag.configFile=file:///table_configs/dag.properties
    hoodie.deltastreamer.ingestion.abc.dag_run.configFile=file:///table_configs/dag_run.properties
    hoodie.deltastreamer.ingestion.abc.job.configFile=file:///table_configs/job.properties
    hoodie.deltastreamer.ingestion.abc.log.configFile=file:///table_configs/log.properties
    hoodie.deltastreamer.ingestion.abc.sla_miss.configFile=file:///table_configs/sla_miss.properties
    hoodie.deltastreamer.ingestion.abc.slot_pool.configFile=file:///table_configs/slot_pool.properties
    hoodie.deltastreamer.ingestion.abc.task_fail.configFile=file:///table_configs/task_fail.properties
    hoodie.deltastreamer.ingestion.abc.task_instance.configFile=file:///table_configs/task_instance.properties
    bootstrap.servers=leleelalalal:9096
    auto.offset.reset=earliest
    security.protocol=SASL_SSL
    sasl.mechanism=SCRAM-SHA-512
    sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="u" password="p";
    schema.registry.url=http://schema-registry-confluent.kafka.svc.cluster.local:8081

    hoodie.datasource.write.insert.drop.duplicates=true

    group.id=hudigroupid

    hoodie.deltastreamer.schemaprovider.registry.baseUrl=http://schema-registry-confluent.kafka.svc.cluster.local:8081/subjects/
    hoodie.deltastreamer.schemaprovider.registry.urlSuffix=-value/versions/latest
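As a stopgap we are considering rendering the props file at pod startup instead of relying on --hoodie-conf. A rough sketch (image, Secret, and volume names are all hypothetical; it assumes the ConfigMap is mounted as a template with a ${KAFKA_JAAS_CONF} placeholder and that /table_configs is a writable emptyDir shared with the driver):

      initContainers:
        - name: render-props
          image: alpine:3.19
          command: ["sh", "-c"]
          args:
            - apk add --no-cache gettext &&
              envsubst < /templates/props.properties > /table_configs/props.properties
          env:
            - name: KAFKA_JAAS_CONF
              valueFrom:
                secretKeyRef:
                  name: kafka-sasl-credentials
                  key: jaas-config
          volumeMounts:
            - name: props-template      # ConfigMap containing the templated props.properties
              mountPath: /templates
            - name: table-configs       # emptyDir shared with the driver container
              mountPath: /table_configs

That said, we'd much rather --hoodie-conf worked as documented.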
@codope added the hudistreamer and priority:critical labels on May 2, 2024