
After Spark reads and writes through Alluxio, directory and file permissions all become 777 and the paths are pinned #18610

Open
haoranchuixue opened this issue May 15, 2024 · 3 comments
Labels
type-bug This issue is about a bug

Comments

@haoranchuixue

haoranchuixue commented May 15, 2024

Alluxio Version:
2.8.1

Describe the bug
Environment:
Alluxio version 2.8.1
Spark version 3.5.1
Iceberg version 1.4.3
Alluxio mounts OSS, and Spark is integrated with Iceberg. Spark reads and writes the virtual lake data (Alluxio directories) through Alluxio.

Problem description
After a fresh install, newly created directories are 775 and files are 644. But as soon as a Spark job starts reading and writing Alluxio, both directories and files immediately become 777 with pin set to YES.
[screenshot: alluxio-pin01]

Alluxio配置文件 alluxio-site.properties
alluxio.master.hostname=bigdata-102.whale.com
alluxio.master.mount.table.root.ufs=hdfs://beluga/data/alluxio
alluxio.worker.tieredstore.levels=3
alluxio.user.block.write.location.policy.class=alluxio.client.block.policy.MostAvailableFirstPolicy
alluxio.master.embedded.journal.addresses=bigdata-101.whale.com:19200,bigdata-102.whale.com:19200,bigdata-103.whale.com:19200
alluxio.tmp.dirs=/data/alluxio/tmp
alluxio.user.ufs.block.read.location.policy.deterministic.hash.shards=2
alluxio.user.metadata.cache.enabled=false
alluxio.user.file.create.ttl.action=FREE
alluxio.user.file.replication.max=1
alluxio.worker.tieredstore.levels.content=
alluxio.worker.tieredstore.level0.alias=MEM
alluxio.worker.tieredstore.level0.dirs.path=/mnt/ramdisk
alluxio.worker.tieredstore.level0.dirs.quota=20G
alluxio.worker.tieredstore.level0.watermark.high.ratio=0.9
alluxio.worker.tieredstore.level0.watermark.low.ratio=0.7
alluxio.worker.tieredstore.level1.alias=SSD
alluxio.worker.tieredstore.level1.dirs.path=/data/alluxio
alluxio.worker.tieredstore.level1.dirs.quota=100GB
alluxio.worker.tieredstore.level1.watermark.high.ratio=0.9
alluxio.worker.tieredstore.level1.watermark.low.ratio=0.7
alluxio.worker.tieredstore.level2.alias=HDD
alluxio.worker.tieredstore.level2.dirs.path=/data01/alluxio,/data02/alluxio,/data03/alluxio,/data04/alluxio,/data05/alluxio,/data06/alluxio,/data07/alluxio,/data08/alluxio
alluxio.worker.tieredstore.level2.dirs.quota=1TB,1TB,1TB,1TB,1TB,1TB,1TB,1TB
alluxio.worker.tieredstore.level2.watermark.high.ratio=0.9
alluxio.worker.tieredstore.level2.watermark.low.ratio=0.7
alluxio.worker.allocator.class=alluxio.worker.block.allocator.MaxFreeAllocator
alluxio.master.security.content=
alluxio.master.security.impersonation.hdfs.users=*
alluxio.master.security.impersonation.hdfs.groups=*
alluxio.master.security.impersonation.yarn.users=*
alluxio.master.security.impersonation.yarn.groups=*
alluxio.master.security.impersonation.hive.users=*
alluxio.master.security.impersonation.hive.groups=*
alluxio.master.security.impersonation.kyuubi.users=*
alluxio.master.security.impersonation.kyuubi.groups=*
alluxio.job.worker.threadpool.size=60
alluxio.master.web.port=19999
alluxio.underfs.hdfs.configuration=/etc/hadoop/conf/core-site.xml:/etc/hadoop/conf/hdfs-site.xml
alluxio.security.authentication.type=SIMPLE
alluxio.user.ufs.block.read.location.policy=alluxio.client.block.policy.DeterministicHashPolicy
alluxio.user.file.readtype.default=CACHE
alluxio.user.file.writetype.default=THROUGH
alluxio.user.file.passive.cache.enabled=false
alluxio.user.file.metadata.sync.interval=20000
alluxio.user.file.create.ttl=86400
alluxio.user.file.replication.min=1

Spark 配置文件spark-defaults.conf
spark.master yarn
spark.driver.maxResultSize 4g
spark.driver.memory 4g
spark.driver.extraClassPath /opt/whale/spark-3.5.1-bin-hadoop3/jars/iceberg-spark-runtime-3.5_2.12-1.4.3.jar,/opt/whale/spark-3.5.1-bin-hadoop3/jars/alluxio-2.8.1-client.jar,/opt/whale/spark-3.5.1-bin-hadoop3/jars/msw-spark-listener-1.0-SNAPSHOT-jar-with-dependencies.jar
spark.executor.extraClassPath /opt/whale/spark-3.5.1-bin-hadoop3/jars/iceberg-spark-runtime-3.5_2.12-1.4.3.jar,/opt/whale/spark-3.5.1-bin-hadoop3/jars/alluxio-2.8.1-client.jar,/opt/whale/spark-3.5.1-bin-hadoop3/jars/msw-spark-listener-1.0-SNAPSHOT-jar-with-dependencies.jar
spark.driver.extraJavaOptions -Dalluxio.user.file.writetype.default=ASYNC_THROUGH
spark.executor.extraJavaOptions -Dalluxio.user.file.writetype.default=ASYNC_THROUGH
spark.yarn.jars hdfs://beluga/user/spark3.5/jars/*.jar
spark.sql.hive.convertMetastoreOrc true
spark.sql.hive.metastore.jars /usr/bigtop/current/hive-client/lib/

spark.sql.hive.metastore.version 3.1.3

spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions,org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension

spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type hive
spark.sql.catalog.landing org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.landing.type hadoop
spark.sql.catalog.landing.warehouse alluxio://ebj@beluga/oss/landing
spark.sql.catalog.assembly org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.assembly.type hadoop
spark.sql.catalog.assembly.warehouse alluxio://ebj@beluga/oss/assembly
spark.sql.catalog.trusted org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.trusted.type hadoop
spark.sql.catalog.trusted.warehouse alluxio://ebj@beluga/oss/trusted
spark.sql.catalog.exchange org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.exchange.type hadoop
spark.sql.catalog.exchange.warehouse alluxio://ebj@beluga/oss/exchange
spark.sql.catalog.paimon org.apache.paimon.spark.SparkCatalog
spark.sql.catalog.paimon.warehouse hdfs://beluga/data/lakehouse
spark.sql.catalog.landing_paimon org.apache.paimon.spark.SparkCatalog
spark.sql.catalog.landing_paimon.warehouse alluxio://ebj@beluga/oss/landing

spark.dynamicAllocation.enabled true
## set to false if you prefer shuffle tracking over ESS
spark.shuffle.service.enabled true
spark.dynamicAllocation.initialExecutors 2
spark.dynamicAllocation.minExecutors 2
spark.dynamicAllocation.maxExecutors 20
spark.dynamicAllocation.executorAllocationRatio 0.5
spark.dynamicAllocation.executorIdleTimeout 60s
spark.dynamicAllocation.cachedExecutorIdleTimeout 30min
spark.dynamicAllocation.shuffleTracking.enabled false
spark.dynamicAllocation.shuffleTracking.timeout 30min
spark.dynamicAllocation.schedulerBacklogTimeout 1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 1s
spark.cleaner.periodicGC.interval 5min

spark.sql.adaptive.enabled true
spark.sql.adaptive.forceApply false
spark.sql.adaptive.logLevel info
spark.sql.adaptive.advisoryPartitionSizeInBytes 256m
spark.sql.adaptive.coalescePartitions.enabled true
spark.sql.adaptive.coalescePartitions.minPartitionSize 1MB
spark.sql.adaptive.coalescePartitions.initialPartitionNum 8192
spark.sql.adaptive.fetchShuffleBlocksInBatch true
spark.sql.adaptive.localShuffleReader.enabled true
spark.sql.adaptive.skewJoin.enabled true
spark.sql.adaptive.skewJoin.skewedPartitionFactor 5
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes 256m
spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin 0.2
spark.sql.autoBroadcastJoinThreshold -1

spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.ui.port 18080
spark.history.fs.logDirectory hdfs://beluga/spark-history3.5
spark.history.fs.update.interval 10s
spark.history.retainedApplications 200
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.interval 7d
spark.history.fs.cleaner.maxAge 30d
spark.eventLog.rolling.enabled true
spark.eventLog.rolling.maxFileSize 256m
spark.eventLog.enabled true
spark.eventLog.dir hdfs://beluga/spark-history3.5
spark.yarn.historyServer.address bigdata-102.whale.com:18080

spark.sql.queryExecutionListeners org.msw.listener.SparkSqlLineageListener

To Reproduce

Expected behavior
Permissions should stay normal: directories 775, files 644.

Urgency
Describe the impact and urgency of the bug.

Are you planning to fix it
Please indicate if you are already working on a PR.

Additional context
Add any other context about the problem here.

@haoranchuixue haoranchuixue added the type-bug This issue is about a bug label May 15, 2024
@YichuanSun
Contributor

alluxio.worker.data.folder.permissions="rw-r-xr--"

Try adding this to your alluxio-site.properties. Then do you see the mode become "654" instead of "777"?
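The suggested value maps to octal 654 (rw- = 6, r-x = 5, r-- = 4). A quick local sketch to sanity-check that mapping on any Linux box (assumes GNU coreutils `stat`; the scratch file is just for illustration):

```shell
# Create a scratch file and apply the symbolic mode rw-r-xr--
tmpfile=$(mktemp)
chmod u=rw,g=rx,o=r "$tmpfile"

# Print the octal mode; rw-r-xr-- corresponds to 654
stat -c '%a' "$tmpfile"

rm -f "$tmpfile"
```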

@haoranchuixue
Author

alluxio.worker.data.folder.permissions="rw-r-xr--"

Try adding this to your alluxio-site.properties. Then do you see the mode become "654" instead of "777"?

Thanks!!
After setting alluxio.worker.data.folder.permissions="rw-r-xr--", paths still become 777 whenever a Spark job runs.

But I noticed one thing. By default the owner of these directories is alluxio, the mode is 777, and pin is YES. After setting alluxio.worker.data.folder.permissions="rw-r-xr--" and using chown and chmod to fix the owner and mode of the old directories, pin becomes NO.
However, newly created directories and files (tables created in the Iceberg catalog via Spark SQL) are still 777.
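The workaround described above can be expressed with the Alluxio shell. A rough sketch (`/oss/landing` is one of the warehouse paths from this thread's catalog config, and the owner/mode values are examples; adjust both to your deployment):

```shell
# Inspect owner, mode, and pin status of an existing warehouse directory
bin/alluxio fs ls /oss/landing

# Reset owner and mode on an old directory, as described above
bin/alluxio fs chown alluxio /oss/landing
bin/alluxio fs chmod 775 /oss/landing

# Explicitly unpin a path that is still marked pinned
bin/alluxio fs unpin /oss/landing
```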

@YichuanSun
Contributor

According to your test results, I don't think Alluxio is causing the issue. Possibly Spark or Iceberg? I'm not sure, but I will ask other engineers and get back to you soon.
