HDFS Exchange Manager Kerberos Authentication Issue for Fault-tolerant execution - SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS] #21993

Open
nesat opened this issue May 16, 2024 · 0 comments


We wanted to enable fault-tolerant execution and use the HDFS exchange manager, one of the recommended spooling storage types. Our HDFS cluster is kerberized, and the Hive and Delta connectors already use HDFS successfully. However, with the HDFS exchange manager we receive a java.io.UncheckedIOException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS] error. Has anybody faced a similar issue? Does anybody use the HDFS exchange manager with Kerberos authentication?

Configuration Details

Here is how we configure the HDFS exchange manager and enable task retries:
/etc/trino/config.properties

...
retry-policy=TASK

/etc/trino/exchange-manager.properties

exchange-manager.name=hdfs
exchange.base-directories=hdfs://dev-a/tmp/trino
hdfs.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
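For comparison, catalog-level HDFS access in recent Trino versions can be configured for Kerberos with properties along the following lines (names taken from Trino's HDFS file system support documentation for catalogs). Whether the exchange manager plugin accepts any equivalent settings is unconfirmed, and essentially part of this question:

```properties
# Hypothetical: Kerberos settings as exposed by Trino's HDFS file system
# support for catalogs; NOT confirmed to be honored in exchange-manager.properties
hdfs.authentication.type=KERBEROS
hdfs.trino.principal=trino@DEV-A.MYSERVER.COM
hdfs.trino.keytab=/etc/security/keytabs/trino.keytab
```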

Here are our resource files:
/etc/hadoop/conf/core-site.xml

<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://dev-a</value>
    </property>
    <property>
        <name>hadoop.security.authentication</name>
        <value>kerberos</value>
    </property>
</configuration>

/etc/hadoop/conf/hdfs-site.xml

<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
    <property>
        <name>dfs.internal.nameservices</name>
        <value>dev-a</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>dev-a</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.dev-a</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.dev-a.nn1</name>
        <value>namenode1.myserver.com:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.dev-a.nn2</name>
        <value>namenode2.myserver.com:8020</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.dev-a</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.datanode.kerberos.principal</name>
        <value>hdfs/_HOST@DEV-A.MYSERVER.COM</value>
    </property>
    <property>
        <name>dfs.namenode.kerberos.principal</name>
        <value>nn/_HOST@DEV-A.MYSERVER.COM</value>
    </property>
</configuration>
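As a quick sanity check that the resource files Trino is pointed at are readable and carry the expected settings, the Hadoop-style XML can be parsed directly. This is a standalone helper sketch (not Trino code) using an inline sample; in practice you would pass the contents of the real files:

```python
import xml.etree.ElementTree as ET

def hadoop_property(xml_text, name):
    """Return the value of a <property> entry in Hadoop-style configuration XML, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

# In practice, read the real file, e.g.:
#   hadoop_property(open("/etc/hadoop/conf/core-site.xml").read(),
#                   "hadoop.security.authentication")
core_site = """<configuration>
    <property>
        <name>hadoop.security.authentication</name>
        <value>kerberos</value>
    </property>
</configuration>"""

# If this is not "kerberos", the Hadoop client falls back to SIMPLE
# authentication and fails exactly as in the error reported above.
print(hadoop_property(core_site, "hadoop.security.authentication"))  # -> kerberos
```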

Additional details:

  • Trino versions tested: 408, 442, 445
  • krb5.conf location: /etc/krb5.conf
  • Kerberos ticket cache: krb5cc_1000
  • klist output:
Default principal: trino@DEV-A.MYSERVER.COM

Valid starting     Expires            Service principal
05/16/24 07:45:04  05/17/24 07:45:04  krbtgt/DEV-A.MYSERVER.COM@DEV-A.MYSERVER.COM
	renew until 05/23/24 07:45:04
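Since an expired or not-yet-valid ticket in the cache can produce similar authentication failures, the klist timestamps are worth checking programmatically as well. A small standalone sketch that parses the two-column date format shown above (a hypothetical helper, not part of any Kerberos tooling):

```python
from datetime import datetime

def ticket_window(klist_entry):
    """Parse a klist entry line ('start  expiry  principal') into datetimes."""
    fields = klist_entry.split()
    fmt = "%m/%d/%y %H:%M:%S"
    start = datetime.strptime(fields[0] + " " + fields[1], fmt)
    expires = datetime.strptime(fields[2] + " " + fields[3], fmt)
    return start, expires

entry = "05/16/24 07:45:04  05/17/24 07:45:04  krbtgt/DEV-A.MYSERVER.COM@DEV-A.MYSERVER.COM"
start, expires = ticket_window(entry)
print(expires > start)  # the TGT above is valid for 24 hours
```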

Error logs

java.io.UncheckedIOException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
	at io.trino.plugin.exchange.filesystem.FileSystemExchange.instantiateSink(FileSystemExchange.java:166)
	at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$StageExecution.getExchangeSinkInstanceHandle(EventDrivenFaultTolerantQueryScheduler.java:2129)
	at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$Scheduler.processNodeAcquisitions(EventDrivenFaultTolerantQueryScheduler.java:1606)
	at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$Scheduler.schedule(EventDrivenFaultTolerantQueryScheduler.java:1012)
	at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$Scheduler.run(EventDrivenFaultTolerantQueryScheduler.java:832)
	at io.trino.$gen.Trino_442____20240516_113953_2.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2509)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2483)
	at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1485)
	at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1482)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1499)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1474)
	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2464)
	at io.trino.plugin.exchange.hdfs.HadoopFileSystemExchangeStorage.createDirectories(HadoopFileSystemExchangeStorage.java:76)
	at io.trino.plugin.exchange.filesystem.FileSystemExchange.instantiateSink(FileSystemExchange.java:163)
	... 12 more
Caused by: org.apache.hadoop.ipc.RemoteException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1584)
	at org.apache.hadoop.ipc.Client.call(Client.java:1530)
	at org.apache.hadoop.ipc.Client.call(Client.java:1427)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
	at jdk.proxy6/jdk.proxy6.$Proxy280.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:675)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
	at jdk.proxy6/jdk.proxy6.$Proxy281.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2507)
	... 21 more

Trino startup logs

...
2024-05-16T08:20:14.492Z	INFO	main	Bootstrap	retry-policy                                                                            NONE                                                                       TASK
...
2024-05-16T08:20:36.806Z	INFO	main	io.trino.security.GroupProviderManager	-- Loaded group provider file --
2024-05-16T08:20:36.807Z	INFO	main	io.trino.exchange.ExchangeManagerRegistry	-- Loading exchange manager hdfs --
2024-05-16T08:20:37.094Z	INFO	main	org.hibernate.validator.internal.util.Version	HV000001: Hibernate Validator 8.0.1.Final
2024-05-16T08:20:37.918Z	INFO	main	Bootstrap	PROPERTY                                 DEFAULT  RUNTIME                                               DESCRIPTION
2024-05-16T08:20:37.918Z	INFO	main	Bootstrap	jmx.base-name                            ----     ----
2024-05-16T08:20:37.918Z	INFO	main	Bootstrap	exchange.base-directories                []       [hdfs://dev-a/tmp/trino/]                             List of base directories separated by commas
2024-05-16T08:20:37.918Z	INFO	main	Bootstrap	exchange.file-listing-parallelism        50       50                                                    Max parallelism of file listing calls when enumerating spooling files. The actual parallelism will depend on implementation
2024-05-16T08:20:37.918Z	INFO	main	Bootstrap	exchange.sink-buffer-pool-min-size       10       10
2024-05-16T08:20:37.918Z	INFO	main	Bootstrap	exchange.sink-buffers-per-partition      2        2
2024-05-16T08:20:37.918Z	INFO	main	Bootstrap	exchange.sink-max-file-size              1GB      1GB                                                   Max size of files written by exchange sinks
2024-05-16T08:20:37.918Z	INFO	main	Bootstrap	exchange.source-concurrent-readers       4        4
2024-05-16T08:20:37.918Z	INFO	main	Bootstrap	exchange.source-handle-target-data-size  256MB    256MB                                                 Target size of the data referenced by a single source handle
2024-05-16T08:20:37.918Z	INFO	main	Bootstrap	exchange.source-max-files-per-reader     25       25
2024-05-16T08:20:37.918Z	INFO	main	Bootstrap	exchange.max-output-partition-count      50       50
2024-05-16T08:20:37.918Z	INFO	main	Bootstrap	exchange.max-page-storage-size           16MB     16MB                                                  Max storage size of a page written to a sink, including the page itself and its size represented as an int
2024-05-16T08:20:37.918Z	INFO	main	Bootstrap	exchange.hdfs.block-size                 4MB      4MB                                                   Block size for HDFS storage
2024-05-16T08:20:37.918Z	INFO	main	Bootstrap	hdfs.config.resources                    []       [/etc/hadoop/conf/core-site.xml, /etc/hadoop/conf/hdfs-site.xml]
2024-05-16T08:20:39.210Z	INFO	main	io.trino.exchange.ExchangeManagerRegistry	-- Loaded exchange manager hdfs --
2024-05-16T08:20:39.304Z	INFO	main	io.trino.server.Server	Server startup completed in 31.60s
2024-05-16T08:20:39.304Z	INFO	main	io.trino.server.Server	======== SERVER STARTED ========

Additional Environment Variables

Setting additional env vars didn't help:

export KRB5_CONFIG=/etc/krb5.conf
export KRB5_KTNAME=/etc/security/keytabs/trino.keytab
export KRB5_CLIENT_KTNAME=/etc/security/keytabs/trino.keytab
export KRB5_CLIENT_PRINCIPAL=trino@DEV-A.MYSERVER.COM
export KRB5CCNAME=/tmp/krb5cc_1000
export KERBEROS_PRINCIPAL=trino
export KERBEROS_KEYTAB_DIR=/etc/security/keytabs
export KERBEROS_KEYTAB=/etc/security/keytabs/trino.keytab
export KERBEROS_KEYTAB_FILE=trino.keytab

Making sure core-site.xml is used

We made some changes to verify that core-site.xml was accessible and actually used:

Method 1) Changing exchange.base-directories=hdfs://dev-a/tmp/trino in exchange-manager.properties to exchange.base-directories=hdfs://namenode1.myserver.com:8020/tmp/trino:

java.lang.IllegalArgumentException: Wrong FS: hdfs://namenode1.myserver.com:8020/tmp/trino/a4080f.20240515_133919_00000_4etia.external-exchange-0.0/0/, expected: hdfs://dev-a

Method 2) Removing hdfs-site.xml and replacing dev-a with namenode1.myserver.com:50070 in both exchange.base-directories and fs.defaultFS:

java.io.UncheckedIOException: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length
	at io.trino.plugin.exchange.filesystem.FileSystemExchange.instantiateSink(FileSystemExchange.java:166)

(Note: 50070 is the NameNode HTTP/WebHDFS port rather than the RPC port 8020, so an RPC client pointed at it is expected to fail with this length error regardless of authentication.)

Verifying HDFS kerberos authentication with other methods

Method 1) curl the WebHDFS HTTP endpoint, which uses the Kerberos ticket:

curl --negotiate -u : "http://namenode1.myserver.com:50070/webhdfs/v1/tmp/?op=LISTSTATUS"

klist afterwards shows the HTTP service ticket was obtained:

Default principal: trino@DEV-A.MYSERVER.COM

Valid starting     Expires            Service principal
05/16/24 07:45:04  05/17/24 07:45:04  krbtgt/DEV-A.MYSERVER.COM@DEV-A.MYSERVER.COM
	renew until 05/23/24 07:45:04
05/16/24 07:55:34  05/17/24 07:45:04  HTTP/namenode1.myserver.com@DEV-A.MYSERVER.COM
	renew until 05/23/24 07:45:04

Method 2) Running hdfs CLI commands in another pod with similar configuration but a different image:

$ hdfs dfs -mkdir /tmp/trino/newdir
$ hdfs dfs -ls /tmp/trino
Found 1 item
drwxr-xr-x   - trino hdfs          0 2024-05-16 11:00 /tmp/trino/newdir