Connect a standalone Indexima cluster to a kerberized instance
Kerberos configuration on Indexima cluster
Install the Kerberos client. For example, on CentOS:
yum install krb5-workstation
Edit the /etc/krb5.conf file:
- Set the default domain.
- Set the admin_server and kdc locations.
- If you run into problems:
  - Comment out the renew_lifetime parameter.
  - Change default_ccache_name to use the /tmp directory instead of the keyring.
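The settings listed above might look like the following once applied. This is only a sketch: the realm DOMAIN.COM reuses the placeholder that appears later in this page, while the host kdc.domain.com, the 24h ticket lifetime, and the 7d renew lifetime are hypothetical values. The file is written to a scratch location rather than /etc/krb5.conf so nothing is overwritten.

```shell
# Write a sample krb5.conf to a scratch file for review.
# Assumptions: DOMAIN.COM and kdc.domain.com are placeholders, not real values.
out="${KRB5_SAMPLE:-${TMPDIR:-/tmp}/krb5.conf.example}"

cat > "$out" <<'EOF'
[libdefaults]
default_realm = DOMAIN.COM
ticket_lifetime = 24h
# renew_lifetime = 7d
default_ccache_name = FILE:/tmp/krb5cc_%{uid}

[realms]
DOMAIN.COM = {
  kdc = kdc.domain.com
  admin_server = kdc.domain.com
}
EOF

echo "wrote $out"
```

Compare the result with your existing /etc/krb5.conf and copy over only the lines you need.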
HDFS configuration
You will have to choose a user and a keytab to connect to your kerberized cluster. This user needs to be declared as a proxy user in your HDFS configuration.
After modification, you will need to restart your HDFS cluster.
Example with impala as the proxy user:
<property>
<name>hadoop.proxyuser.impala.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.impala.hosts</name>
<value>*</value>
</property>
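Before restarting HDFS, it can help to confirm which entries for your user a configuration file actually declares. The helper below is a minimal sketch: list_props is a hypothetical name, the core-site.xml path in the usage comment is an assumption that varies by distribution, and only <name> elements written on a single line are handled.

```shell
# list_props FILE FRAGMENT
# Print each property name in a Hadoop XML config file that contains FRAGMENT.
list_props() {
  conf_file=$1
  fragment=$2
  # Extract the text of every single-line <name> element, then filter it.
  sed -n 's/.*<name>\([^<]*\)<\/name>.*/\1/p' "$conf_file" | grep -F -- "$fragment"
}

# Usage (the path is an assumption; adjust it to your distribution):
#   list_props /etc/hadoop/conf/core-site.xml impala
```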
Galactica configuration modification
jaas.conf
You can create a keytab for indexima on your kerberized cluster or use an existing keytab.
You need to copy your keytab to each machine of the Indexima cluster.
Create a jaas.conf file:
com.sun.security.jgss.initiate {
  com.sun.security.auth.module.Krb5LoginModule required
  principal="impala/ipadress@DOMAIN.COM"
  keyTab="/etc/security/keytabs/impala.keytab"
  useKeyTab=true
  storeKey=true
  debug=true;
};
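Because the same jaas.conf is needed on every node, it may be convenient to generate it from the principal and the keytab path. The function below is a sketch: write_jaas is a hypothetical helper name, and it simply reproduces the file shown above with those two values substituted.

```shell
# write_jaas PRINCIPAL KEYTAB OUTPUT
# Render a jaas.conf equivalent to the example for the given principal and keytab.
write_jaas() {
  principal=$1
  keytab=$2
  output=$3
  cat > "$output" <<EOF
com.sun.security.jgss.initiate {
  com.sun.security.auth.module.Krb5LoginModule required
  principal="$principal"
  keyTab="$keytab"
  useKeyTab=true
  storeKey=true
  debug=true;
};
EOF
}

# Usage, with the placeholder values from the example:
#   write_jaas "impala/ipadress@DOMAIN.COM" /etc/security/keytabs/impala.keytab ./jaas.conf
```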
Galactica-env.sh
Add the following line:
export NODESERVER_JVM_OPTIONS="-Djava.security.auth.login.config=/opt/k/work/indexima/galactica/jaas.conf -Djavax.security.auth.useSubjectCredsOnly=false"
Additional actions to connect to a Kerberized Impala
Execute a manual kinit
Depending on your Impala driver version (2.5.5.1007), you may need to run kinit manually as the user you chose for connecting to your Impala cluster.
kinit -kt ... (specify your user and keytab)
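The manual kinit can be wrapped so that it only runs when no valid ticket is cached. This is a sketch, assuming the MIT Kerberos client tools, where klist -s exits with a non-zero status when the cache holds no valid ticket. ensure_ticket is a hypothetical helper name, and the keytab and principal are whichever ones you chose for the Impala connection.

```shell
# ensure_ticket KEYTAB PRINCIPAL
# Obtain a ticket from the keytab unless a valid one is already cached.
ensure_ticket() {
  keytab=$1
  principal=$2
  if klist -s; then
    echo "valid ticket already cached"
  else
    kinit -kt "$keytab" "$principal" && echo "ticket obtained for $principal"
  fi
}

# Usage, with the placeholder values from the jaas.conf example:
#   ensure_ticket /etc/security/keytabs/impala.keytab "impala/ipadress@DOMAIN.COM"
```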
Table creation from Impala
create table from_impala from my_impala_table
IN 'jdbc:impala://impala_server_adress:impalaPort;AuthMech=1;KrbRealm=XXX.COM;KrbHostFQDN=ip-FQDN-adress-;KrbServiceName=impala'
(index(id1))
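The connection string packs several Kerberos settings into one line, so a small helper can make the parts explicit. The function below is a sketch: impala_jdbc_url is a hypothetical name, it hard-codes AuthMech=1 and KrbServiceName=impala as in the statement above, and the host, port, and realm in the usage line are placeholders (21050 is the usual Impala JDBC port, but confirm the one your cluster uses).

```shell
# impala_jdbc_url HOST PORT REALM HOST_FQDN
# Assemble a Kerberos JDBC URL with the same parameters as the statement above.
impala_jdbc_url() {
  printf 'jdbc:impala://%s:%s;AuthMech=1;KrbRealm=%s;KrbHostFQDN=%s;KrbServiceName=impala\n' \
    "$1" "$2" "$3" "$4"
}

# Usage (all four values are placeholders):
#   impala_jdbc_url impala01.domain.com 21050 DOMAIN.COM impala01.domain.com
```

KrbHostFQDN is typically the fully qualified name of the Impala host, which is why it is passed separately from the connection host.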
Table load from Impala
You can load data from Impala with a JDBC load, but an HDFS load is more efficient:
load data inpath 'hdfs://ipadress:8020/user/hive/warehouse/xxx' into table from_impala format parquet
Help for debug purposes
impala-shell installation
This section is not mandatory but may be useful for debugging purposes.
yum install python-pip gcc gcc-c++ cyrus-sasl-devel
pip install impala-shell
Try to connect to the remote Impala instance:
kinit -kt ... (your keytab) ...
impala-shell -k
Check if you can browse tables and data from Impala.
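The check above can also be scripted as a one-query smoke test. This is a sketch under assumptions: impala_smoke_test is a hypothetical helper name, the host argument is a placeholder for one of your Impala daemons, and a valid Kerberos ticket is expected to be in the cache already. It relies on the impala-shell options -k (Kerberos), -i (target impalad), and -q (run a single query).

```shell
# impala_smoke_test IMPALAD_HOST
# Run one trivial query over Kerberos and report whether it succeeded.
impala_smoke_test() {
  if impala-shell -k -i "$1" -q 'show databases' >/dev/null 2>&1; then
    echo "impala reachable: $1"
  else
    echo "impala check failed: $1"
    return 1
  fi
}

# Usage (the host name is a placeholder):
#   impala_smoke_test impala01.domain.com
```

If the check fails, rerun the impala-shell command by hand without the output redirection to see the underlying error.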