Data Catalog Integration
Indexima integrates with the Apache Atlas data catalog and with any data catalog that exposes the same interface as Atlas. This integration lets an organization's metadata management and governance tools build a catalog of the data assets connected to Indexima.
Installation & Deployment
Download and deploy the JAR file
To use the Apache Atlas integration, the Atlas library needs to be deployed to each Indexima node.
Download the appropriate version of the library, indexima-atlas-lib-[VERSION].jar, from https://download.indexima.com/release and deploy it to the /galactica/ext directory on each Indexima node.
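The deployment step above can be sketched as a small helper that builds one copy command per node. This is only an illustration under stated assumptions: the JAR has already been downloaded to the current directory, the nodes are reachable over SSH so that scp can be used, and the version string and host names shown are hypothetical placeholders. The helper only prints the commands; it does not execute anything.

```python
import shlex

# Target directory named in the deployment instructions above.
EXT_DIR = "/galactica/ext"


def jar_name(version: str) -> str:
    """File name of the Atlas library for a given version string."""
    return f"indexima-atlas-lib-{version}.jar"


def deploy_commands(version: str, nodes: list[str]) -> list[str]:
    """Return one scp command line per node (nothing is executed here).

    Assumption: scp/SSH access to each node; use whatever copy
    mechanism your environment actually provides.
    """
    jar = jar_name(version)
    return [shlex.join(["scp", jar, f"{node}:{EXT_DIR}/"]) for node in nodes]


# Hypothetical version and host names, for illustration only.
for command in deploy_commands("X.Y.Z", ["node1.example.com", "node2.example.com"]):
    print(command)
```

Printing the commands first makes it easy to review them before running them against every node of the cluster.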
Configure file atlas-application.properties
Please refer to atlas-application.properties to configure this file.
Restart the Indexima service after adding the Atlas library and the atlas-application.properties file so that the service loads Atlas.
Authentication Configuration
Indexima can authenticate to Atlas using the FILE, LDAP, or KERBEROS authentication mechanism.
Rights
Whichever mechanism is used to connect to Atlas (a dedicated user or the Indexima user that runs the application), make sure this user has WRITE rights in Atlas.
FILE and LDAP authentication
For FILE and LDAP authentication, the user that connects to Atlas is provided with the parameters atlas.user and atlas.password.
The parameter atlas.enable activates or deactivates the Atlas integration (see galactica.conf).
Example of Atlas activation with a FILE or LDAP connection: execute the following commands in the Indexima console:
SET_ atlas.user=[ATLAS_ADMIN_USER];
SET_ atlas.password=[ATLAS_PASSWORD];
SET_ atlas.enable=true;
As with any dynamic parameter, dynamically set Atlas parameters are stored in the warehouse in the galactica_ext.conf file. The Atlas password is automatically encrypted when set.
Please note that any change to atlas.user or atlas.password must be followed by setting atlas.enable=true for the change to take effect.
Note: If the Atlas parameters are added directly to galactica.conf (not recommended for dynamic parameters), the Atlas password must not be encrypted.
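The ordering rule for credential changes (set the user and password first, then set atlas.enable=true) can be captured in a small helper that generates the console statements. This is a minimal sketch: the function name is hypothetical, it only builds the text of the statements shown in the example above, and it does not connect to Indexima.

```python
def atlas_credential_statements(user: str, password: str) -> list[str]:
    """Build the console statements for a FILE or LDAP connection.

    atlas.enable=true is always emitted last, so that a changed user
    or password takes effect as the documentation requires.
    """
    return [
        f"SET_ atlas.user={user};",
        f"SET_ atlas.password={password};",
        "SET_ atlas.enable=true;",
    ]


# Placeholders kept as in the documentation; replace them with real values.
for statement in atlas_credential_statements("[ATLAS_ADMIN_USER]", "[ATLAS_PASSWORD]"):
    print(statement)
```

The printed statements can then be pasted into the Indexima console.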
Kerberos authentication
For Kerberos authentication, add atlas.authentication.method.kerberos=true to the atlas-application.properties file, then enable the Atlas integration with the following command in the Indexima console:
SET_ atlas.enable=true;
Data Catalog Feed
Initialization
After enabling the Atlas integration as described in the previous section, start the Indexima cluster normally (without any Atlas-specific command). Once the cluster is up and running, you can test the integration by creating a new schema and checking that it is correctly synchronised with Atlas.
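One way to check that a new schema reached Atlas is to query the Atlas REST API basic search endpoint. The sketch below rests on several assumptions: Atlas is reachable over HTTP with basic authentication (as with the FILE or LDAP mechanism), the Indexima schema appears in Atlas as a hive_db entity, and matching on the entity name is sufficient in your environment. The function names, the Atlas URL, and the schema name are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_search_url(atlas_url: str, schema: str) -> str:
    """URL of an Atlas v2 basic search for hive_db entities matching `schema`."""
    query = urllib.parse.urlencode({"typeName": "hive_db", "query": schema})
    return f"{atlas_url.rstrip('/')}/api/atlas/v2/search/basic?{query}"


def schema_in_response(payload: dict, schema: str) -> bool:
    """True if the search result contains a non-deleted entity named `schema`."""
    for entity in payload.get("entities") or []:
        name = (entity.get("attributes") or {}).get("name") or ""
        if name.lower() == schema.lower() and entity.get("status") != "DELETED":
            return True
    return False


def schema_is_synchronised(atlas_url: str, user: str, password: str, schema: str) -> bool:
    """Query Atlas and report whether the schema is present.

    Assumption: basic authentication is accepted by the Atlas server.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        build_search_url(atlas_url, schema),
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return schema_in_response(json.load(response), schema)
```

For example, after creating a schema named test_atlas in Indexima, calling schema_is_synchronised("http://atlas.example.com:21000", user, password, "test_atlas") should return True once the object has been propagated; the host name here is a placeholder.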
When the Atlas synchronisation is operational, run ./start-node.sh --import-atlas on an Indexima node to send all previously created objects to Atlas. Please note that this command does not start a node; it only sends a command to the running cluster asking it to trigger a full Atlas synchronisation.
If the integration between Indexima and Atlas has been interrupted and some objects that exist in Atlas have been deleted from Indexima in the meantime, you can instead run ./start-node.sh --import-atlas-clean to force the deletion in Atlas of any objects already deleted from this Indexima cluster (deletion is based on matching the Indexima cluster name). This option is equivalent to the deleteNonExisting flag described here.
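The choice between the two import commands can be summarised in a short helper. This is a sketch only: the function name is hypothetical, and it merely returns the command described above without running it.

```python
def import_command(objects_deleted_during_outage: bool) -> list[str]:
    """Pick the start-node.sh option for a full Atlas synchronisation.

    Use the clean variant only when objects were deleted from Indexima
    while the integration was interrupted, so that the stale entries
    are also removed from Atlas.
    """
    flag = "--import-atlas-clean" if objects_deleted_during_outage else "--import-atlas"
    return ["./start-node.sh", flag]


# Print both variants for comparison; nothing is executed.
print(" ".join(import_command(False)))
print(" ".join(import_command(True)))
```

The returned list can be passed to subprocess.run from the directory that contains start-node.sh on an Indexima node.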
Operations captured
Once the Apache Atlas integration is enabled, any creation of an object in Indexima triggers a call to propagate this object to Atlas. The object's metadata, author, creation timestamp, and lineage are available in Atlas. Indexima objects are modeled as standard Hive objects in Atlas, as described in https://atlas.apache.org/#/HookHive.
The following Hive operations are currently captured:
create database/table/view
alter database/table/view