High-Availability
Prerequisites
- The warehouse must be stored on one of the following shared file systems:
- A distributed file system, such as an NFS server
- S3/ADLS/GS
- HDFS
Overview
Indexima clusters can be made highly available by configuring multiple master nodes.
- When this is done, a load balancer (typically ZooKeeper in a Hadoop context) must be set up to route queries to one of the master nodes. The load balancer is responsible for checking the health of the masters and choosing which node receives the next query.
- Indexima HA works for master nodes and master coordination jobs.
- If a worker is lost, service is suspended while the hyperindexes are redistributed and reloaded across the remaining worker nodes. The duration of this suspension is proportional to the number and size of the hyperindexes in the cluster.
- Metadata is shared between all master nodes, so query history, analysis for hyperindex suggestions, and diagnosis extraction keep working as long as at least one master node is up.
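The load balancer's role described above (check master health, then pick the node for the next query) can be sketched as follows. This is a minimal illustration only, not Indexima's or ZooKeeper's actual mechanism: the class and function names, the host names, the port, and the plain TCP-connect health check are all assumptions made for the example.

```python
import itertools
import socket
from typing import Callable, Iterator, Sequence, Tuple

Master = Tuple[str, int]  # (host, port); any values used with it are hypothetical


def tcp_health_check(master: Master, timeout: float = 1.0) -> bool:
    """Assumed health check: the master is healthy if a TCP connection opens."""
    try:
        with socket.create_connection(master, timeout=timeout):
            return True
    except OSError:
        return False


class MasterBalancer:
    """Round-robin over master nodes, skipping those that fail the health check."""

    def __init__(self, masters: Sequence[Master],
                 is_healthy: Callable[[Master], bool] = tcp_health_check) -> None:
        if not masters:
            raise ValueError("at least one master node is required")
        self._masters = list(masters)
        self._is_healthy = is_healthy
        self._cycle: Iterator[Master] = itertools.cycle(self._masters)

    def next_master(self) -> Master:
        """Return the next healthy master, or raise if none is reachable."""
        # Try each master at most once per call.
        for _ in range(len(self._masters)):
            candidate = next(self._cycle)
            if self._is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy master node available")
```

With three masters of which one is down, successive calls to `next_master()` alternate between the two healthy nodes. Once every master fails the check, the call raises, which matches the statement that the service depends on at least one master being up.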
How Service Recovery works
Because Indexima is designed for performance, hyperindexes are loaded and distributed equally across the workers, without any data duplication.
What does Indexima do when a worker fails?
When a node is lost, SQL queries sent immediately afterwards may fail with an error stating that one of the nodes is unreachable and that the complete data therefore cannot be retrieved.
The cluster will unload hyperindexes and equally redistribute them between the remaining nodes.
During the redistribution of hyperindexes data to the remaining nodes, a suspension of service is expected.
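Since queries may fail during this window, a client application may want to retry them. The sketch below shows one possible client-side approach (retry with exponential backoff). It is not an Indexima feature: `run_with_retry`, its parameters, and the `run_query` callable are hypothetical, and a real client should pass the specific error type raised by its SQL driver as `retry_on` rather than the broad `Exception` default.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def run_with_retry(run_query: Callable[[], T],
                   attempts: int = 5,
                   initial_delay: float = 1.0,
                   backoff: float = 2.0,
                   retry_on: Type[BaseException] = Exception,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Call run_query(), retrying with growing delays while it raises retry_on."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except retry_on:
            if attempt == attempts:
                raise  # give up: the suspension outlasted the retry budget
            sleep(delay)
            delay *= backoff
    raise AssertionError("unreachable")
```

The total wait should be sized to the expected suspension, which depends on the number and size of the hyperindexes in the cluster.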
Indexima keeps running as long as at least one master node is up, regardless of the number of nodes lost.
If too many nodes are lost, a "memory low" warning is raised and some hyperindexes are unloaded.
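To illustrate why losing workers increases the memory pressure on the remaining ones, here is a minimal sketch of an equal, duplication-free distribution. It assumes a simple round-robin assignment; the actual placement algorithm used by Indexima is not described here, and the function, hyperindex, and worker names are hypothetical.

```python
from typing import Dict, List, Sequence


def distribute(hyperindexes: Sequence[str],
               workers: Sequence[str]) -> Dict[str, List[str]]:
    """Assign each hyperindex to exactly one worker, round-robin (no duplication)."""
    if not workers:
        raise ValueError("no worker available")
    assignment: Dict[str, List[str]] = {w: [] for w in workers}
    for i, name in enumerate(hyperindexes):
        assignment[workers[i % len(workers)]].append(name)
    return assignment


indexes = ["h1", "h2", "h3", "h4", "h5", "h6"]

# Three workers: two hyperindexes each.
before = distribute(indexes, ["w1", "w2", "w3"])

# Worker w3 is lost: the same six hyperindexes now fit on two workers,
# three each, so every remaining worker holds more data in memory.
after = distribute(indexes, ["w1", "w2"])
```

In this example each worker's share grows from two hyperindexes to three after the loss. With every further loss the share per worker grows again, which is the situation in which the "memory low" warning appears.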
Recovery of lost nodes
Lost nodes are not automatically reintegrated.
Once the nodes are available again, the following user action is required to reintegrate them into the cluster:
- In Monitor, go to the Administration/Nodes Status section
- Press the "Re-initialize Cluster" button
As with any re-initialization, the cluster will unload the hyperindexes and redistribute them equally between the nodes, so a loss of service is expected during that time.
Enabling High Availability
To configure high availability, go to Dynamic Mode.