galactica.conf
The file "galactica.conf", located in your /<indexima_install_folder>/galactica/conf folder, configures the Indexima cluster for various environments and security mechanisms, and lets you tune the Indexima Data Engine for optimum performance. "galactica.conf" also helps with troubleshooting by enabling several levels of debugging modes.
Parameter modifications in the "galactica.conf" file are not dynamically applied. They require restarting the Indexima cluster to be taken into account.
Some parameters are marked as dynamic, meaning that they can be altered afterward by a query. To do so, use the HSQL command SET_ followed by one space, then the "galactica.conf" parameter and its new value.
# static parameter
result.max_size = 512
# dynamic parameter
SET_ result.max_size = 256
Modifications made this way do not require restarting the Indexima cluster. However, all such changes are lost if the Indexima cluster is restarted: altered parameters then revert to the values specified in "galactica.conf", or to the default values when not specified.
Dynamic parameter changes are preserved when issuing a cluster INIT command.
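
The precedence rules above can be pictured with a small Python model. This is only an illustrative sketch of the behaviour described in this section, not Indexima code: the `ParameterStore` class and its method names are hypothetical.

```python
class ParameterStore:
    """Toy model of how an effective parameter value is resolved (hypothetical)."""

    def __init__(self, conf_file, defaults):
        self.conf_file = dict(conf_file)  # values read from galactica.conf at startup
        self.defaults = dict(defaults)    # built-in default values
        self.dynamic = {}                 # values changed with SET_ since the last restart

    def set_dynamic(self, name, value):
        # Models "SET_ <parameter> = <value>" for a dynamic parameter.
        self.dynamic[name] = value

    def init(self):
        # A cluster INIT preserves dynamic changes, so nothing is cleared.
        pass

    def restart(self):
        # A cluster restart discards every dynamic change.
        self.dynamic.clear()

    def effective(self, name):
        # Dynamic value first, then galactica.conf, then the default.
        for source in (self.dynamic, self.conf_file, self.defaults):
            if name in source:
                return source[name]
        raise KeyError(name)


store = ParameterStore(
    conf_file={"result.max_size": 512},
    defaults={"history.count": 500},
)
store.set_dynamic("result.max_size", 256)
after_set = store.effective("result.max_size")
store.init()
after_init = store.effective("result.max_size")
store.restart()
after_restart = store.effective("result.max_size")
```

In this model, the value set dynamically survives `init()` but is replaced by the "galactica.conf" value after `restart()`; a parameter absent from both falls back to its default.
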
Parameters
Below are listed all possible parameters for "galactica.conf", with their default value and a quick description.
General
Nodes
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
nodes.requested | integer | 1 | Number of requested nodes in the cluster. See install-indexima-engine for details. | No |
nodes.connect.min-nodes | integer | 1 | Minimum number of nodes to start the cluster. See install-indexima-engine for details. | No |
Other
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
warehouse (required) | string | home/indexima/warehouse | Absolute path for storage of HyperIndex and Data Spaces. | No |
cores (required) | integer | 8 | Number of CPU cores used by Indexima Data Engine. Indexima only supports the same number of cores on every node. If your nodes have different numbers of cores, you must set this parameter to the lowest core count found on any node. | No |
loaders | integer | defaults to the value of the cores parameter | Max number of threads used for loading queries (LOAD DATA typically) on a node. This is also the max number of load queries that can be run in parallel on a node. | No |
queries (required) | integer | 32 | Max number of total threads used for querying on a node. | No |
cluster.name | String | MyCluster | Added to the name of the .zip file generated during a Diagnosis | |
server.root-temporary-path | String | ${java.io.tmpdir}/indexima | Define a root directory used for all temporary files (analyzer.cache.disk.path, big.index.swap.path, disk.backed.request.result.output.path, spill.disk.path). Default value is defined by default system temporary folder or java.io.tmpdir property if defined. | No |
Storage related parameters
General
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
partitions (required) | integer | | Number of shards (a type of horizontal partition) across potentially multiple instances of the schema. It is highly recommended to set partitions according to the formula: partitions = cores * number of nodes. You can oversize it to anticipate a cluster size change, but you cannot undersize it. If you change the number of partitions, all your tables must be dropped and recreated, and all data reloaded. | No |
dimension.partitions | integer | 8 | Number of partitions used for dimension tables (tables without HyperIndex). example: For a 4 nodes cluster with 8 cores per node: dimension.partitions = 8 | No |
export.partition.size.mb | integer | 10000 | Defines the maximum size in Mb of files generated during an export. | No |
pages.oneFilePerColumn | boolean | true | When true, Indexima creates one file per column to take advantage of the Indexima K-Store technology. So when using local/shared file systems or cloud file-systems such as S3, this parameter must be set to true. When false, Indexima forces K-Store to store all columns in a single file to limit the number of open files. This is used when the Indexima warehouse is located on an HDFS file system. | No |
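
As a worked example of the sizing formula for partitions, the following Python sketch computes the recommended value and checks whether an existing setting would be undersized. The function names are hypothetical helpers, and the node list is made-up illustration data; `cores` is taken as the lowest per-node core count, following the rule given for the cores parameter.

```python
def common_cores(cores_per_node):
    """cores must be identical on every node, so use the lowest count."""
    return min(cores_per_node)


def recommended_partitions(cores, nodes):
    """Formula from the table: partitions = cores * number of nodes."""
    return cores * nodes


def is_undersized(partitions, cores, nodes):
    """partitions may exceed the recommended value but must not be below it."""
    return partitions < recommended_partitions(cores, nodes)


# Hypothetical 4-node cluster where one node has more cores than the others.
cores = common_cores([8, 8, 16, 8])
current = recommended_partitions(cores, nodes=4)
# Oversizing now for a planned growth to 6 nodes avoids reloading all data later.
planned = recommended_partitions(cores, nodes=6)
```

Because changing partitions requires dropping, recreating, and reloading every table, sizing for the planned cluster rather than the current one may be worth considering.
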
Object storage (AWS S3 / MINIO / CEPH)
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
warehouse.s3.compatible | boolean | false | Set this parameter to true if you are using your own S3-compatible server rather than AWS S3. Otherwise, S3 connections are considered genuine AWS S3. | No |
warehouse.s3.endpoint | string | | When warehouse.s3.compatible is set to true, you must provide the endpoint URL of your S3 server. example: warehouse.s3.endpoint = https://my_minio:9000 | No |
warehouse.s3.bucket | string | | Name of the bucket used by Indexima with an S3-compatible connection. | No |
warehouse.s3.cert.check | boolean | false | When set to true, the S3 connection fails if the SSL certificate in use is insecure. If you are using a self-signed certificate, leave this parameter set to false. | No |
s3.max.retry | integer | 40 | Number of retries on S3 failure. | No |
warehouse.s3.key | string | | Amazon S3 directory where files are stored for S3 compatible FS | No |
s3.multi-part-part-size | long | 134217728 | Size (in bytes) of the parts when executing a multi-part copy (AWS allows parts between 5 MiB and 5 GiB). | Yes |
s3.multi-part-upload-threshold | long | 1073741824 | Size (in bytes) before multi-part copy is used (AWS does not support copy for files bigger than 5 GB). | Yes |
s3.region | string | eu-west-1 | Configure the region with which the SDK should communicate. | No |
Connectivity toward Indexima
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
node.port | int | 19999 | Port used by the Indexima Galactica engine to communicate with the other nodes of the Indexima Cluster. Note: Indexima also uses 'port + 1' for internal communication. For example, if node.port is 19999, port 20000 is also used. | No |
webui.port | | 9999 | Port used by the Indexima Monitor Console. This webserver aggregates logs, queries history, and hosts Indexima Analyzer, among other features. | No |
webui.ssl.enable | boolean | false | When true, the Indexima Monitor Console uses the secured HTTPS (SSL) protocol. When false, the standard HTTP protocol is used. | No |
webui.ssl.keystore.location | string | | Location of the SSL certificate keystore used to secure Indexima Monitor Console access with HTTPS. | No |
webui.ssl.keystore.password | string | | Password associated with the Indexima keystore. | No |
monitor.api.key | string | EvdbpGMCWPzpSzgkjTqq9SjM | Service API key between Indexima cluster and Developer Console. | |
heartbeat | integer | 300 000 | Time in milliseconds for a full round-trip heartbeat packet between the Indexima master and each worker node. The default value is 5 minutes. | No |
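
Since node.port implicitly reserves the next port as well, it can help to list every port a node needs before writing firewall rules. The Python sketch below does this from the defaults shown in the table; `ports_in_use` is a hypothetical helper, not an Indexima tool.

```python
def ports_in_use(node_port=19999, webui_port=9999):
    """Return the TCP ports described in the table above, keyed by role."""
    ports = {
        "node.port": node_port,
        "node.port + 1 (internal)": node_port + 1,
        "webui.port": webui_port,
    }
    # Two roles sharing one port would be a configuration mistake.
    if len(set(ports.values())) != len(ports):
        raise ValueError("port settings collide: %r" % ports)
    return ports


default_ports = ports_in_use()

# Choosing node.port = 9998 would make 'port + 1' collide with webui.port.
try:
    ports_in_use(node_port=9998, webui_port=9999)
    collision_detected = False
except ValueError:
    collision_detected = True
```
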
Indexima Connectivity toward external datasources
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
jdbc.timeout.seconds | integer | 60 | Timeout value of the JDBC driver connection in seconds. This is the maximum time allowed for the data source to give a response, not the maximum time the query can take. | No |
jdbc.load.fetch.size | integer | 1000 | Specify the number of rows fetched with each database round trip for a query when importing data from an external JDBC source. | Yes |
jdbc.query.timeout.seconds | integer | 60 | Timeout for statement JDBC execution in seconds | Yes |
jdbc.create.field.restriction | boolean | true | When set to true, Indexima replaces some characters on the fly to avoid wrong interpretation of incoming data. See the character substitution chart. | Yes |
Logs and history
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
log.dir | string | /var/log/indexima/log | Path to the folder where the indexima logs are written | No |
log.insert | integer | 1 000 000 | During a "load" command, a log message is written every N inserts. | No |
log.level | string | INFO | Log level of the cluster | No |
history.count | integer | 500 | Maximum number of queries in the live history (in the user interface) | No |
history.flush | integer | 10 | Max number of queries before flushing history to disk | No |
history.dir | string | /var/log/indexima/history | Query history directory. Please see detailed configuration of shared storage. | No |
history.export | string | /var/log/indexima/history-export | Export directory for the query history in CSV format. Please see detailed configuration of shared storage. | No |
hive.log.threshold | string | ERROR | Hive log threshold | No |
hive.log.dir | string | /var/log/indexima/hivelog | Hive log directory | No |
Memory Usage Parameters
Tables & Indexes
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
index.memory.max_size.mb | integer | 1024 | Maximum size (in MB) of each index, per node. This parameter limits the size of a HyperIndex to prevent running out of memory if the cardinality of the selected index is too high. | Yes |
table.memory.max_size.mb | integer | 4096 | Maximum size (in MB) of a dataspace (table), per node. This parameter prevents running out of memory when using multiple large tables. | Yes |
limited.memory.max_size.mb | integer | 1024 | Maximum size (in MB) of a limited table. | Yes |
table.prefetch.lastdays | | 2 | Preload indexes that have been used during the last N days. | Yes |
Big indexes
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
big.index.enable | boolean | FALSE | Enable big index feature | Yes |
big.index.limit.mb | long | 5000 | Memory limit for big index behaviour | Yes |
big.index.max.active.partitions | integer | 2 | Max active partition allowed for queries on a big index | No |
big.index.swap.path | string | ${server.root-temporary-path}/swap | Default location for index data swap | Yes |
Queries
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
query.max | long | 100 000 | Maximum number of bytes of a SQL Statement | Yes |
result.max_size.mb | long | 256 | Define the maximum size in MB of the queries computed results | Yes |
global.result.max_size.mb | long | 512 | Define the global maximum size in MB to store the calculated results of ALL the queries at one moment. The guideline for setting this parameter is the number of nodes in the cluster multiplied by result.max_size.mb. The minimum recommended value is 512 MB. | Yes |
select.timeout.ms | long | -1 | Timeout in milliseconds of query execution. If the query lasts more than the timeout time, the system will stop the query with the error Timeout | Yes |
join.memory.max_size.mb | integer | 12 | Maximum size (in MB) allocated for one line join computation. | Yes |
cognos.limit.select | Integer | -1 | Add a limit on SELECT queries from Cognos (queries with no sub-requests, no join and no group by) to prevent Cognos from selecting more than cognos.limit.select lines. Value -1 means disabled. | Yes |
queries | integer | 32 | Number of threads used for querying | No |
queries.high-cost | integer | 5 | Number of high cost queries that may run at the same time | No |
queries.high-cost.frozen | integer | 5 | Maximum number of frozen high cost queries | Yes |
queries.hybrid.ratio | double | 0.25 | Ratio of hybrid queries allowed to run concurrently | No |
query.high-cost.memory.mb | long | 128 | Memory size threshold for a query to become high cost | Yes |
cognos.show-table-with-table-name | boolean | false | When false, Indexima can respond to the SHOW TABLE request sent by Cognos, which does not contain the name of the schema. | Yes |
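
The sizing guideline for global.result.max_size.mb (number of nodes multiplied by the per-query result size, with a recommended floor of 512 MB) can be written as a one-line calculation. The Python sketch below is only an illustration of that guideline; the function name is hypothetical.

```python
MIN_GLOBAL_RESULT_MB = 512  # minimum recommended value from the table


def suggested_global_result_max_size_mb(nodes, result_max_size_mb=256):
    """Guideline: nodes * per-query result size, never below 512 MB."""
    return max(MIN_GLOBAL_RESULT_MB, nodes * result_max_size_mb)


single_node = suggested_global_result_max_size_mb(nodes=1)
four_nodes = suggested_global_result_max_size_mb(nodes=4)
```

With the default per-query size of 256 MB, a one-node or two-node cluster stays at the 512 MB floor, and the guideline only raises the value from three nodes onward.
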
Cache
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
cache.master.mb | integer | 0 | Define a certain amount of RAM in MB allocated to cache Hive 2 queries addressed by Indexima Data Hub for better performance | Yes |
cache.master.min_exec.ms | integer | 1000 | Execution time threshold (in ms) above which a query result is put into the cache. | Yes |
Load
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
insert.queue.mem.size.mb | integer | 256 | Maximum insert queue memory size (in MB) during a load. | Yes |
Miscellaneous
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
memory.coef | float | 0.85 | Coefficient of max heap memory used for the Hyperindexes | No |
Security
General
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
impersonation | boolean | false | When | No |
webui.authenticate | boolean | false | when when | No |
session.users | string | | Allow users listed in session.users to connect to Indexima Data Engine without Kerberos authentication. example: session.users = user1, user2, user3 | No |
session.passwords | string | | List of passwords for the users in session.users. Passwords are attributed to users by position in the list. For example, with session.passwords = pass1, pass2, pass3 and the session.users example above, user1 has password "pass1", user2 has password "pass2", and user3 has password "pass3". | No |
users.in.admin.role | string | | List of users that have full rights to administer and run the Indexima cluster. Indexima distinguishes 2 types of users: administrators (listed with this parameter) and users (all the others). example: users.in.admin.role = user1, user2 | No |
webui.rights | boolean | false | Monitor Roles enabler | No |
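
Because session.passwords is matched to session.users by list position, a missing or extra entry silently shifts every following password. The Python sketch below shows one way to check the two lists before editing "galactica.conf". It assumes both values are comma-separated, as in the session.users example; the helper names are hypothetical and this is not how Indexima itself parses the file.

```python
def parse_list(value):
    """Split a comma-separated setting into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def pair_credentials(session_users, session_passwords):
    """Pair each user with the password at the same list position."""
    users = parse_list(session_users)
    passwords = parse_list(session_passwords)
    if len(users) != len(passwords):
        raise ValueError(
            "%d users but %d passwords" % (len(users), len(passwords))
        )
    return dict(zip(users, passwords))


pairs = pair_credentials("user1, user2, user3", "pass1, pass2, pass3")
```
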
Ranger
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
privilege.driver.name | string | | Java class name of the Indexima plugin driver. example: privilege.driver.name = io.galactica.ranger.client.RangerIndeximaDriver | No |
privilege.driver.property.servicetype | string | | Set the Ranger property servicetype to be used with the Indexima Ranger plug-in. This property is used to display the service name of the plugin in the Ranger GUI. example: privilege.driver.property.servicetype = indexima | No |
privilege.driver.property.appid | string | | Set the Ranger property appid to be used with the Indexima Ranger plug-in. This property identifies the application ID. example: privilege.driver.property.appid = indexima | No |
SSL
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
node.ssl.enable | boolean | false | When true, encrypt data on the private Indexima network, enabling Inter-Node SSL keystore | No |
node.ssl.keystore.location | string | | Path of the Inter-Node SSL keystore. example: node.ssl.keystore.location = /path/to/my_keystore | No |
node.ssl.keystore.password | string | | Password associated with the Inter-Node SSL keystore defined at node.ssl.keystore.location. | No |
Misc
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
allow.create.if-select | boolean | false | Table creation is allowed only if select on this table is granted. | No |
Analyzer
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
analyzer.hits | integer | 3 | Lets you define the default required hits when you enter the Analyzer page. example: analyzer.hits = 5 | UI |
analyzer.days | integer | 30 | Lets you define the default number of days when you enter the Analyzer page. example: analyzer.days = 10 | UI |
analyzer.cardinality | integer | 4 | Lets you define the default cardinality when you enter the Analyzer page. example: analyzer.cardinality = 6 | UI |
sampling.external.max | integer | 100 000 000 | The maximum number of lines to use for the sampling in the Analyzer if the table is external. | UI |
sampling.internal.max | integer | 10 000 | The maximum number of lines to use for the sampling in the Analyzer if the table is not external. | UI |
analyzer.evaluation.cardinality.small | integer | 10 (percent) | Cardinality for small-level index. If the cardinality computed for an index is below this value, the index is considered small. | No |
analyzer.evaluation.cardinality.medium | integer | 20 (percent) | Cardinality for medium-level index. If the cardinality computed for an index is below this value, the index is considered medium. | No |
analyzer.evaluation.cardinality.big | integer | 40 (percent) | Cardinality for big-level index. If the cardinality computed for an index is below this value, the index is considered big. Otherwise, the index is considered dangerous and should be avoided. | No |
analyzer.cache.size.bytes | long | 1 000 000 | Cache size for evaluating cardinalities of expressions during the analysis | No |
analyzer.mode | enum | PARSING/INCREMENTAL | Incremental mode to parse only new queries. | Yes |
analyser.history.duration.days | integer | 30 | | Yes |
analyser.default-merge-policy | string | MAX_INDEX/COEF | Default policy used to merge indexes (COEF, MAX_INDEX). | Yes |
analyser.default-max-expected-indexes | integer | 8 | Maximum targeted index count when using the MAX_INDEX merge policy. | Yes |
analyser.history.size.max | integer | 1000 | Allows customization of the maximum number of incremental plans stored per day. | Yes |
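
The three analyzer.evaluation.cardinality thresholds define four bands. The Python sketch below shows how a cardinality value, already expressed as a percentage, would fall into those bands using the default thresholds from the table. How the Analyzer computes that percentage is not covered here, and `classify_index` is a hypothetical name used for illustration only.

```python
def classify_index(cardinality_percent, small=10, medium=20, big=40):
    """Map a cardinality percentage to the level described in the table.

    Each threshold is an exclusive upper bound ("below this value").
    """
    if cardinality_percent < small:
        return "small"
    if cardinality_percent < medium:
        return "medium"
    if cardinality_percent < big:
        return "big"
    return "dangerous"


levels = [classify_index(value) for value in (5, 10, 25, 40)]
```
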
YARN
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
yarn.resourcemanager.hostname | string | value in yarn-site.xml | Hostname of the Yarn Resource Manager to call in order to create the Indexima Yarn Application example: yarn.resourcemanager.hostname = localhost | No |
yarn.memory.mb | integer | 1024 | Allocates memory (in MB) to run the Indexima data engine. This value must be greater than java heap (GALACTICA_MEM) + 20%. example: yarn.memory = 40000 (40 GB) | No |
yarn.memory.master.mb | integer | 256 | Memory size (in MB) of the master container. This value should be as low as possible while remaining greater than the value set by Hadoop; otherwise, YARN raises an error. example: yarn.memory.master = 128 | No |
yarn.dir | string | hdfs://localhost:8020/user/<USER>/indexima, where <USER> is the value of the USER environment variable | Define the HDFS directory used to share the application binaries and configuration files when deploying an Indexima cluster in Hadoop. example: yarn.dir = hdfs://localhost:8020/tmp/indexima | No |
yarn.name | string | Indexima | Name of the Indexima cluster. example: yarn.name = indexima_prod | No |
yarn.kerberos | boolean | false | Must be true if the Indexima data engine is running in a Hadoop cluster secured by Kerberos. example: yarn.kerberos = true | No |
yarn.nodes-constraint | string[] | null | YARN constraint to request that the containers are placed on one of those hosts example: yarn.nodes-constraint=node1.internal.com,node2.internal.com,node3.internal.com | No |
yarn.relax-locality | boolean | | | No |
string[] | null | No |
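
The rule that yarn.memory.mb must be greater than the Java heap plus 20% can be checked with simple integer arithmetic. The Python sketch below assumes the heap (GALACTICA_MEM) is expressed in MB; the 32000 MB heap is a made-up value for illustration, and both function names are hypothetical.

```python
def min_yarn_memory_mb(heap_mb):
    """Smallest whole number of MB strictly greater than heap + 20%."""
    return heap_mb * 6 // 5 + 1


def yarn_memory_ok(yarn_memory_mb, heap_mb):
    """True when yarn memory > heap * 1.2 (integer form avoids float rounding)."""
    return yarn_memory_mb * 5 > heap_mb * 6


heap_mb = 32000  # hypothetical GALACTICA_MEM value, in MB
floor_mb = min_yarn_memory_mb(heap_mb)
example_ok = yarn_memory_ok(40000, heap_mb)
```

With this hypothetical heap, the 40000 MB value used in the yarn.memory example leaves a margin above the required minimum.
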
Spill To Disk
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
spill.enable | boolean | true | Enable spill to disk function | Yes |
spill.memory.size.mb | long | 256 | Memory size taken to process queries before being spilled on disk. When the size of query elements is greater than this value, queries will start spilling on disk. | Yes |
spill.disk.path | string | ${server.root-temporary-path}/indexima-cache | Folder on the disk of each node where elements will be spilled. | Yes |
request.result.max.chunk.size | Integer | 10000 | Maximum number of lines in each chunk when a large output result is spilled to disk. | Yes |
External Tables
General
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
external.field.max_size | integer | 127 | When a field name or alias name is longer than external.field.max_size, it is renamed to a shorter name. | Yes |
Synchronization
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
external.synchronize.consistency.enable | boolean | false | Whether to execute a synchronize before adding a new index. | Yes |
Automatic Synchronization
Only available for Snowflake datasource
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
external.synchronize.check.cron | string | empty | Cron expression for table synchronization check, more precise than | Yes |
external.synchronize.check.rate | integer | 0 | Number of seconds between external table synchronization check | Yes |
external.synchronize.check.user | string | admin | The Indexima user who will run the SYNCHRONIZE during the automatic update | No |
Smart Tables
Smart Tables Process parameters
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
analyser.smart.metrics.days | integer | 15 | The number of sliding days over which queries are picked up for analysis. | Yes |
analyser.smart.optimizer | string | Last month tuning | Default optimizer to use. Can be overridden for each table. Values are defined in optimize_index.json. | Yes |
analyser.smart.scheduling.cron | string | <empty> | Define when the process of analysis and index creation starts | Yes |
analyser.smart.scheduling.duration.minutes | integer | 120 | Maximum duration of smart tables analysis | Yes |
analyser.smart.threads | integer | 2 | Number of threads used for running smart tables indexation. | No |
analyser.smart.max.indexes | integer | 20 | Maximum number of indexes for a smart table. | Yes |
Smart Tables weight parameters
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
analyser.smart.threshold.slow.ms | integer | 2000 | Speed threshold where a request is considered slow | No |
analyser.smart.weight | integer | 1 | Default weight for score computation | Yes |
analyser.smart.weight.bucket | integer | 1 | Weight of the K-Store query for score computation | Yes |
analyser.smart.weight.speed | integer | 1 | Weight of the speed ratio for score computation | Yes |
analyser.smart.weight.delegate | integer | 1 | Weight of query delegated to the underlying table for score computation | Yes |
analyser.smart.weight.size | integer | 1 | Weight of the table size ratio for score computation | Yes |
analyser.smart.weight.traffic | integer | 1 | Weight of the traffic ratio for score computation | Yes |
Miscellaneous
Parameter | Type | Choices/Default | Description | Dynamic command |
---|---|---|---|---|
notification.check.cron | string | empty | SET_ notification.check.cron = "0 0/15 * * * ?" // every 15 minutes | Yes |
nodes.connect.timeout.seconds | integer | 120 | Number of seconds to wait before starting the Indexima cluster with a missing node member. This option allows starting a stand-alone Indexima cluster without taking the precaution to start all workers before the master. | No |
powerbi.impersonate.field | string | empty | Define a field name. If that field name is used in the WHERE clause of a query as field='value', the string contained in the operand is considered as the actual user executing the query. The field name can be any name, provided it does not already exist as an actual field of the table. To be used in a query, this field needs to exist, so it is compulsory to add the virtual field to the table. For example, when a query filters on this field with the value 'test', it is as if a user named 'test' executed the query. | No |
timestamp.precision | string | HOUR/MINUTE/SECOND/DAY | Default precision of timestamp fields at table creation | Yes |
history.max | integer | 20 000 | Maximum number of queries loaded for previous day display in webUI | No |
error.max | integer | 10 000 | Maximum number of errors during a load | Yes |