Load files (CSV, Parquet...) via command
Indexima supports loading data in the following formats (a short example follows the list):
- CSV
- JSON
- PARQUET
- ORC
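For example, assuming the FORMAT keyword of the LOAD DATA command (shown with CSV further down this page) also accepts the other format names, a Parquet folder could be loaded as in the sketch below; the path and table name are placeholders:
-- FORMAT PARQUET is an assumption; only FORMAT CSV is demonstrated elsewhere on this page
LOAD DATA LOCAL INPATH '/tmp/my_parquet_data' INTO TABLE my_table FORMAT PARQUET;
-- Commit the table after the load
COMMIT my_table;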
Load data directly from HDFS to avoid Hive
You can use the LOAD DATA command to extract data directly from a Hadoop DataNode.
Command
LOAD DATA HDFS
LOAD DATA INPATH 'hdfs://data_node:8020/apps/hive/warehouse/my_hive_data' INTO TABLE my_table;
COMMIT my_table;
This will read every file located in /apps/hive/warehouse/my_hive_data on the DataNode. The data will be loaded into an Indexima table named "my_table" and then committed.
The path given to LOAD DATA INPATH designates a folder, not a single file: every data file inside this directory is imported. All of these files must have the same structure and format to produce a consistent result in the final table.
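As a sketch, and assuming the target table may be schema-qualified (the LOCAL example below lands in default.my_table when no schema is given), a second warehouse folder could be loaded into another table; the folder and table names are hypothetical:
-- The schema-qualified table name and the folder name are assumptions for illustration
LOAD DATA INPATH 'hdfs://data_node:8020/apps/hive/warehouse/my_other_data' INTO TABLE my_schema.my_other_table;
COMMIT my_schema.my_other_table;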
Load partitions from Hive table
You can learn more about the LOAD DATA INPATH commands here.
Load data from files directly on the filesystem of the machines running Indexima
Command
LOAD DATA LOCAL
LOAD DATA LOCAL INPATH '/tmp/my_data' INTO TABLE my_table FORMAT CSV SEPARATOR ',' SKIP 2;
COMMIT my_table;
This will load all the data from the files located in the folder /tmp/my_data into the Indexima table default.my_table. The files must be CSV files with a comma separator, and the first 2 lines are skipped.
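If several folders must end up in the same table, the sketch below assumes that multiple LOAD DATA statements can be issued before a single COMMIT; the folder names are placeholders:
-- Both loads target my_table; issuing a single COMMIT after several loads is an assumption
LOAD DATA LOCAL INPATH '/tmp/my_data_january' INTO TABLE my_table FORMAT CSV SEPARATOR ',' SKIP 2;
LOAD DATA LOCAL INPATH '/tmp/my_data_february' INTO TABLE my_table FORMAT CSV SEPARATOR ',' SKIP 2;
COMMIT my_table;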
More
You can learn more about the LOAD DATA commands here.