SYNCHRONIZE Operation not working
Context
After creating a table based on Hive, the user performs a SYNCHRONIZE command.
Exemple
Drop Table If EXISTS Myschema.Table1;
CREATE TABLE IF NOT EXISTS Myschema.Table1 FROM HiveMyschema.Table1 IN 'jdbc:hive2://indeximaServeur:port' () ;
After executing the SYNCHRONIZE command, there is no data left in the Indexima table, whereas there is data in the Hive Table.
Exemple
Synchronize Myschema.Table1 ;
SELECT * FROM Myschema.Table1 limit 10 ; Returns 0 line
Symptoms & checks
When checking the logs, we can see the system noticed some partitions.
INFO [RequestRunnners-16-thread-22] io.galactica.u - Find partition Part1=2019-01-24/Part2=2019-01-24 11%3A07%3A00 for Myschema.Table1_0
INFO [RequestRunnners-16-thread-22] io.galactica.u - Find partition Part1=2019-06-13/Part2=2019-06-13 17%3A59%3A37 for Myschema.Table1_0
Did you create partitions with values containing special characters such: ; . : ? (for example, when storing Timestamp in string format: 2019-06-13 17:59:37 )
In Hive, check with Show Partitions Myschema.Table1 that special characters are encoded.
Partitions |
---|
Part1=2019-10-29/Part2=2019-10-29 11%3A13%3A59 |
Part1=2020-01-31/Part2=2020-01-31 10%3A43%3A35 snapshot |
Resolution
In the hive cluster, execute the command (admin rights required).
set hive.decode.partition.name=true;
See the reference: https://issues.apache.org/jira/browse/HIVE-3679
Hive will send partitions with decoded names. Indexima needs those decoded names to load the data.
Best Practice
It is recommended not to create partitions values containing special characters.