In this blog post, we explain the row-level transactions available in Hive. By the end, you should have a good idea of how to perform row-level transactions on a Hive table.
Before beginning with the transactions in Hive, let’s look at the ACID properties, which are vital for any transaction.
What is ACID?
ACID stands for Atomicity, Consistency, Isolation, and Durability.
- Atomicity: a transaction should either complete successfully or fail completely; it should never be left partially applied.
- Consistency: every transaction brings the database from one valid state to another valid state.
- Isolation: every transaction should be independent of the others; one transaction should not affect another.
- Durability: once a transaction is completed, its results are preserved in the database even if a system failure occurs.
These ACID properties are essential for a transaction and every transaction should ensure that these properties are met.
Transactions in Hive
Transactions were introduced in Hive 0.13, but they only partially fulfilled the ACID properties (atomicity, consistency, and durability), and only at the partition level. Isolation could be provided by turning on one of the available locking mechanisms, backed by ZooKeeper or held in memory.
In Hive 0.14, new APIs were added to completely fulfill the ACID properties while performing any transaction.
Transactions are provided at the row level in Hive 0.14. The row-level operations available in Hive 0.14 are as follows:
- Insert
- Delete
- Update
There are several limitations with the transactions available in Hive 0.14. ORC is the only file format supported for Hive transactions, so the ORC file format is essential for performing transactions in Hive. The table also needs to be bucketed in order to support transactions.
Row-level Transactions Available in Hive 0.14
Let's perform some of the row-level transactions available in Hive 0.14. Before creating a Hive table that supports transactions, the transaction features in Hive need to be turned on, as they are turned off by default.
The following properties need to be set in the Hive shell, in the order shown, to work with transactions in Hive:
hive> set hive.support.concurrency = true;
hive> set hive.enforce.bucketing = true;
hive> set hive.exec.dynamic.partition.mode = nonstrict;
hive> set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive> set hive.compactor.initiator.on = true;
hive> set hive.compactor.worker.threads = 1;

hive.compactor.worker.threads must be set to a positive number (1 is used above as an example) on at least one instance of the Thrift metastore service.
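To confirm that a property has taken effect, you can echo it back from the Hive shell. The check below is an optional sanity test added here for illustration; it is not part of the original setup steps:

hive> set hive.txn.manager;

This should print hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager if the setting was applied.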
If the above properties are not set properly, the 'Insert' operation will work, but 'Update' and 'Delete' will not, and you will receive the following error:
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
Creating a Table That Supports Hive Transactions
CREATE TABLE college (clg_id int, clg_name string, clg_loc string)
CLUSTERED BY (clg_id) INTO 5 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
The above statement creates a table named 'college' with the columns 'clg_id', 'clg_name', and 'clg_loc'. We bucket the table by 'clg_id' and store it in the ORC format, and we enable transactions on the table by specifying 'transactional'='true' inside TBLPROPERTIES.
We have successfully created a table named 'college' that supports Hive's row-level transactions.
The created table can be checked using the command show tables.
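As an optional check (illustrative; the exact output depends on your environment), you can also inspect the table's metadata to confirm that the transactional flag is set:

SHOW TABLES;
DESCRIBE FORMATTED college;

In the output of DESCRIBE FORMATTED, the Table Parameters section should include transactional=true.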
Inserting Data into a Hive Table
INSERT INTO TABLE college VALUES
  (1,'nec','nlr'), (2,'vit','vlr'), (3,'srm','chen'), (4,'lpu','del'),
  (5,'stanford','uk'), (6,'JNTUA','atp'), (7,'cambridge','us');
The above command inserts the data row by row into the Hive table. Here, each row is enclosed in '( )' brackets, and the rows are separated by commas.
Now, we have successfully inserted the data into the Hive table.
The contents of the table can be viewed using the command select * from college.
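For example (the ORDER BY is added here only so the output comes back in a predictable order; rows from a bucketed table are not otherwise guaranteed to be sorted):

SELECT * FROM college ORDER BY clg_id;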
Running this query, we can see that the data has been inserted successfully into the table.
Now, if we try to re-insert the same data, it will be appended to the previous data, since INSERT INTO appends rather than overwrites, resulting in duplicate rows.
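If you are unsure whether a re-run has created duplicates, a simple aggregate query (an extra check added here for illustration) will show any clg_id that now appears more than once:

SELECT clg_id, COUNT(*) AS cnt
FROM college
GROUP BY clg_id
HAVING COUNT(*) > 1;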
Updating the Data in a Hive Table
UPDATE college SET clg_id = 8 WHERE clg_id = 7;
The above command attempts to update a row in the Hive table.
Here, we receive an error message, because the Update command is not supported on the columns that the table is bucketed by. In this table we have bucketed the 'clg_id' column and are performing the Update operation on that same column, so we get the error:
FAILED: SemanticException [Error 10302]: Updating values of bucketing columns is not supported. Column clg_id.
Now let's perform the Update operation on a non-bucketed column:
UPDATE college SET clg_name = 'IIT' WHERE clg_id = 6;
We have successfully updated the data.
The updated data can be checked using the command select * from college.
We can see that the data has been updated successfully.
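To verify just the affected row (an optional check), filter on the id that was updated; clg_name should now be 'IIT':

SELECT * FROM college WHERE clg_id = 6;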
Now let’s perform the Delete operation on the same table.
Deleting a Row from a Hive Table
DELETE FROM college WHERE clg_id = 5;
The above command will delete a single row in the Hive table.
We have now successfully deleted a row from the Hive table. This can be checked using the command select * from college.
We can see that there is no longer a row with clg_id = 5. This means that we have successfully deleted the row from the Hive table.
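A quick way to confirm the deletion (again, an optional check) is to count the matching rows; it should return 0 after the delete:

SELECT COUNT(*) FROM college WHERE clg_id = 5;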
This is how the transactions or row-wise operations are performed in Hive.
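Since ACID tables in Hive store changes as delta files that are merged by background compactions, it can also be useful to monitor transaction and compaction activity. The statements below are provided by Hive for ACID tables and are shown here as an optional extra, not as part of the original walkthrough:

SHOW TRANSACTIONS;
SHOW COMPACTIONS;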
Follow our blog to stay updated on our latest blog posts on Big Data technologies.
Got a question for us? Please mention it in the comments section and we will address it as soon as possible.
Comments

Thanks a lot for your information, it's very useful. Can you also provide some information regarding Apache Oozie with a real-time scenario?
When I am inserting a row into my table, my API hangs at the kill command. Can you please help me out with this?
Hello Kiran,
After setting the below properties:
hive> set hive.support.concurrency = true;
hive> set hive.enforce.bucketing = true;
hive> set hive.exec.dynamic.partition.mode = nonstrict;
hive> set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive> set hive.compactor.initiator.on = true;
hive> set hive.compactor.worker.threads = a positive number on at least one instance of the Thrift metastore service;
the shell hangs and no operation can be done after that. Can you please help with it? I am using the AcadGild VM on 64 bit.
Hi Amol,
For 'set hive.compactor.worker.threads = a positive number on at least one instance of the Thrift metastore service;', you need to give any positive value, for example:
set hive.compactor.worker.threads = 5;
I did that, I gave 2. After setting all those properties, whatever Hive statement I give next hangs forever. I am trying with hive-site.xml now; meanwhile, if you have any inputs, please let me know.
Hello Satyam,
Thanks for the quick help. Setting the properties from the command prompt didn't work for me; as I mentioned, it just hangs forever. I tried another way and it is working for me: I updated all the above properties in hive-site.xml, added one extra property, then restarted the service (just shut down the VM and restarted it), and it is working now.
My hive-site.xml looks like the one below (it may be useful for others who get stuck with the same error I got):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
  <description>metadata is stored in a MySQL server</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>MySQL JDBC driver class</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
  <description>user name for connecting to mysql server</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
  <description>password for connecting to mysql server</description>
</property>
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>2</value>
</property>
<property>
  <name>hive.in.test</name>
  <value>true</value>
</property>