In this blog post, we explain the row-level transactions available in Hive. By the end, you should have a good idea of how to perform row-level transactions on a Hive table.
Before beginning with the transactions in Hive, let’s look at the ACID properties, which are vital for any transaction.
What is ACID?
ACID stands for Atomicity, Consistency, Isolation, and Durability.
- Atomicity: a transaction should either complete successfully or fail completely; it should never be left partially applied.
- Consistency: every transaction brings the database from one valid state to another valid state.
- Isolation: every transaction should be independent of the others; one transaction should not affect another.
- Durability: once a transaction is completed, its results are preserved in the database even if a system failure occurs.
These ACID properties are essential for a transaction and every transaction should ensure that these properties are met.
Transactions in Hive
Transactions were introduced in Hive 0.13, but they only partially fulfilled the ACID properties (atomicity, consistency, and durability), and only at the partition level. Isolation could be provided by turning on one of the available locking mechanisms, backed by ZooKeeper or held in memory.
In Hive 0.14, new APIs were added to completely fulfill the ACID properties while performing any transaction.
Transactions are provided at the row level in Hive 0.14. The row-level operations available in Hive 0.14 are as follows:
- Insert
- Delete
- Update
There are several limitations with the transactions available in Hive 0.14. ORC is the only file format supported for Hive transactions, so the ORC file format is essential for performing transactions in Hive. The table also needs to be bucketed in order to support transactions.
Row-level Transactions Available in Hive 0.14
Let's perform some of the row-level transactions available in Hive 0.14. Before creating a Hive table that supports transactions, the transaction features in Hive need to be turned on, as they are turned off by default.
The following properties need to be set in the Hive shell, in the order shown, to work with transactions in Hive:
hive> set hive.support.concurrency = true;
hive> set hive.enforce.bucketing = true;
hive> set hive.exec.dynamic.partition.mode = nonstrict;
hive> set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive> set hive.compactor.initiator.on = true;
hive> set hive.compactor.worker.threads = 1;

hive.compactor.worker.threads must be set to a positive number (1 is used above as an example) on at least one instance of the Thrift metastore service.
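To confirm that a property has taken effect, you can echo it back from the Hive shell. The check below is an optional sanity test added here for illustration; it is not part of the original setup steps:

hive> set hive.txn.manager;

This should print hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager if the setting was applied.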
If the above properties are not set properly, the 'Insert' operation will work, but 'Update' and 'Delete' will not, and you will receive the following error:
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
Creating a Table That Supports Hive Transactions
CREATE TABLE college (clg_id int, clg_name string, clg_loc string)
CLUSTERED BY (clg_id) INTO 5 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
The above statement creates a table named 'college' with the columns 'clg_id', 'clg_name', and 'clg_loc'. We bucket the table by 'clg_id' and store it in the ORC format, and we enable transactions on the table by specifying 'transactional'='true' inside TBLPROPERTIES.
We have successfully created a table named 'college' that supports Hive's row-level transactions.
The created table can be checked using the command show tables.
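As an optional check (illustrative; the exact output depends on your environment), you can also inspect the table's metadata to confirm that the transactional flag is set:

SHOW TABLES;
DESCRIBE FORMATTED college;

In the output of DESCRIBE FORMATTED, the Table Parameters section should include transactional=true.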
Inserting Data into a Hive Table
INSERT INTO TABLE college VALUES
  (1,'nec','nlr'), (2,'vit','vlr'), (3,'srm','chen'), (4,'lpu','del'),
  (5,'stanford','uk'), (6,'JNTUA','atp'), (7,'cambridge','us');
The above command inserts the data row by row into the Hive table. Here, each row is enclosed in '( )' brackets, and the rows are separated by commas.
Now, we have successfully inserted the data into the Hive table.
The contents of the table can be viewed using the command select * from college.
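For example (the ORDER BY is added here only so the output comes back in a predictable order; rows from a bucketed table are not otherwise guaranteed to be sorted):

SELECT * FROM college ORDER BY clg_id;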
Running this query, we can see that the data has been inserted successfully into the table.
Now, if we try to re-insert the same data, it will be appended to the previous data, since INSERT INTO appends rather than overwrites, resulting in duplicate rows.
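If you are unsure whether a re-run has created duplicates, a simple aggregate query (an extra check added here for illustration) will show any clg_id that now appears more than once:

SELECT clg_id, COUNT(*) AS cnt
FROM college
GROUP BY clg_id
HAVING COUNT(*) > 1;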
Updating the Data in a Hive Table
UPDATE college SET clg_id = 8 WHERE clg_id = 7;
The above command attempts to update a row in the Hive table.
Here, we receive an error message, because the Update command is not supported on the columns that the table is bucketed by. In this table we have bucketed the 'clg_id' column and are performing the Update operation on that same column, so we get the error:
FAILED: SemanticException [Error 10302]: Updating values of bucketing columns is not supported. Column clg_id.
Now let's perform the Update operation on a non-bucketed column:
UPDATE college SET clg_name = 'IIT' WHERE clg_id = 6;
We have successfully updated the data.
The updated data can be checked using the command select * from college.
We can see that the data has been updated successfully.
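To verify just the affected row (an optional check), filter on the id that was updated; clg_name should now be 'IIT':

SELECT * FROM college WHERE clg_id = 6;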
Now let’s perform the Delete operation on the same table.
Deleting a Row from a Hive Table
DELETE FROM college WHERE clg_id = 5;
The above command will delete a single row in the Hive table.
We have now successfully deleted a row from the Hive table. This can be checked using the command select * from college.
We can see that there is no longer a row with clg_id = 5. This means that we have successfully deleted the row from the Hive table.
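A quick way to confirm the deletion (again, an optional check) is to count the matching rows; it should return 0 after the delete:

SELECT COUNT(*) FROM college WHERE clg_id = 5;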
This is how the transactions or row-wise operations are performed in Hive.
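Since ACID tables in Hive store changes as delta files that are merged by background compactions, it can also be useful to monitor transaction and compaction activity. The statements below are provided by Hive for ACID tables and are shown here as an optional extra, not as part of the original walkthrough:

SHOW TRANSACTIONS;
SHOW COMPACTIONS;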
Follow our blog to stay updated on our latest blog posts on Big Data technologies.
Got a question for us? Please mention it in the comments section and we will address it as soon as possible.
Comments

Thanks a lot for your information, it's very useful. Can you also provide some information regarding Apache Oozie with a real-time scenario?
When I am inserting a row into my table, my API hangs at the kill command. Can you please help me out with this?
Hello Kiran,
After setting the below properties:
hive> set hive.support.concurrency = true;
hive> set hive.enforce.bucketing = true;
hive> set hive.exec.dynamic.partition.mode = nonstrict;
hive> set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive> set hive.compactor.initiator.on = true;
hive> set hive.compactor.worker.threads = a positive number on at least one instance of the Thrift metastore service;
the shell hangs and no operation can be done after that. Can you please help with it? I am using the AcadGild VM on 64 bit.
Hi Amol,
For 'set hive.compactor.worker.threads = a positive number on at least one instance of the Thrift metastore service;', you need to give any positive value, for example:
set hive.compactor.worker.threads = 5;
I did that, I gave 2. After setting all those properties, whatever Hive statement I give next hangs forever. I am trying with hive-site.xml now; meanwhile, if you have any inputs, please let me know.
Hello Satyam,
Thanks for the quick help. Setting the properties from the command prompt didn't work for me; as I mentioned, it just hangs forever. I tried another way and it is working for me: I updated all the above properties in hive-site.xml, added one extra property, then restarted the service (just shut down the VM and restarted it), and it is working now.
My hive-site.xml looks like the one below (it may be useful for others who get stuck with the same error I got):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
  <description>metadata is stored in a MySQL server</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>MySQL JDBC driver class</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
  <description>user name for connecting to mysql server</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
  <description>password for connecting to mysql server</description>
</property>
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>2</value>
</property>
<property>
  <name>hive.in.test</name>
  <value>true</value>
</property>