Comments on: Indexing in Hive

By: Satyam

Satyam — Wed, 27 Jul 2016 13:23:00 +0000

Hi Sachin,

Partitioning divides a large dataset into smaller ones so that queries are processed more efficiently. Indexing does not divide the dataset. Instead, it creates a separate index table that records the values of the indexed column together with where the matching rows are located in the original table. When you run a query on an indexed table, Hive first consults the index table and then, based on what it finds there, reads only the relevant parts of the original table. It works just like the index of a textbook.
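
To make the distinction concrete, here is a rough sketch in plain Python. It is not Hive code, and the table, column positions, and function names are all made up for illustration. It only mimics the idea: partitioning physically splits the rows into smaller groups, while an index leaves the rows in place and keeps a separate lookup structure that points back to them.

```python
from collections import defaultdict

# Hypothetical toy table: (order_id, country, amount)
rows = [
    (1, "IN", 250),
    (2, "US", 120),
    (3, "IN", 90),
    (4, "UK", 300),
    (5, "US", 75),
]


def partition_by(rows, col):
    """Split the rows into separate smaller datasets, one per distinct value."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[col]].append(row)
    return dict(parts)


def build_index(rows, col):
    """Leave the rows where they are; record the positions of each value."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[col]].append(pos)
    return dict(index)


def lookup_with_index(rows, index, value):
    """Consult the index first, then read only the matching positions."""
    return [rows[pos] for pos in index.get(value, [])]


partitions = partition_by(rows, 1)      # data itself is divided by country
country_index = build_index(rows, 1)    # data is untouched; index is separate

print(partitions["IN"])
print(country_index["IN"])
print(lookup_with_index(rows, country_index, "IN"))
```

Both approaches return the same rows for country "IN". The difference is in where the work goes: the partitioned layout reads one small group directly, while the indexed layout does a lookup in the side structure first and then goes back to the original rows.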

By: sachin

sachin — Thu, 21 Jul 2016 08:10:46 +0000

What is the difference between partitioning and indexing? With partitioning we also divide a larger dataset into smaller ones, and that makes queries more efficient when you select particular rows from a table.

By: chaitanya kulkarni

chaitanya kulkarni — Mon, 04 Jan 2016 19:27:45 +0000

If we are using columnar formats like RC or ORC, or formats with built-in compression like Avro or Parquet, will creating an index be helpful?