In this blog, we will see how to access HBase tables using Spark.
Spark can work on data stored in multiple sources such as HDFS, Cassandra, HBase, and MongoDB.
To get a basic understanding of HBase, refer to our Beginner's Guide to HBase.
According to the Spark documentation, “RDDs can be created from Hadoop InputFormats.” An InputFormat in Hadoop is an abstraction for anything that can be processed in a MapReduce job. HBase provides a TableInputFormat, which makes it easy to use Spark with HBase.
Now let us walk through the steps for accessing HBase tables through Spark.
First, start the HBase server.
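On a standard HBase installation, the server is started with the bundled script (the `HBASE_HOME` path below is an assumption about your install location):

```shell
# Start the HBase daemons (master and, in distributed mode, region servers)
$HBASE_HOME/bin/start-hbase.sh
```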
Create an HBASE_PATH environment variable to store the HBase classpath.
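The `hbase classpath` command prints the full classpath of HBase jars, which we can capture into the variable (the variable name HBASE_PATH is our own convention):

```shell
# Capture all HBase jars and configuration directories into one variable
export HBASE_PATH=$(hbase classpath)
```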
Start the Spark shell, passing the HBASE_PATH variable so that all the HBase jars are on the classpath.
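One way to do this, a sketch assuming the HBASE_PATH variable set above, is to put the jars on the driver classpath:

```shell
# Launch the Spark shell with the HBase jars available to the driver
spark-shell --driver-class-path "$HBASE_PATH"
```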
Now that HBase and Spark are both running, we will create a connection to HBase through the Spark shell.
Import the required libraries.
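A minimal set of imports for the steps below, assuming the classic HBase 1.x client API:

```scala
// HBase configuration, schema, and table-name classes
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor, HColumnDescriptor, TableName}
// Client classes for administration and data access
import org.apache.hadoop.hbase.client.{HBaseAdmin, HTable, Put, Result}
// Key type produced by TableInputFormat, and the input format itself
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
// Helper for converting Strings to and from HBase byte arrays
import org.apache.hadoop.hbase.util.Bytes
```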
Create an HBase configuration object.
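A sketch of this step; the table name "emp" here and below is an example, not something fixed by HBase:

```scala
// Create a configuration object; it picks up hbase-site.xml from the classpath
val conf = HBaseConfiguration.create()
// Tell TableInputFormat which table the RDD should read
conf.set(TableInputFormat.INPUT_TABLE, "emp")
```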
Create an Admin instance.
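With the configuration in hand, an admin client can be created for DDL operations (this uses the HBaseAdmin class of the 1.x API):

```scala
// HBaseAdmin issues administrative operations (create/delete/list tables)
val admin = new HBaseAdmin(conf)
```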
Create the table.
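A sketch of creating a table named "emp" with a single column family "personal" (both names are examples):

```scala
// Describe the table and add one column family
val tableDesc = new HTableDescriptor(TableName.valueOf("emp"))
tableDesc.addFamily(new HColumnDescriptor("personal"))
// Ask the cluster to create it
admin.createTable(tableDesc)
```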
Check whether the created table exists.
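One way to verify the table, using the admin instance created above:

```scala
// Returns true if the table exists and is online
admin.isTableAvailable("emp")
```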
Now that we have created the table, we will put some data into it.
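A sketch of a single insert; the row key, column qualifier, and value are all example data:

```scala
// Open the table and build a Put for row key "row1"
val table = new HTable(conf, "emp")
val put = new Put(Bytes.toBytes("row1"))
// Add a cell: column family "personal", qualifier "name", value "raj"
put.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("raj"))
// Write the row and flush it to the server
table.put(put)
table.flushCommits()
```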
Now we can create an RDD from the data present in HBase using newAPIHadoopRDD, passing it the configuration, the InputFormat, and the key and value classes.
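This is a sketch of that call, assuming the `conf` object from the earlier step (with `TableInputFormat.INPUT_TABLE` already set) and the Spark shell's built-in `sc`:

```scala
// Each element is a (row key, Result) pair read via TableInputFormat
val hBaseRDD = sc.newAPIHadoopRDD(conf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])
```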
We can perform all the usual transformations and actions on the created RDD.
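For example, counting the rows and extracting a column value from each Result (the family and qualifier names match the example table above):

```scala
// Count the rows in the table
hBaseRDD.count()

// Pull the "personal:name" cell out of each Result and print it
val names = hBaseRDD.map { case (_, result) =>
  Bytes.toString(result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name")))
}
names.collect().foreach(println)
```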
We hope this blog helped you understand the integration of Spark and HBase. Keep visiting our site www.acadgild.com for more updates on Big Data and other technologies.