In this blog, we will see how to access HBase tables using Spark.
Spark can work on data stored in multiple sources such as HDFS, Cassandra, HBase, and MongoDB.
To get a basic understanding of HBase, refer to our Beginner's Guide to HBase.
According to the Spark documentation, “RDDs can be created from Hadoop InputFormats.” An InputFormat in Hadoop is an abstraction for anything that can be processed in a MapReduce job. HBase provides a TableInputFormat, which makes it easy to use Spark with HBase.
Now let us walk through the steps for accessing HBase tables through Spark.
First, start the HBase server.
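On a standard HBase installation, the server is started with the bundled script (the `HBASE_HOME` path below is an assumption about your install location):

```shell
# Start the HBase daemons (master and, in distributed mode, region servers)
$HBASE_HOME/bin/start-hbase.sh
```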
Create an HBASE_PATH environment variable to store the HBase classpath.
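The `hbase classpath` command prints the full classpath of HBase jars, which we can capture into the variable (the variable name HBASE_PATH is our own convention):

```shell
# Capture all HBase jars and configuration directories into one variable
export HBASE_PATH=$(hbase classpath)
```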
Start the Spark shell, passing the HBASE_PATH variable so that all the HBase jars are on the classpath.
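One way to do this, a sketch assuming the HBASE_PATH variable set above, is to put the jars on the driver classpath:

```shell
# Launch the Spark shell with the HBase jars available to the driver
spark-shell --driver-class-path "$HBASE_PATH"
```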
Now that HBase and Spark are both running, we will create a connection to HBase through the Spark shell.
Import the required libraries.
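A minimal set of imports for the steps below, assuming the classic HBase 1.x client API:

```scala
// HBase configuration, schema, and table-name classes
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor, HColumnDescriptor, TableName}
// Client classes for administration and data access
import org.apache.hadoop.hbase.client.{HBaseAdmin, HTable, Put, Result}
// Key type produced by TableInputFormat, and the input format itself
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
// Helper for converting Strings to and from HBase byte arrays
import org.apache.hadoop.hbase.util.Bytes
```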
Create an HBase configuration object.
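A sketch of this step; the table name "emp" here and below is an example, not something fixed by HBase:

```scala
// Create a configuration object; it picks up hbase-site.xml from the classpath
val conf = HBaseConfiguration.create()
// Tell TableInputFormat which table the RDD should read
conf.set(TableInputFormat.INPUT_TABLE, "emp")
```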
Create an Admin instance.
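With the configuration in hand, an admin client can be created for DDL operations (this uses the HBaseAdmin class of the 1.x API):

```scala
// HBaseAdmin issues administrative operations (create/delete/list tables)
val admin = new HBaseAdmin(conf)
```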
Create the table.
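A sketch of creating a table named "emp" with a single column family "personal" (both names are examples):

```scala
// Describe the table and add one column family
val tableDesc = new HTableDescriptor(TableName.valueOf("emp"))
tableDesc.addFamily(new HColumnDescriptor("personal"))
// Ask the cluster to create it
admin.createTable(tableDesc)
```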
Check whether the created table exists.
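One way to verify the table, using the admin instance created above:

```scala
// Returns true if the table exists and is online
admin.isTableAvailable("emp")
```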
Now that we have created the table, we will put some data into it.
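A sketch of a single insert; the row key, column qualifier, and value are all example data:

```scala
// Open the table and build a Put for row key "row1"
val table = new HTable(conf, "emp")
val put = new Put(Bytes.toBytes("row1"))
// Add a cell: column family "personal", qualifier "name", value "raj"
put.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("raj"))
// Write the row and flush it to the server
table.put(put)
table.flushCommits()
```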
Now we can create an RDD from the data present in HBase using newAPIHadoopRDD, passing it the configuration, the InputFormat, and the key and value classes.
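This is a sketch of that call, assuming the `conf` object from the earlier step (with `TableInputFormat.INPUT_TABLE` already set) and the Spark shell's built-in `sc`:

```scala
// Each element is a (row key, Result) pair read via TableInputFormat
val hBaseRDD = sc.newAPIHadoopRDD(conf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])
```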
We can perform all the usual transformations and actions on the created RDD.
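For example, counting the rows and extracting a column value from each Result (the family and qualifier names match the example table above):

```scala
// Count the rows in the table
hBaseRDD.count()

// Pull the "personal:name" cell out of each Result and print it
val names = hBaseRDD.map { case (_, result) =>
  Bytes.toString(result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name")))
}
names.collect().foreach(println)
```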
We hope this blog helped you understand the integration of Spark and HBase. Keep visiting our site www.acadgild.com for more updates on Big Data and other technologies.