
In this blog we will see how to access HBase tables using Spark. Spark can work with data from multiple sources such as HDFS, Cassandra, HBase and MongoDB. For a basic understanding of HBase, refer to our Beginners Guide to HBase. According to the Spark documentation, “RDDs can be created from Hadoop...

In this post, we will analyze the Uber dataset in Apache Spark using Scala. The Uber dataset consists of 4 columns: dispatching_base_number, date, active_vehicles and trips. You can download the dataset from the link below: https://drive.google.com/open?id=0ByJLBTmJojjzS2c2UktqLW5uRG8 Problem Statement: Find the days on which each basement...

In this post, we will analyze the New York crimes dataset using SparkSQL. If you are not familiar with SparkSQL, please refer to our post on Introduction to SparkSQL. Dataset Description: This dataset is publicly available and reflects the reported incidents of crime (with the exception of...