Category: Spark

14 June 2016

Spark Use Case – Analyzing MovieLens Dataset

In this blog, we will discuss a use case involving the MovieLens dataset and analyze how the movies fare on a rating scale of 1 to 5. We will start our discussion with the data definition by considering a sample of four records. 196 242 3 881250949 186...


08 June 2016

Querying HBase using Apache Spark

In this blog, we will see how to access HBase tables using Spark. Spark can work on data present in multiple sources such as HDFS, Cassandra, HBase, and MongoDB. For a basic understanding of HBase, refer to our Beginners guide to Hbase. According to the Spark documentation, “RDDs can be created from Hadoop...

3 Comments

16 May 2016

Spark Use Case – Uber Data Analysis

In this post, we will analyze the Uber dataset in Apache Spark using Scala. The Uber dataset consists of 4 columns: dispatching_base_number, date, active_vehicles and trips. You can download the dataset from the link below: https://drive.google.com/open?id=0ByJLBTmJojjzS2c2UktqLW5uRG8 Problem Statement: Find the days on which each base...

1 Comment

12 May 2016

Integrating SparkSQL with MySQL

In this post, we will learn how to connect to a JDBC data source using SparkSQL DataFrames. In case you are not familiar with SparkSQL, you can refer to this post for a comprehensive Introduction to SparkSQL and the post on Analyzing Crime Data using SparkSQL. We know...


08 May 2016

Spark Use Case – Travel Data Analysis

In this blog, we will discuss the analysis of a travel dataset and gain insights from it using Apache Spark. The travel dataset is publicly available, and its contents are detailed under the heading ‘Travel Sector Dataset Description’. Based on the data, we will find the top 20...


28 April 2016

Analyzing New York Crime Data Using SparkSQL

In this post, we will analyze the New York crimes dataset using SparkSQL. In case you are not familiar with SparkSQL, please refer to our post on Introduction to SparkSQL. Dataset Description: This dataset is publicly available and reflects the reported incidents of crime (with the exception of...


27 April 2016

Spark RDD Operations in Scala Part – 2

In our previous post, we discussed the basic RDD operations in Scala. Now, let’s discuss some of the advanced RDD operations in Scala. Here we have taken two datasets, dept and emp, to work on these operations. The datasets look like this: [DeptNo DeptName] [Emp_no...


22 April 2016

Spark SQL – Module for Structured Data Processing

In this post, we will discuss Spark SQL and how it is implemented in Spark. We recommend that readers refer to the previous post on Introduction to Spark RDD for a basic understanding of RDD architecture and its operations. Spark SQL is an important component of the Spark...


21 April 2016

Spark or Hadoop – Which Big Data Framework You Should Choose!

Hadoop and Spark are two terms frequently discussed among Big Data professionals. But the big question is whether to choose Hadoop or Spark as a Big Data framework. In this blog, we will compare both these Big Data technologies, understand their specialties and factors which are...


16 April 2016

Spark Use Case – The Daily Show

In this blog, we will take a dataset from a famous TV show, The Daily Show, and analyze the guests who appeared on the show. Before going ahead, we recommend that readers go through our previous blogs on various publicly available datasets. Youtube Data...

1 Comment

AcadGild

