Category: Spark

22 August 2016

Machine Learning with Spark: Determining Credibility of a Customer – Part 2

DataFrame: The DataFrame API was introduced in Spark 1.3.0. A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations...

19 August 2016

Machine Learning with Spark: Determining Credibility of a Customer – Part 1

This is the first part of a series of posts that outlines the importance of Spark in solving Machine Learning problems. The series covers the complete set of steps necessary for a Data Science project. This series of steps, commonly known in the industry as a data pipeline, consists of...

11 August 2016

Introduction to Spark 2.X

In this post, we will discuss the new features of Spark 2.0.0 and its installation on Hadoop 2.7. We highly recommend that readers go through the posts below on Spark to get a clear idea of what Spark is and the reasons behind its popularity. Beginner’s Guide...

31 July 2016

Social Media Analysis Using Apache Flink

In this post, we will look at a case study that calculates the average number of friends by user age on a social media website, using Apache Flink in Scala. Our previous post gave a brief introduction to Flink. Hence, we request you to go...

29 July 2016

Beginner’s Guide for Apache Flink

In this post, we will discuss Apache Flink, its installation on a single-node cluster, and how it competes with present Big Data frameworks. Let’s begin with the basics. What is Apache Flink? Apache Flink is an open-source platform for distributed stream and batch data...

26 July 2016

Streaming Twitter Data using Spark

In this post, we will discuss how to stream Twitter data using Spark Streaming. Let’s begin with what Spark Streaming is. Before moving on to Spark Streaming, we recommend that readers get some idea of Spark Core and RDDs. Spark RDD’s in Scala part-1 Spark RDD’s in Scala...

20 July 2016

Spark Use Case – Popular Movie Analysis

In this post, we will work on a case study to find the most popular movies. We will perform various transformations and actions to display the list of movies that occur most often in the given data set. Let’s start our discussion with the data definition by considering...

06 July 2016

Spark Use Case – Weather Data Analysis

In this post, we will work on a case study to find the minimum temperature observed at a given weather station in a particular year. Let’s begin by considering a sample of four records. Data Definition: Column 1: Weather Station Column 2: Date (Year/Month/Day) Column 3: Observation Type Column 4:...

23 June 2016

Spark Use Case – Social Media Analysis

In this post, we will work on a case study to calculate the average number of friends by user age on a social media website. Let’s begin by considering a sample of four records. Column 1: User ID Column 2: User Name Column 3: Age of the User...

17 June 2016

Implementing Custom Input Format in Spark

In this post, we will discuss how to implement a custom input format in Spark. We will do this by using Hadoop’s custom input format. You can refer to our previous post to get an idea of how a custom input format has been implemented...

AcadGild
