Spark – AcadGild
https://acadgild.com/blog
Learn. Do. Earn.
Thu, 25 Aug 2016 12:01:42 +0000

Machine Learning with Spark: Determining Credibility of a Customer – Part 2
https://acadgild.com/blog/machine-learning-with-spark-determining-credibility-of-a-customer-part-2/
Mon, 22 Aug 2016 14:43:33 +0000
DataFrame: The DataFrame is a feature that has been exposed as an API since Spark 1.3.0. A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrame as an […]

The post Machine Learning with Spark: Determining Credibility of a Customer – Part 2 appeared first on AcadGild.

Machine Learning with Spark: Determining Credibility of a Customer – Part 1
https://acadgild.com/blog/machine-learning-spark-determining-credibility-customer-part-1/
Fri, 19 Aug 2016 09:52:47 +0000
This is the first part of a series of posts outlining the importance of Spark in solving machine learning problems. The series covers all the steps necessary for a data science project. This sequence of steps, commonly known in the industry as a data pipeline, consists of various data transformations and evolutions, each […]

The post Machine Learning with Spark: Determining Credibility of a Customer – Part 1 appeared first on AcadGild.

Introduction to Spark 2.X
https://acadgild.com/blog/introduction-spark-2-x/
Wed, 10 Aug 2016 19:12:57 +0000
In this post, we discuss the new features of Spark 2.0.0 and its installation on Hadoop 2.7. We highly recommend that readers first go through the posts below to get a clear idea of what Spark is and the reasons behind its popularity: Beginner’s Guide for Spark; Spark RDD’s in Scala […]

The post Introduction to Spark 2.X appeared first on AcadGild.

Social Media Analysis Using Apache Flink
https://acadgild.com/blog/social-media-analysis-using-apache-flink/
Sun, 31 Jul 2016 05:24:44 +0000
In this post, we look at a case study that calculates the average number of friends of users on a social media website, grouped by age, using Apache Flink in Scala. Our previous post gave a brief introduction to Flink, so we recommend reading that first, before going through […]

The post Social Media Analysis Using Apache Flink appeared first on AcadGild.

Beginner’s Guide for Apache Flink
https://acadgild.com/blog/beginners-guide-apache-flink/
Fri, 29 Jul 2016 08:20:58 +0000
In this post, we discuss Apache Flink, its installation on a single-node cluster, and how it competes with present Big Data frameworks. Let’s begin with the basics. What is Apache Flink? Apache Flink is an open-source platform for distributed stream and batch data processing. Flink’s core is a streaming […]

The post Beginner’s Guide for Apache Flink appeared first on AcadGild.

Streaming Twitter Data using Spark
https://acadgild.com/blog/streaming-twitter-data-using-spark/
Tue, 26 Jul 2016 08:46:45 +0000
In this post, we discuss how to stream Twitter data using Spark Streaming. Let’s begin with what Spark Streaming is. Before turning to Spark Streaming, we recommend that readers get familiar with Spark core and RDDs: Spark RDD’s in Scala part-1; Spark RDD’s in Scala part-2. Spark Streaming: Spark Streaming is […]

The post Streaming Twitter Data using Spark appeared first on AcadGild.

Spark Use Case – Popular Movie Analysis
https://acadgild.com/blog/spark-use-case-popular-movie-analysis/
Wed, 20 Jul 2016 09:52:37 +0000
In this blog, we work on a case study to find the most popular movies. We perform various transformations and actions to display the movies that occur most often in the given data set. Let’s start our discussion with the data definition by considering a sample of four records. 196 […]

The post Spark Use Case – Popular Movie Analysis appeared first on AcadGild.

Spark Use Case – Weather Data Analysis
https://acadgild.com/blog/spark-use-case-weather-data-analysis/
Wed, 06 Jul 2016 07:04:35 +0000
In this post, we will work on a case study to find the minimum temperature observed at a given weather station in a particular year. Let’s begin by considering a sample of four records.
Data Definition:
Column 1: Weather Station
Column 2: Date (Year/Month/Day)
Column 3: Observation Type
Column 4: Temperature
You can download the input […]

The post Spark Use Case – Weather Data Analysis appeared first on AcadGild.

Spark Use Case – Social Media Analysis
https://acadgild.com/blog/spark-use-case-social-media-analysis/
Thu, 23 Jun 2016 09:48:32 +0000
In this post, we will work on a case study to calculate the average number of friends of users on a social media website, grouped by age. Let’s begin by considering a sample of four records.
Column 1: User ID
Column 2: User Name
Column 3: Age of the User
Column 4: Number of Friends with […]

The post Spark Use Case – Social Media Analysis appeared first on AcadGild.

Implementing Custom Input Format in Spark
https://acadgild.com/blog/custom-input-format-spark/
Fri, 17 Jun 2016 11:10:52 +0000
In this post, we discuss how to implement a custom input format in Spark, which we do by using a Hadoop custom input format. You can refer to our previous post to see how a custom input format was implemented on the Titanic dataset. Problem Statement: […]

The post Implementing Custom Input Format in Spark appeared first on AcadGild.
