Big Data and Hadoop – AcadGild https://acadgild.com/blog Learn. Do. Earn. Thu, 25 Aug 2016 12:01:42 +0000 en-US hourly 1 https://wordpress.org/?v=4.5.3 103159356
Hadoop Tutorial: Combiners in Hadoop https://acadgild.com/blog/hadoop-tutorial-combiners-in-hadoop/ https://acadgild.com/blog/hadoop-tutorial-combiners-in-hadoop/#respond Thu, 25 Aug 2016 09:40:35 +0000 https://acadgild.com/blog/?p=15768 In this post, we will look at Combiners and discuss why they are needed and how they function in Hadoop. Hadoop is an open-source framework used to store and process large data sets in a distributed computing environment. Usually, when we work on large data sets in the MapReduce framework, the output […]

The post Hadoop Tutorial: Combiners in Hadoop appeared first on AcadGild.

https://acadgild.com/blog/hadoop-tutorial-combiners-in-hadoop/feed/ 0 15768
Hadoop Tutorial: HBase Admin DDL Commands (Java API) https://acadgild.com/blog/hadoop-tutorial-hbase-admin-ddl-commands-java-api/ https://acadgild.com/blog/hadoop-tutorial-hbase-admin-ddl-commands-java-api/#respond Wed, 24 Aug 2016 07:14:18 +0000 https://acadgild.com/blog/?p=15662 In HBase, all operations are performed on tables and are very similar to those in MySQL. In this post, we will use Eclipse as the IDE for our Java programs. Before going ahead with this post on DDL commands in HBase, we request that you go through our post on HBase CRUD Operations. We will be learning some […]


https://acadgild.com/blog/hadoop-tutorial-hbase-admin-ddl-commands-java-api/feed/ 0 15662
Machine Learning with Spark – Part 3 https://acadgild.com/blog/machine-learning-spark-part-3/ https://acadgild.com/blog/machine-learning-spark-part-3/#respond Tue, 23 Aug 2016 13:11:54 +0000 https://acadgild.com/blog/?p=15764 In this post, we will work on a dataset from a bank and try to find some patterns using Exploratory Data Analysis. Before we go ahead, let’s get acquainted with the data set. Dataset: Source Information: Professor Dr. Hans Hofmann, Institut für Statistik und Ökonometrie, Universität Hamburg, FB Wirtschaftswissenschaften, Von-Melle-Park 5, 2000 Hamburg 13 […]


https://acadgild.com/blog/machine-learning-spark-part-3/feed/ 0 15764
Building a Hadoop Application using Maven https://acadgild.com/blog/building-a-hadoop-application-using-maven/ https://acadgild.com/blog/building-a-hadoop-application-using-maven/#respond Tue, 23 Aug 2016 07:33:26 +0000 https://acadgild.com/blog/?p=15580 In this post, we will discuss how to build a Hadoop application using Maven. We recommend that readers go through the previous post on Maven to get a clear idea of Maven and how it helps in building applications. Eclipse needs to be installed on your system for this. Assuming that it is […]


https://acadgild.com/blog/building-a-hadoop-application-using-maven/feed/ 0 15580
Machine Learning with Spark: Determining Credibility of a Customer – Part 1 https://acadgild.com/blog/machine-learning-spark-determining-credibility-customer-part-1/ https://acadgild.com/blog/machine-learning-spark-determining-credibility-customer-part-1/#respond Fri, 19 Aug 2016 09:52:47 +0000 https://acadgild.com/blog/?p=15292 This is the first part of a series of posts that will outline the importance of Spark in solving Machine Learning problems. The series covers the complete steps necessary for a Data Science project. This series of steps, commonly known in the industry as a data pipeline, consists of various data transformations and evolutions, each […]


https://acadgild.com/blog/machine-learning-spark-determining-credibility-customer-part-1/feed/ 0 15292
Frequently Asked Hadoop Interview Questions – Part 1 https://acadgild.com/blog/frequently-asked-hadoop-interview-questions-part-1/ https://acadgild.com/blog/frequently-asked-hadoop-interview-questions-part-1/#respond Wed, 17 Aug 2016 14:40:43 +0000 https://acadgild.com/blog/?p=15242 In this first part of Hadoop Interview Questions, we will discuss various questions related to the Big Data Hadoop ecosystem. With most of the questions, we have linked relevant posts that you can refer to for practical implementation. What are the different types of file formats in Hive? Ans. Different file formats which Hive can handle […]


https://acadgild.com/blog/frequently-asked-hadoop-interview-questions-part-1/feed/ 0 15242
Implementing HBase filters using Java APIs https://acadgild.com/blog/implementing-hbase-filters-using-java-apis/ https://acadgild.com/blog/implementing-hbase-filters-using-java-apis/#respond Wed, 10 Aug 2016 10:00:36 +0000 https://acadgild.com/blog/?p=14258 In our previous blog, we discussed the need for and working of filters in HBase. In this blog, we will implement a filtering operation on a set of rows in an HBase table. We also recommend that readers go through our posts below on HBase, as they would help in understanding the concepts […]


https://acadgild.com/blog/implementing-hbase-filters-using-java-apis/feed/ 0 14258
Mapreduce Use Case – Sentiment Analysis on Twitter Data https://acadgild.com/blog/mapreduce-use-case-sentiment-analysis-twitter-data/ https://acadgild.com/blog/mapreduce-use-case-sentiment-analysis-twitter-data/#comments Mon, 08 Aug 2016 18:27:12 +0000 https://acadgild.com/blog/?p=14180 This post is about performing Sentiment Analysis on Twitter data using MapReduce. We will use the concept of the distributed cache to implement it. What does the distributed cache do here? By using the distributed cache, we can perform map-side joins. So, here we will join the dictionary dataset containing the sentiment […]


https://acadgild.com/blog/mapreduce-use-case-sentiment-analysis-twitter-data/feed/ 2 14180
MapReduce Use Case – Uber Data Analysis https://acadgild.com/blog/mapreduce-use-case-uber-data-analysis/ https://acadgild.com/blog/mapreduce-use-case-uber-data-analysis/#comments Thu, 04 Aug 2016 07:15:24 +0000 https://acadgild.com/blog/?p=13515 In this post, we will analyze the Uber dataset in Hadoop using MapReduce in Java. The Uber dataset consists of four columns: dispatching_base_number, date, active_vehicles, and trips. You can download the dataset from here. Problem Statement 1: In this problem statement, we will find the days on which each dispatching base […]


https://acadgild.com/blog/mapreduce-use-case-uber-data-analysis/feed/ 4 13515
Ambari Installation Guide- Part II https://acadgild.com/blog/ambari-installation-guide-part-ii/ https://acadgild.com/blog/ambari-installation-guide-part-ii/#respond Thu, 04 Aug 2016 06:00:22 +0000 https://acadgild.com/blog/?p=12191 In this post, we will discuss how to set up an Ambari cluster using an AWS EC2 instance. Before moving ahead, we recommend that readers go through our previous post on how Ambari works and the steps to install it using a repository. Setting up Ambari Cluster and Its Services: Before starting all the Ambari […]


https://acadgild.com/blog/ambari-installation-guide-part-ii/feed/ 0 12191