Big Data and Hadoop – AcadGild

Hadoop Tutorial: Combiners in Hadoop

Manjunath N — Thu, 25 Aug 2016 09:40:35 +0000

In this post, we will look at Combiners and discuss why they are needed and how they work in Hadoop. Hadoop is an open-source framework used to store and process large data sets in a distributed computing environment. Usually, when we are working on large data sets in the MapReduce framework, the output […]
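As a rough illustration of the idea, the following plain-Python sketch simulates a word count with and without a combiner. It does not use the Hadoop API; the function names (`map_words`, `combine`, `reduce_counts`) and the input splits are hypothetical and chosen only for this example. It assumes the usual role of a combiner: aggregating each mapper's output locally so that fewer records are sent to the reducers.

```python
from collections import Counter, defaultdict


def map_words(line):
    """Mapper: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum the counts per key on the mapper side, before the shuffle."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def reduce_counts(all_pairs):
    """Reducer: sum the counts per key across the output of all mappers."""
    totals = defaultdict(int)
    for key, value in all_pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical input splits, one list of lines per mapper.
splits = [["big data big cluster"], ["data data big"]]

# Raw mapper output, one list of pairs per mapper.
raw = [[pair for line in split for pair in map_words(line)] for split in splits]

# Mapper output after local aggregation by the combiner.
combined = [combine(pairs) for pairs in raw]

records_without_combiner = sum(len(pairs) for pairs in raw)
records_with_combiner = sum(len(pairs) for pairs in combined)
result = reduce_counts(pair for pairs in combined for pair in pairs)
```

In this toy run, the reducer receives fewer records when the combiner is applied, while the final counts stay the same. This works because addition is associative and commutative; an operation without those properties generally cannot be used as a combiner unchanged.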

The post Hadoop Tutorial: Combiners in Hadoop appeared first on AcadGild.

Hadoop Tutorial: HBase Admin DDL Commands (Java API)

Prateek Kumar — Wed, 24 Aug 2016 07:14:18 +0000

In HBase, all operations are performed on tables and are very similar to those in MySQL. In this post, we will use Eclipse as the IDE for our Java programs. Before going ahead with this post on DML commands in HBase, we recommend that you go through the post HBase CRUD Operations. We will be learning some […]


Machine Learning with Spark – Part 3

Manjunath N — Tue, 23 Aug 2016 13:11:54 +0000

In this post, we will work on a dataset from a bank and try to find patterns using exploratory data analysis. Before we go ahead, let’s get acquainted with the dataset. Dataset source information: Professor Dr. Hans Hofmann, Institut für Statistik und Ökonometrie, Universität Hamburg, FB Wirtschaftswissenschaften, Von-Melle-Park 5, 2000 Hamburg 13 […]


Building a Hadoop Application using Maven

Kiran Krishna — Tue, 23 Aug 2016 07:33:26 +0000

In this post, we will discuss how to build a Hadoop application using Maven. We recommend that readers go through the previous post on Maven to get a clear idea of Maven and how it helps in building applications. Eclipse needs to be installed on your system for this. Assuming that it is […]


Machine Learning with Spark: Determining Credibility of a Customer – Part 1

Satyam — Fri, 19 Aug 2016 09:52:47 +0000

This is the first part of a series of posts that outlines the importance of Spark in solving Machine Learning problems. The series covers the complete set of steps necessary for a Data Science project. This series of steps, commonly known in the industry as a data pipeline, consists of various data transformations and evolutions, each […]


Frequently Asked Hadoop Interview Questions – Part 1

Satyam — Wed, 17 Aug 2016 14:40:43 +0000

In this first part of Hadoop Interview Questions, we will discuss various questions related to the Big Data Hadoop ecosystem. For most of the questions, we have linked relevant posts that you can refer to for practical implementation. What are the different types of file formats in Hive? Ans. The different file formats that Hive can handle […]


Implementing HBase filters using Java APIs

Manjunath N — Wed, 10 Aug 2016 10:00:36 +0000

In our previous blog, we discussed the need for and working of filters in HBase. In this blog, we will implement a filtering operation on a set of rows in an HBase table. We also recommend that readers go through our posts on HBase below, as they will help in understanding the concepts […]


MapReduce Use Case – Sentiment Analysis on Twitter Data

Kiran Krishna — Mon, 08 Aug 2016 18:27:12 +0000

This post is about performing Sentiment Analysis on Twitter data using MapReduce. We will use the distributed cache to implement it. What does the distributed cache do here? It lets us perform map-side joins. So, here we will join the dictionary dataset containing the sentiment […]
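The map-side join idea can be sketched in plain Python, without the Hadoop API. The sketch assumes that the cached dictionary is a small file of tab-separated `word` and `score` pairs that every mapper loads into memory; this file format, the sample tweets, and the function names (`load_dictionary`, `map_tweet`) are hypothetical and are not taken from the post.

```python
def load_dictionary(lines):
    """Parse 'word<TAB>score' lines into an in-memory lookup table.

    In a real job, this would typically happen once per mapper, reading
    the file that the distributed cache copied to the local node.
    """
    table = {}
    for line in lines:
        word, score = line.split("\t")
        table[word] = int(score)
    return table


def map_tweet(tweet_id, text, sentiment):
    """Mapper: join each word of the tweet against the cached dictionary.

    Words that are missing from the dictionary contribute a score of 0.
    """
    score = sum(sentiment.get(word, 0) for word in text.lower().split())
    return tweet_id, score


# Hypothetical contents of the cached dictionary file.
cached_file = ["good\t3", "bad\t-3", "awful\t-4"]
sentiment = load_dictionary(cached_file)

# Hypothetical (tweet_id, text) records read by the mapper.
tweets = [(1, "Good food but bad service"), (2, "awful awful traffic")]
scores = dict(map_tweet(tweet_id, text, sentiment) for tweet_id, text in tweets)
```

Because the small dataset is held in memory on every mapper, the join happens during the map phase and the large dataset does not have to be shuffled to reducers for joining.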


MapReduce Use Case – Uber Data Analysis

Kiran Krishna — Thu, 04 Aug 2016 07:15:24 +0000

In this post, we will analyze the Uber dataset in Hadoop using MapReduce in Java. The Uber dataset consists of four columns: dispatching_base_number, date, active_vehicles, and trips. You can download the dataset from here. Problem Statement 1: In this problem statement, we will find the days on which each basement […]


Ambari Installation Guide – Part II

Onkar Singh — Thu, 04 Aug 2016 06:00:22 +0000

In this post, we will discuss setting up an Ambari cluster on AWS EC2 instances. Before moving ahead, we recommend that readers go through our previous post on how Ambari works and the steps to install it using a repository. Setting up the Ambari Cluster and Its Services: Before starting all the Ambari […]
