In this blog, we will learn how to solve common errors that may occur while executing Hive and Pig commands.
We recommend that readers go through our previous blog on Troubleshooting Errors in MapReduce before moving ahead.
A beginner will usually have doubts about errors while executing Hive and Pig commands in the initial stages: he or she may not be aware of the common errors that can occur, or of the solutions used to fix them.
So, let us look at the different types of errors that may occur while executing Hive and Pig commands, and the solution for each of them.
Troubleshooting Hive errors
1. Another instance of Derby may have already booted the database:
This type of error will be thrown if the user tries to access more than one instance of Derby database at a time.
Normally, Hive in embedded mode has a limitation of one active user at a time.
As we can see from the above image, we have started Hive in more than one terminal.
So, when the user tries to use Hive in more than one terminal, an error is thrown stating that “Another instance of Derby may have already booted the database”.
Embedded Apache Derby is used as the default Hive metastore in the Hive configuration. This setup is called the embedded metastore and is fine for development and unit testing, but it won’t scale to a production environment, as only a single user can connect to the Derby database at any instant. Starting a second instance of the Hive driver will therefore throw an error.
The solution for the above error is to use only one instance of the Derby database at a time, i.e., close the other Hive session before opening a new one.
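As an illustration (the paths and the exact error text here are hypothetical; they vary by Hive version and working directory), the situation looks roughly like this:

```
# Terminal 1: the first Hive session locks the embedded Derby metastore.
$ hive
hive> show tables;

# Terminal 2: starting a second session against the same metastore_db
# directory fails with an error similar to:
#   Caused by: ERROR XSDB6: Another instance of Derby may have already
#   booted the database /home/user/metastore_db
# Quit the first session (hive> quit;) before starting a new one.
```

Note that the Derby lock is tied to the metastore_db directory, which is created in whatever directory Hive was launched from.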
2. Invalid or No files matching path:
This type of error will be thrown if the user doesn’t specify the correct path of the file while loading its data into the table.
We can see from the above image that an error has been thrown stating “No files matching”. This is because the actual file is in an HDFS path; we can refer to the below image, where the file employee.txt is present in the HDFS root directory.
We can use the cat command to see the contents of the employee.txt file.
To solve the above error, the user should specify the correct path where the file is saved.
We know that the employee.txt file is present in the HDFS root path. Let us load the employee.txt file into the table emp_details without using the LOCAL keyword (LOCAL refers to a local file system path) in the LOAD statement.
We can see from the image that the contents of the employee.txt file were successfully loaded into the table emp_details.
Now we can use the SELECT * command to see the contents of the table emp_details.
Thus, from the above execution steps, we have learnt the type of error and its solution when we get the error “Invalid or No files matching path”.
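The two load variants can be sketched as follows (the paths are illustrative, and the table emp_details is assumed to already exist with a matching schema):

```sql
-- Fails with "No files matching path" if employee.txt is not actually
-- on the local file system at this location:
LOAD DATA LOCAL INPATH '/home/user/employee.txt' INTO TABLE emp_details;

-- Works when the file lives in HDFS: drop the LOCAL keyword so Hive
-- resolves the path against HDFS instead of the local file system.
LOAD DATA INPATH '/employee.txt' INTO TABLE emp_details;

-- Verify the load:
SELECT * FROM emp_details;
```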
Troubleshooting common Pig errors:
1. Input path does not exist – HDFS: The first thing a programmer should decide before executing any query in the Pig shell is the mode in which to process that query. If the programmer tries to load and dump a file that is not present in the specified path, an error will be thrown stating “Input path does not exist”.
When we are in Pig MapReduce mode, the MapReduce framework allows the user to run queries only on files that are present in an HDFS path.
In the below example, we purposefully try to load a file from a local file system path and dump it in Pig MapReduce mode.
We can see in the above image that we are loading the file emp.txt, which is present in a local file system path, into the relation A.
We can see in the above image that an error is thrown while dumping the relation A.
The MapReduce framework assumes that the given path already exists in HDFS, i.e., it assumes the path /home/acadgild/Desktop/emp.txt exists in HDFS. When the relation A is dumped, since that path and file are not available in HDFS, the error “Input path does not exist” is thrown.
So, the solution for the above error is to specify the correct path of the input file while loading.
Since we are in Pig MapReduce mode, we should first save the input file emp.txt into an HDFS path and then execute the query.
We can see in the below image that the emp.txt file is now present in the HDFS root path.
We can now load the file and use Dump command to see the contents of the file.
We can see in the above image that no exception is thrown after dumping the relation A.
Thus, from the above execution steps, we have learnt the type of error and its solution when we get the error “Input path does not exist”.
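The steps above can be sketched as follows (paths are illustrative):

```pig
-- In MapReduce mode, loading a local path and dumping it fails,
-- because Pig resolves the path against HDFS:
A = LOAD '/home/acadgild/Desktop/emp.txt';
DUMP A;   -- ERROR: Input path does not exist

-- Fix: copy the file into HDFS first, from the OS shell:
--   hadoop fs -put /home/acadgild/Desktop/emp.txt /emp.txt
-- then load it by its HDFS path:
A = LOAD '/emp.txt';
DUMP A;
```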
2. Pig by default loads files with fields delimited by tab:
When loading an input file into Pig, if the user doesn’t explicitly specify the field delimiter of the file, then by default Pig assumes the fields of the loaded file are separated by tab.
When such a relation is dumped, null values are printed instead of the actual field values.
In the below example, we consider two input files: em.txt, whose fields are separated by tab, and emp.txt, whose fields are separated by a comma (,).
Let us load the em.txt file without explicitly declaring the field delimiter with the USING PigStorage() clause.
Let us Dump the relation A to see the result.
Since the fields of the file em.txt are separated by tab (\t), there is no need for the USING PigStorage() clause; therefore, relation A successfully stores all the records without any exception.
Now, let us load the emp.txt file, whose fields are separated by a comma (,).
We can see in the above image that we are loading the emp.txt file without the USING PigStorage() clause. Hence, Pig assumes the loaded file is tab-separated and stores null values in the relation B.
Let us Dump the relation B to see the result.
We can see in the above image that we have dumped the relation B, but null values were returned instead of the actual values.
The solution for the above error is to use the USING PigStorage('field separator') clause while loading the input file into Pig.
Let us load the file emp.txt with USING PigStorage('field separator') and dump the relation B to see the result.
We can see from the above image that the actual values, instead of nulls, are stored in the relation B.
Thus, from the above execution steps, we have learnt when to use the USING PigStorage('field separator') clause: whenever the input file’s fields are separated by a character other than tab.
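The two loads can be sketched as follows (the schema and paths are illustrative, not taken from the images above):

```pig
-- em.txt is tab-separated, so the default loader works as-is:
A = LOAD '/em.txt' AS (id:int, name:chararray, sal:int);
DUMP A;

-- emp.txt is comma-separated; without a declared delimiter Pig looks
-- for tabs, finds none, and every field comes back null. Declaring
-- the separator fixes it:
B = LOAD '/emp.txt' USING PigStorage(',') AS (id:int, name:chararray, sal:int);
DUMP B;
```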
3. Undefined alias:
This type of error will be thrown if a Pig query uses a relation that does not exist.
In the above diagram, we can see that there is no relation with the name a. Thus, an error is thrown stating “Undefined alias: a”.
The solution for the above exception is to use an existing relation in the query.
We can see from the above image that no exception is thrown after using the existing relation A in the query.
Now, by using the Dump command, we can see the contents of the relation B.
Thus, from the above execution steps, we have learnt the type of error and its solution when we get the error “Undefined alias”.
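A minimal sketch (the schema and path are illustrative) — note that Pig alias names are case-sensitive, so a lowercase a is not the same relation as A:

```pig
A = LOAD '/emp.txt' USING PigStorage(',') AS (id:int, name:chararray, sal:int);

-- 'a' was never defined, so this line would fail:
-- B = FOREACH a GENERATE name;   -- ERROR 1000: Undefined alias: a

-- Referring to the existing relation A works:
B = FOREACH A GENERATE name;
DUMP B;
```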
4. Projected field does not exist:
This type of error will be thrown when the specified field name does not exist in the schema.
We can see in the above image that there is no field with the name sal in the relation A; there is, however, a field with the name emp_sal.
Now, let us change the field name sal to emp_sal in the query to get the filtered result.
We can see in the above image that no exception is thrown after changing the field name to emp_sal, as the field emp_sal already exists in the schema.
Now let us Dump the relation to see the required output.
Thus, from the above execution steps, we have learnt the type of error and its solution when we get the error “Projected field does not exist”.
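As a sketch (the field names and path here are illustrative), the error and its fix look roughly like this:

```pig
A = LOAD '/emp.txt' USING PigStorage(',')
    AS (emp_id:int, emp_name:chararray, emp_sal:int);

-- 'sal' is not in the schema, so projecting it would fail:
-- B = FILTER A BY sal > 30000;   -- ERROR: Projected field [sal] does not exist

-- Use the field name declared in the schema instead:
B = FILTER A BY emp_sal > 30000;
DUMP B;
```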
5. Successfully stored zero records:
This type of message appears when a user runs an invalid condition on a particular field, i.e., an expression (condition) that doesn’t match any of the actual field values in the relation.
We can see in the below image that in relation A, the emp_sal field contains no value greater than 50000.
Since the user tried an expression (condition) that didn’t match any of the actual field values in the relation, the message “Successfully stored 0 records” is shown.
The solution is to use a valid expression (condition) that matches the field values in the relation.
Now, let us run the same command with a proper condition that matches the field values.
From the above image, we can see that relation C successfully stores 2 records without any exception.
Thus, from the above execution steps, we have learnt the type of error and its solution when we get the message “Successfully stored 0 records”.
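The two runs can be sketched as follows (the schema, thresholds, and output paths are illustrative, not taken from the images above):

```pig
A = LOAD '/emp.txt' USING PigStorage(',')
    AS (emp_id:int, emp_name:chararray, emp_sal:int);

-- If no emp_sal value exceeds 50000, the filter matches nothing and
-- the job reports "Successfully stored 0 records":
C = FILTER A BY emp_sal > 50000;
STORE C INTO '/output_empty';

-- A condition that matches the actual data stores records as expected:
D = FILTER A BY emp_sal > 30000;
STORE D INTO '/output_ok';
```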
We hope this blog helped you in understanding how to troubleshoot common errors in Hive and Pig.