In this blog we will discuss how to install Oozie on a Hadoop 2.x cluster.
First, download the oozie-4.1.0 tar file from the link below:
By default it will be saved in the Downloads folder.
Move into the Downloads folder using the commands below:
cd
cd Downloads
Extract the tar file using the command below:
tar -xzvf oozie-4.1.0.tar.gz
The archive will be extracted into an oozie-4.1.0 directory.
Maven Installation
Before setting up Oozie, install Maven on your system. The Oozie build uses Maven to download the dependencies that match your Hadoop version.
If you are using CentOS, type the command below to install Maven:
Command: sudo yum install maven
If you are using Ubuntu, type the command below to install Maven:
Command: sudo apt-get install maven
After installing Maven, verify the installation using the command below:
mvn -version
You should get output like that shown in the screenshot below.
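Before starting the build it can help to confirm that the required tools are actually on your PATH. The following is only a minimal sketch: check_tools is a hypothetical helper name, and the assumption that java and mvn are the two tools to check comes from the steps in this post.

```shell
# Report whether each named command is on PATH.
# Returns 0 only if every command was found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The Oozie build needs a JDK and Maven.
check_tools java mvn || echo "Install the missing tools before building Oozie."
```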
Oozie distro creation
Now open the extracted oozie-4.1.0 folder and open the pom.xml file.
In pom.xml, set the target Java version to the Java version you are using. Here we are using Java 7, so we set the target version to 1.7.
If you are using Hadoop 2.x, set the Hadoop version to 2.3.0 so that Maven pulls the dependencies Oozie needs to run on a Hadoop 2.x cluster. Hadoop 2.3.0 is the latest Hadoop version for which Oozie 4.1.0 ships dependencies.
Next, comment out the Codehaus repository. Codehaus has shut down its services, so dependencies can no longer be downloaded from this repository.
After making these changes, save and close the file.
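To find the lines that need editing in a large pom.xml, you can search for them. This is a rough sketch: show_pom_settings is a hypothetical helper, and the search terms (targetJavaVersion, hadoop.version, codehaus) are assumptions about how these settings are named in your copy of the file, so adjust them if nothing matches. The path assumes the source was extracted in the Downloads folder.

```shell
# Print the pom.xml lines that mention the settings discussed above,
# prefixed with their line numbers.
# The search terms are assumptions about how the settings are named.
show_pom_settings() {
    grep -n -i -E 'targetJavaVersion|<hadoop\.version>|codehaus' "$1"
}

# Run against the extracted source tree if it is present.
POM_FILE="$HOME/Downloads/oozie-4.1.0/pom.xml"
if [ -f "$POM_FILE" ]; then
    show_pom_settings "$POM_FILE" || echo "no matching lines found in $POM_FILE"
fi
```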
Now move into the bin folder of the extracted oozie-4.1.0 directory and type the command below:
./mkdistro.sh -DskipTests -X
This command builds the Oozie distribution (distro). -DskipTests skips the tests and -X turns on Maven's debug output.
Note: the distro build downloads from Maven the dependencies Oozie needs for a Hadoop 2.x cluster.
The process will take some time, as it downloads all the dependencies required for the build.
While the distro is being built you will see rows of dots as shown below. This is normal, so don't panic.
Finally, you will get a success message as shown in the figure below.
A target folder will be created inside the distro folder of your Oozie source directory.
Open the target folder inside the distro folder.
Inside the target folder you can see the oozie-4.1.0-distro folder.
Open the oozie-4.1.0-distro folder. Inside it you will find an oozie-4.1.0 folder.
This oozie-4.1.0 folder contains everything Oozie needs to run on a Hadoop cluster.
Copy this oozie-4.1.0 folder into the home directory of your Hadoop user. In our case we created an oozie directory in the home folder ($HOME) and pasted the oozie-4.1.0 folder into $HOME/oozie.
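The copy step can also be done from the terminal. The sketch below assumes the source tree was extracted under $HOME/Downloads, as earlier in this post. install_distro and BUILD_ROOT are hypothetical names used only for illustration.

```shell
# Copy the built distribution folder into a destination directory.
# $1: path to the built oozie-4.1.0 folder
# $2: destination directory (created if it does not exist)
install_distro() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "not found: $src" >&2; return 1; }
    mkdir -p "$dest" && cp -r "$src" "$dest/"
}

# Assumed location of the source tree; adjust to where you extracted it.
BUILD_ROOT="$HOME/Downloads/oozie-4.1.0"
BUILT_DIR="$BUILD_ROOT/distro/target/oozie-4.1.0-distro/oozie-4.1.0"
if [ -d "$BUILT_DIR" ]; then
    install_distro "$BUILT_DIR" "$HOME/oozie"
fi
```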
Now change into the newly copied oozie-4.1.0 directory and create a directory named libext (library extension) using the command mkdir libext.
In the screenshot below we can see that the libext directory has been created in the path $HOME/oozie/oozie-4.1.0
Move into the libext directory using the command cd libext
Now copy the Hadoop 2.3.0 jar files into the newly created libext folder. You can find the Hadoop 2.3.0 libraries in the following path inside the Oozie source directory:
oozie-4.1.0/hadooplibs/hadoop-2/target/hadooplibs/hadooplib-2.3.0.oozie-4.1.0/
Please refer to the screenshot below for the same.
Copy the jar files inside hadooplib-2.3.0.oozie-4.1.0 to the newly created libext folder.
Now download the ext-2.2 zip file from the link below
Copy the downloaded ext-2.2.zip file into the newly created libext folder.
The ext-2.2.zip file is required for the Oozie web UI.
Refer to the screenshot below to see the hadooplib-2.3.0.oozie-4.1.0 jar files and the ext-2.2.zip file inside the libext folder.
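The libext steps can be collected into one small helper. This is only a sketch under the directory layout described in this post: fill_libext is a hypothetical name, and the example paths (source tree and ext-2.2.zip in $HOME/Downloads) are assumptions.

```shell
# Create libext under the Oozie home and copy the Hadoop jars and ext-2.2.zip into it.
# $1: Oozie home directory
# $2: directory holding the Hadoop jar files
# $3: path to ext-2.2.zip
fill_libext() {
    mkdir -p "$1/libext" || return 1
    cp "$2"/*.jar "$1/libext/" || return 1
    cp "$3" "$1/libext/"
}

# Assumed locations; adjust to your own download and build directories.
OOZIE_DIR="$HOME/oozie/oozie-4.1.0"
JAR_DIR="$HOME/Downloads/oozie-4.1.0/hadooplibs/hadoop-2/target/hadooplibs/hadooplib-2.3.0.oozie-4.1.0"
EXT_ZIP="$HOME/Downloads/ext-2.2.zip"
if [ -d "$OOZIE_DIR" ] && [ -d "$JAR_DIR" ] && [ -f "$EXT_ZIP" ]; then
    fill_libext "$OOZIE_DIR" "$JAR_DIR" "$EXT_ZIP"
fi
```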
With everything in place, move into the bin folder of the newly obtained oozie-4.1.0, which is $HOME/oozie/oozie-4.1.0/bin
Preparing a War file
Now prepare a WAR file using the command below:
sudo ./oozie-setup.sh prepare-war
This command prepares the WAR file for Oozie.
After the WAR file has been prepared successfully, you will get output as shown in the image below.
Now open the core-site.xml file in your Hadoop etc folder and add the properties below, replacing hadoop_user_name with the name of your Hadoop user.
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop_user_name.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop_user_name.groups</name>
  <value>*</value>
</property>
After making the changes, save and close the file.
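A typo in the proxyuser property names is easy to miss, so it can be worth checking what Hadoop actually resolves. The sketch below uses the hdfs getconf -confKey command and assumes that the Hadoop user is the user you are currently logged in as. proxyuser_key is a hypothetical helper name.

```shell
# Build the name of a proxyuser property.
# $1: Hadoop user name, $2: suffix (hosts or groups)
proxyuser_key() {
    printf 'hadoop.proxyuser.%s.%s' "$1" "$2"
}

# Print the values Hadoop resolves for the current user, if the hdfs CLI is on PATH.
if command -v hdfs >/dev/null 2>&1; then
    for suffix in hosts groups; do
        key=$(proxyuser_key "$(id -un)" "$suffix")
        echo "$key = $(hdfs getconf -confKey "$key" 2>/dev/null || echo '<not set>')"
    done
fi
```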
Now open the oozie-site.xml file in the conf directory of the newly obtained oozie-4.1.0.
In the oozie-site.xml file, edit the properties described below.
In oozie.service.HadoopAccessorService.hadoop.configurations, specify the path of your Hadoop configuration directory. Please refer to the example below.
<property>
  <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
  <value>*=/home/kiran/hadoop-2.7.1/etc/hadoop</value>
  <description>
    Comma separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of
    the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is
    used when there is no exact match for an authority. The HADOOP_CONF_DIR contains
    the relevant Hadoop *-site.xml files. If the path is relative is looked within
    the Oozie configuration directory; though the path can be absolute (i.e. to point
    to Hadoop client conf/ directories in the local filesystem.
  </description>
</property>
In oozie.service.WorkflowAppService.system.libpath, set your NameNode host and port number. Please refer to the example below.
<property>
  <name>oozie.service.WorkflowAppService.system.libpath</name>
  <value>hdfs://localhost:9000/user/${user.name}/share/lib</value>
  <description>
    System library path to use for workflow applications.
    This path is added to workflow application if their job properties sets
    the property 'oozie.use.system.libpath' to true.
  </description>
</property>
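Now give your Hadoop user ownership of the oozie folder using the command below. Replace hadoop_user_name with the name of your Hadoop user. In our case the folder is $HOME/oozie, and -R applies the change to everything inside it.
sudo chown -R hadoop_user_name $HOME/oozie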
Creating Sharelib directory in HDFS
Note: Make sure that all your hadoop daemons are started properly.
Move into the bin folder of the newly created oozie-4.1.0.
Now create a directory in HDFS named sharelib for storing the Oozie libraries, using the command below:
./oozie-setup.sh sharelib create -fs hdfs://localhost:9000
This command creates the sharelib directory in HDFS.
You will get a message as follows:
the destination path for sharelib is: hdfs://localhost:9000/user/kiran/share/lib/
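To confirm that the sharelib really landed in HDFS, you can list the destination path. This is a minimal sketch that assumes the sharelib was created for the user you are logged in as and that the hdfs command is on your PATH. sharelib_path is a hypothetical helper name.

```shell
# Build the HDFS sharelib path for a given user.
sharelib_path() {
    printf '/user/%s/share/lib' "$1"
}

# List the sharelib contents if the hdfs CLI is available.
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -ls "$(sharelib_path "$(id -un)")" || echo "sharelib not found or HDFS is not running"
fi
```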
Creating Oozie DB
Before creating the Oozie DB, make sure that you have installed MySQL server on your system.
If you haven't installed MySQL, install it using one of the commands below.
Command to install MySQL server on CentOS:
sudo yum install mysql-server
Command to install MySQL server on Ubuntu:
sudo apt-get install mysql-server
After installing MySQL server, move into the bin folder of the newly created oozie-4.1.0 and type the command below. ooziedb.sh creates the schema in whichever database is configured through the oozie.service.JPAService.jdbc.* properties in oozie-site.xml. With the default configuration that is an embedded Derby database, so MySQL is used only if those properties point to it.
./ooziedb.sh create -sqlfile oozie.sql -run
After this command runs successfully, you will get the output below:
setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
Validate DB Connection
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
Create OOZIE_SYS table
DONE
Oozie DB has been created for Oozie version '4.1.0'
The SQL commands have been written to: oozie.sql
With this step, your Oozie installation is complete.
Now add the bin path of the newly created Oozie to the .bashrc file in your home folder. Open the file using the command below:
gedit .bashrc
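The line to add is not shown in this post, so the following is an assumption based on the install location used here ($HOME/oozie/oozie-4.1.0). A line like this at the end of .bashrc should be enough:

```shell
# Make the Oozie scripts (oozied.sh, oozie-setup.sh, ...) available from any directory.
# The path assumes Oozie was copied to $HOME/oozie/oozie-4.1.0 as in this post.
export PATH="$PATH:$HOME/oozie/oozie-4.1.0/bin"
```

If you installed Oozie somewhere else, change the path to match.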
After editing the .bashrc file, save and close it, then reload it using the command below:
source .bashrc
Oozie is now configured with your Hadoop cluster. Start Oozie using the command:
oozied.sh start
Oozie is now started. You can confirm this through the web UI.
Open your browser and go to http://localhost:11000/oozie (11000 is the default port for Oozie).
All the active and suspended jobs can be seen in the web UI.
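The server can also be checked from the terminal with the oozie command-line client that ships in the same bin folder. A minimal sketch, assuming the default host and port used in this post (oozie_url is a hypothetical helper name):

```shell
# Build the Oozie server URL from a host and port.
# Defaults match the setup in this post: localhost and port 11000.
oozie_url() {
    printf 'http://%s:%s/oozie' "${1:-localhost}" "${2:-11000}"
}

# Ask the server for its status if the oozie client is on PATH.
if command -v oozie >/dev/null 2>&1; then
    oozie admin -oozie "$(oozie_url)" -status || echo "Oozie server is not reachable."
fi
```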
We have successfully installed Oozie 4.1.0 on a Hadoop 2.x cluster.
We hope this blog helped you install Oozie on your Hadoop cluster. Keep visiting our site for more updates on Big Data and other technologies.