Friday, August 5, 2016

Run Pig Jobs with Oozie

           

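This post assumes that the Oozie and Pig packages listed below are already installed on a running MapR cluster, and that the steps from the earlier post on installing Oozie and running a sample job (linked below) have already been followed.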

mapr-oozie-4.1.0.201606271017-1.noarch
mapr-oozie-internal-4.1.0.201606271017-1.noarch
mapr-pig-0.14.201608040131-1.noarch

http://abizeradenwala.blogspot.com/2015/07/installing-oozie-and-running-sample-job.html


Since Oozie is currently bundled with Pig v0.12, the steps below are needed for the Oozie Pig action to work with the installed Pig 0.14 package. The bundled Pig jars are:

/opt/mapr/oozie/oozie-4.1.0/share1/lib/pig/pig-withouthadoop-0.12.1-mapr-1408-h2.jar
/opt/mapr/oozie/oozie-4.1.0/share1/lib/pig-2/pig-withouthadoop-0.12.1-mapr-1408-h2.jar

1) The Oozie share/lib directory has two sets of JAR files for Pig. With MapR distribution versions 4.0.0 and later, we will use the Pig JAR files from the share/lib/pig-2 directory.
To specify the JAR files for a given Pig job, add the following property to the <configuration> section of the Pig action in the workflow.xml file:

<property>
    <name>oozie.action.sharelib.for.pig</name>
    <value>pig-2</value>
</property>

2) Stop Oozie:
maprcli node services -name oozie -action stop -nodes <nodes>

3) Remove all files located within the /opt/mapr/oozie/oozie-<version>/share2/lib/pig*/ directories EXCEPT the oozie-sharelib-pig-<version>-mapr.jar file.
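
The post gives no command for this cleanup, so here is a minimal sketch of one way to do it. The function name clean_pig_sharelib is hypothetical, and the sketch assumes a find that supports -maxdepth (GNU and BSD find both do). It deletes only regular files directly inside the given directory and keeps anything matching oozie-sharelib-pig-*-mapr.jar.

```shell
# Hypothetical helper (not part of Oozie or MapR): delete every regular file
# directly inside the directory given as $1, except the
# oozie-sharelib-pig-<version>-mapr.jar file.
clean_pig_sharelib() {
    # Refuse to run on anything that is not an existing directory.
    [ -d "$1" ] || { echo "not a directory: $1" >&2; return 1; }
    find "$1" -maxdepth 1 -type f \
        ! -name 'oozie-sharelib-pig-*-mapr.jar' \
        -exec rm -f {} +
}

# Example call, using the path from this post (adjust to your install):
# clean_pig_sharelib /opt/mapr/oozie/oozie-4.1.0/share2/lib/pig-2
```

Run it once per pig* directory that needs cleaning. Listing the directory before and after is a cheap way to confirm that only the sharelib jar remains.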

Now copy the new Pig jars to the share lib location:
cp <PIG_HOME>/pig-core-h2.jar <OOZIE_HOME>/share2/lib/pig-2/
cp <PIG_HOME>/lib/* <OOZIE_HOME>/share2/lib/pig-2/ 

4) Remove the ZooKeeper jars:

rm -rf <OOZIE_HOME>/share2/lib/pig-2/zookeeper*.jar

5) Move all the old jars from the latest share lib directory in MapR-FS to a temporary location:

hadoop fs -mv /oozie/share/lib/lib_20160804181903/pig-2/* /abizer

Then copy the latest jars into the share lib in MapR-FS:

hadoop fs -put /opt/mapr/oozie/oozie-4.1.0/share2/lib/pig-2/* /oozie/share/lib/lib_20160804181903/pig-2
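
The lib_20160804181903 directory name above is specific to my cluster; yours will carry a different timestamp. As a rough sketch, the hypothetical helper below picks the newest lib_<timestamp> name out of a directory listing read from stdin. It assumes each listing line ends with the path, as hadoop fs -ls output normally does, and that the timestamps are fixed-width digits so that a plain sort orders them by time.

```shell
# Hypothetical helper: read a directory listing on stdin and print the
# newest lib_<timestamp> directory name found at the end of a line.
latest_sharelib_dir() {
    # grep -o keeps only the matching part of each line; fixed-width
    # numeric timestamps sort chronologically with a plain sort.
    grep -o 'lib_[0-9][0-9]*$' | sort | tail -n 1
}

# Example, assuming the share lib root is /oozie/share/lib as in this post:
# hadoop fs -ls /oozie/share/lib | latest_sharelib_dir
```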

6) Copy workflow.xml to the MapR-FS location specified in the job.properties file:

hadoop fs -put workflow.xml /user/mapr/examples/apps/pig/workflow.xml

Here is my workflow.xml as an example:
[mapr@node3 pig-2]$ cat /opt/mapr/oozie/oozie-4.1.0/examples/apps/pig/workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-wf">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/output-data/pig"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
                <property>
                    <name>oozie.action.sharelib.for.pig</name>
                    <value>pig-2</value>
                </property>
            </configuration>
            <script>id.pig</script>
            <param>INPUT=/user/${wf:user()}/input-data/text</param>
            <param>OUTPUT=/user/${wf:user()}/output-data/pig</param>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
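
If the Pig action still picks up the old jars, one thing to rule out is a missing or misspelled share lib property in the workflow. The sketch below is a hypothetical helper (pig_sharelib_of is my own name, not an Oozie tool) that prints the value set for oozie.action.sharelib.for.pig. It assumes the <name> and <value> elements sit on separate lines, as they do in the file above; it is a plain text scan, not a real XML parser.

```shell
# Hypothetical helper: print the <value> that follows the
# oozie.action.sharelib.for.pig <name> line in the workflow file $1.
pig_sharelib_of() {
    awk '
        # Remember that the property name was seen, then move on.
        /<name>oozie\.action\.sharelib\.for\.pig<\/name>/ { found = 1; next }
        # The first <value> line after it holds the share lib name.
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Example, using the local path from this post:
# pig_sharelib_of /opt/mapr/oozie/oozie-4.1.0/examples/apps/pig/workflow.xml
```

An empty result means the property was not found in the expected layout.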

7) Start Oozie:
maprcli node services -name oozie -action start -nodes <nodes>

8) Run the sample workflow as the mapr user:

[mapr@node3 root]$ /opt/mapr/oozie/oozie-4.1.0/bin/oozie job -oozie="http://localhost:11000/oozie" -config /opt/mapr/oozie/oozie-4.1.0/examples/apps/pig/job.properties -run
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/oozie/oozie-4.1.0/lib/slf4j-simple-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
job: 0000000-160805145748028-oozie-mapr-W

9) Checking the status shows that the Pig workflow was executed successfully by Oozie:

[mapr@node3 root]$ /opt/mapr/oozie/oozie-4.1.0/bin/oozie job -info 0000000-160805145748028-oozie-mapr-W -oozie="http://localhost:11000/oozie"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/oozie/oozie-4.1.0/lib/slf4j-simple-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
Job ID : 0000000-160805145748028-oozie-mapr-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : pig-wf
App Path      : maprfs:/user/mapr/examples/apps/pig
Status        : SUCCEEDED
Run           : 0
User          : mapr
Group         : -
Created       : 2016-08-05 18:58 GMT
Started       : 2016-08-05 18:58 GMT
Last Modified : 2016-08-05 18:59 GMT
Ended         : 2016-08-05 18:59 GMT
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                                            Status    Ext ID                 Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000000-160805145748028-oozie-mapr-W@:start:                                  OK        -                      OK         -        
------------------------------------------------------------------------------------------------------------------------------------
0000000-160805145748028-oozie-mapr-W@pig-node                                 OK        job_1470423487588_0001 SUCCEEDED  -        
------------------------------------------------------------------------------------------------------------------------------------
0000000-160805145748028-oozie-mapr-W@end                                      OK        -                      OK         -        
------------------------------------------------------------------------------------------------------------------------------------

[mapr@node3 root]$
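
For scripting the submit and status check, the two outputs above can be reduced to just the job ID and the workflow status. This is a minimal sketch: the function names are hypothetical, and both helpers assume the output layout shown in this post (a line starting with "job: " after submission, and a line starting with "Status" in the -info output).

```shell
# Hypothetical helpers for scripting the two commands above.

# Print the job ID from the output of `oozie job ... -run` (read on stdin).
oozie_job_id() {
    sed -n 's/^job: //p'
}

# Print the workflow status from the output of `oozie job -info` (stdin).
# Only a line that starts with "Status" matches, so the "Ext Status"
# column header in the actions table is ignored.
oozie_job_status() {
    awk -F': *' '/^Status/ { print $2; exit }'
}

# Example, reusing the commands from this post:
# OOZIE=/opt/mapr/oozie/oozie-4.1.0/bin/oozie
# id=$("$OOZIE" job -oozie="http://localhost:11000/oozie" \
#        -config /opt/mapr/oozie/oozie-4.1.0/examples/apps/pig/job.properties -run | oozie_job_id)
# "$OOZIE" job -info "$id" -oozie="http://localhost:11000/oozie" | oozie_job_status
```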





