Monday, September 21, 2015

Performance Issue / Hang issue


- First we can use vmstat to check system resource utilization and see which resource is scarce when the hang is observed, as below. If it is due to memory constraints or the system swapping, it is easy to address by adding more memory to the server or capping how much memory each process is allowed to use. The blog post below walks through identifying system resource constraints step by step.

http://abizeradenwala.blogspot.com/2015/07/resource-utilization-to-quickly.html
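For example, a minimal vmstat sketch (the interval and count are just example values) that can be left running while the hang is reproduced; non-zero si/so columns point at swapping:

vmstat 5 12     # one sample every 5 seconds, 12 samples; watch r, free, si/so and wa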

- If we identify that a specific process is causing it, we can start with htop to check whether the processes hogging the most resources are running many threads internally. htop is usually not installed on the system by default and may require the package below.

htop-1.0.1-2.el6.x86_64

Using htop, press t to get a nested tree of threads and look specifically at the PID that seems to be hogging the most resources.

Alternatively, to get a list of all the threads and processes currently running on the system, we can run the command below. Running this ps as a cron job to poll the system every 5 minutes while recreating the hang can reveal whether a huge number of threads has been spun up internally by Java (a polling sketch follows the -L note below).

ps -eLf | grep 21917


UID      PID    PPID  LWP    C  NLWP  STIME  TTY      TIME      CMD
mapr     21887     1  21917  0    53  Sep11           00:01:05  /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java



  -L  Show threads, with LWP (light weight process, i.e. thread ID) and NLWP (number of light weight processes, i.e. thread count) columns
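A hedged polling sketch, reusing PID 21887 from the example above (adjust the PID and log path for your system): an /etc/cron.d entry that records the thread count every 5 minutes so thread growth can be correlated with the hang.

# /etc/cron.d/thread-count  -- hypothetical entry; 21887 is the example JVM PID
*/5 * * * * root echo "$(date) threads=$(ps -eLf | awk '$2==21887' | wc -l)" >> /var/log/thread_count.log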

- If we do see a process continuously hogging CPU, it points to some part of the code not using CPU effectively, or to threads that are stuck on CPU and not making progress. We should run
"kill -3 <PID>" every second so the JVM writes a thread dump to the standard out of the process, and then carefully review the dumps with the developers to identify any issue in the code. This usually shows which thread is slow or not making significant progress during the timeframe when the issue was observed, which helps the developers put a fix in the code to avoid the potential hang.

Friday, September 11, 2015

Iostat




A lot of the time we need to measure disk I/O utilization, check whether all disks are performing well, and monitor system input/output device loading by observing the time the physical disks are active relative to their average transfer rates. In the example below I ran dd against disk "sdb" with iostat running in the background; the disk is clearly busy, i.e. %util is almost 100%.

dd bs=1M count=4096 if=/dev/zero of=/dev/sdb oflag=direct    ( Directly writing to disk )

iostat was run with the options below, where -m = display numbers in MB, -t = print a timestamp, and
-x = display extended statistics.

 iostat -m -t -x 1

09/11/2015 02:28:39 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    2.27   41.48    0.00   56.25
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00          0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00          0.00   44.00     0.00    22.00  1024.00     1.57   34.82  22.68  99.80
sdc               0.00     0.00          0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd               0.00     0.00          0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00          0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00



Line 1 : The first line prints the time.
Lines 2 and 3 : Print the CPU stats/utilization; here the CPU itself is largely idle (most of the time is %idle and %iowait rather than user/system work).

Lines 5 onward : Print various stats for each disk (the meaning of each stat is listed below); I will shortly explain which ones matter and what to infer from these numbers.


  • rrqm/s : The number of read requests merged per second that were queued to the hard disk
  • wrqm/s : The number of write requests merged per second that were queued to the hard disk
  • r/s : The number of read requests per second
  • w/s : The number of write requests per second
  • rsec/s : The number of sectors read from the hard disk per second (reported as rMB/s above because -m was used)
  • wsec/s : The number of sectors written to the hard disk per second (reported as wMB/s above because -m was used)
  • avgrq-sz : The average size (in sectors) of the requests that were issued to the device.
  • avgqu-sz : The average queue length of the requests that were issued to the device. If someone complains about I/O performance issues while avgqu-sz is low, then it is application-specific and can be addressed with more aggressive read-ahead, fewer fsyncs, etc. One interesting part – avgqu-sz, await, svctm and %util are interdependent ( await = avgqu-sz * svctm / (%util/100) ).
  • await : The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in the queue and the time spent servicing them. If there is not a lot of I/O being generated but requests are still sitting pending, it could be that the disk is slow due to a hardware issue.
  • svctm : The average service time (in milliseconds) for I/O requests that were issued to the device
  • %util : Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%. Also note that this value excludes any kind of cache – if a request can be served from cache, it is very unlikely to show up in %util, unlike in the other values.


If the values below from the iostat output are high, it means the specific disk is under pressure; clearly, in the example above, sdb is being utilized heavily (a small watch-loop sketch follows the NFS note below):
  1. The average service time (svctm)
  2. Percentage of CPU time during which I/O requests were issued (%util)
  3. A hard disk consistently reporting high reads/writes (r/s and w/s)
  4. await is continuously high  ( very important )
  5. avgqu-sz is continuously high ( very important )

Note :- " -n " option Displays the network filesystem (NFS) report


Commonly accepted averages:

Rotational Speed (rpm)    IOPS
5400                      50-80
7200                      75-100
10k                       125-150
15k                       175-210


Sunday, September 6, 2015

Top Command Explanation


As a Linux system admin, top is a frequently used command to view resource utilization (memory and CPU) by processes on a server. It helps us find which process is utilizing which system resources and nail down the process that is hogging all the memory or churning CPU.
Although there are more user-friendly tools than top, such as htop, I would consider top the first-level command, and later use more specific commands to drill down and root-cause the issue. In the post below I describe how to use the top command and read its results.
Reading Linux Top Command Output:
When we execute the top command on Linux it shows a lot of results; here I am trying to show you how to read it row by row.

Result Row #1:

Row 1 shows the server uptime since the last reboot, the currently logged-in users, and the CPU load on the server. You can find the same output using the Linux uptime command.
top - 21:56:08 up 62 days,  6:38,  3 users,  load average: 0.08, 0.04, 0.00

Result Row #2:

Row 2 shows the number of processes running on the server and their state.
Tasks: 187 total,   1 running, 186 sleeping,   0 stopped,   0 zombie

Zombie process is a process that has completed execution but still has an entry in the process table. This entry is still needed to allow the parent process to read its child’s exit status. Zombies are basically the leftover bits of dead processes that haven’t been cleaned up properly. A program that creates zombie processes isn’t programmed properly – programs aren’t supposed to let zombie processes stick around.
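A quick sketch for spotting zombies (standard procps ps assumed): list any process whose state starts with Z along with its parent, which is the process that should be reaping it.

ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'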

Result Row #3:

Row 3 shows the CPU utilization status on the server; you can see here how much CPU is free and how much is being utilized by the system.
Cpu(s):  0.3%us,  0.5%sy,  0.0%ni, 98.8%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
0.3% us : %CPU used by User processes
0.5%sy : %CPU used by System Processes
0.0%ni : %CPU used by setting Nice value
98.8%id : %CPU in Idle state
0.3%wa : %CPU Waiting on I/O
0.0%hi / 0.0%si  :  %CPU used by Hardware/Software Interrupts 
0.0%st : Steal time is the time that a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processor.

Result Row #4:

Row 4 shows the memory utilization on the server; you can see here how much memory is used. You can find the same results using the free command.

            Total Memory    Used Memory      Free Memory    Buffered Memory

Mem:   8193720k total,  6317764k used,   1875956k free,   493816k buffers

Result Row #5:

Row 5 shows the swap utilization on the server; you can see here how much swap is being used. You can find the same results using the free command ( last line ). If your system is swapping / using swap memory, it usually indicates the system is under memory pressure and will most likely run very slowly.

            Total Swap Mem   Used Swap      Free Swap        Cached Swap

Swap:  9215992k total,      3920k used,  9212072k free,  1742556k cached
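As a hedged cross-check, the same totals can be read from free, and the kernel's eagerness to swap can be inspected via vm.swappiness:

free -m                          # memory and swap in MB, matches top's Mem/Swap rows
cat /proc/sys/vm/swappiness      # lower values make the kernel less eager to swap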


Result Row #6 ( Running Processes ):

In this section you will see all the running processes on the server and the details of each process, as below.

PID   USER   PR   NI   VIRT    RES   SHR  S  %CPU  %MEM   TIME+      COMMAND
4706  mapr   10  -10   2706m   2.4g  17m  S   2.0  30.7   641:03.31  mfs
PID - Process ID of the process            USER - User running the process
PR - Priority of the process               NI - Nice value of the process
VIRT - Virtual memory used by the process  RES - Physical memory used by the process ( resident memory )
SHR - Shared memory used by the process    S - Current status of the process
%CPU / %MEM - % CPU / memory used by the process
TIME+ - Total CPU time the process has been running for.    COMMAND - Name of the process

By default top sorts the output by %CPU usage (K). If you want to sort the output by another field, press SHIFT+F, select the appropriate letter for the field as shown below, and press Enter.
 a: PID        = Process Id
  b: PPID       = Parent Process Pid
  c: RUSER      = Real user name
  d: UID        = User Id
  e: USER       = User Name
  f: GROUP      = Group Name
  g: TTY        = Controlling Tty
  h: PR         = Priority
  i: NI         = Nice value
  j: P          = Last used cpu (SMP)
* K: %CPU       = CPU usage
  l: TIME       = CPU Time
  m: TIME+      = CPU Time, hundredths
  n: %MEM       = Memory usage (RES)
  o: VIRT       = Virtual Image (kb)
  p: SWAP       = Swapped size (kb)
  q: RES        = Resident size (kb)
  r: CODE       = Code size (kb)
  s: DATA       = Data+Stack size (kb)
  t: SHR        = Shared Mem size (kb)
  u: nFLT       = Page Fault count
  v: nDRT       = Dirty Pages count
  w: S          = Process Status
  x: COMMAND    = Command name/line
  y: WCHAN      = Sleeping in Function
 z: Flags      = Task Flags <sched.h>

Note :-  While top is running, pressing 1 shows per-CPU resource utilization (a separate set of figures for each CPU).
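A small sketch for capturing top non-interactively (the file name is just an example), handy when you want to review utilization after the fact:

top -b -d 5 -n 3 > /tmp/top.$(date +%Y%m%d_%H%M%S).out   # 3 batch-mode snapshots, 5 seconds apart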

Monday, August 10, 2015

Running Spark on YARN


                                           




There are two deploy modes that can be used to launch Spark applications on YARN. 
In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. 
In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
Unlike in Spark standalone mode, in which the master’s address is specified in the --master parameter, in YARN mode the ResourceManager’s address is picked up from the Hadoop configuration.
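As a hedged illustration (the class name and jar path below are placeholders, not part of this cluster), submitting your own application in each mode with spark-submit would look roughly like:

# yarn-cluster mode: the driver runs inside the YARN ApplicationMaster
/opt/mapr/spark/spark-1.2.1/bin/spark-submit --class com.example.MyApp --master yarn-cluster /path/to/my-app.jar

# yarn-client mode: the driver stays in the local client process
/opt/mapr/spark/spark-1.2.1/bin/spark-submit --class com.example.MyApp --master yarn-client /path/to/my-app.jar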


Spark is distributed as two separate packages:
  • mapr-spark
  • mapr-spark-historyserver for Spark History Server (optional)


Note :- It is assumed you have all the core MapR packages along with the YARN packages installed, and that warden is stopped on the cluster so Spark can now be installed and configured on the nodes.

1)  On one node install the Spark and Spark History Server packages, while on the rest of the nodes the Spark package alone is enough.

yum install mapr-spark mapr-spark-historyserver -y
yum install mapr-spark -y


2)  Run the configure.sh command:

[root@yarn1 ~]#/opt/mapr/server/configure.sh -R
Configuring Hadoop-2.5.1 at /opt/mapr/hadoop/hadoop-2.5.1
Done configuring Hadoop
Node setup configuration:  cldb fileserver historyserver nodemanager spark-historyserver webserver zookeeper
Log can be found at:  /opt/mapr/logs/configure.log

That's all. Spark is preconfigured for YARN and does not require any additional configuration to run. To test the installation, run the following command:

[root@yarn1 ~]# su - mapr

[mapr@yarn1 ~]$ MASTER=yarn-cluster /opt/mapr/spark/spark-1.2.1/bin/run-example org.apache.spark.examples.SparkPi
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/spark/spark-1.2.1/lib/spark-assembly-1.2.1-hadoop2.5.1-mapr-1501.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/08/09 21:18:17 INFO client.RMProxy: Connecting to ResourceManager at /10.10.70.118:8032
15/08/09 21:18:18 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
15/08/09 21:18:18 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/08/09 21:18:18 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/08/09 21:18:18 INFO yarn.Client: Setting up container launch context for our AM
15/08/09 21:18:18 INFO yarn.Client: Preparing resources for our AM container
15/08/09 21:18:18 INFO yarn.Client: Uploading resource file:///opt/mapr/spark/spark-1.2.1/lib/spark-assembly-1.2.1-hadoop2.5.1-mapr-1501.jar -> maprfs:/user/mapr/.sparkStaging/application_1439183740763_0002/spark-assembly-1.2.1-hadoop2.5.1-mapr-1501.jar
15/08/09 21:18:23 INFO yarn.Client: Uploading resource file:/opt/mapr/spark/spark-1.2.1/lib/spark-examples-1.2.1-hadoop2.5.1-mapr-1501.jar -> maprfs:/user/mapr/.sparkStaging/application_1439183740763_0002/spark-examples-1.2.1-hadoop2.5.1-mapr-1501.jar
15/08/09 21:18:29 INFO yarn.Client: Setting up the launch environment for our AM container
15/08/09 21:18:29 INFO spark.SecurityManager: Changing view acls to: mapr
15/08/09 21:18:29 INFO spark.SecurityManager: Changing modify acls to: mapr
15/08/09 21:18:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mapr); users with modify permissions: Set(mapr)
15/08/09 21:18:29 INFO yarn.Client: Submitting application 2 to ResourceManager
15/08/09 21:18:29 INFO security.ExternalTokenManagerFactory: Initialized external token manager class - com.mapr.hadoop.yarn.security.MapRTicketManager
15/08/09 21:18:29 INFO impl.YarnClientImpl: Submitted application application_1439183740763_0002
15/08/09 21:18:30 INFO yarn.Client: Application report for application_1439183740763_0002 (state: ACCEPTED)
15/08/09 21:18:30 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.mapr
start time: 1439183909584
final status: UNDEFINED
tracking URL: http://yarn2:8088/proxy/application_1439183740763_0002/
user: mapr
15/08/09 21:18:31 INFO yarn.Client: Application report for application_1439183740763_0002 (state: ACCEPTED)
15/08/09 21:18:32 INFO yarn.Client: Application report for application_1439183740763_0002 (state: ACCEPTED)
15/08/09 21:18:40 INFO yarn.Client: Application report for application_1439183740763_0002 (state: RUNNING)
15/08/09 21:18:40 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: yarn1
ApplicationMaster RPC port: 0
queue: root.mapr
start time: 1439183909584
final status: UNDEFINED
tracking URL: http://yarn2:8088/proxy/application_1439183740763_0002/
user: mapr
15/08/09 21:18:41 INFO yarn.Client: Application report for application_1439183740763_0002 (state: RUNNING)
15/08/09 21:18:42 INFO yarn.Client: Application report for application_1439183740763_0002 (state: RUNNING)
15/08/09 21:18:43 INFO yarn.Client: Application report for application_1439183740763_0002 (state: RUNNING)
15/08/09 21:18:44 INFO yarn.Client: Application report for application_1439183740763_0002 (state: RUNNING)
15/08/09 21:18:45 INFO yarn.Client: Application report for application_1439183740763_0002 (state: RUNNING)
15/08/09 21:18:55 INFO yarn.Client: Application report for application_1439183740763_0002 (state: RUNNING)
15/08/09 21:18:56 INFO yarn.Client: Application report for application_1439183740763_0002 (state: FINISHED)
15/08/09 21:18:56 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: yarn1
ApplicationMaster RPC port: 0
queue: root.mapr
start time: 1439183909584
final status: SUCCEEDED
tracking URL: http://yarn2:8088/proxy/application_1439183740763_0002/history/application_1439183740763_0002
user: mapr

Sunday, August 9, 2015

Installing Spark in Standalone mode





This post has instructions for installing and running Spark 1.2.1 in standalone mode on a MapR 4.1 cluster.
Spark runs on the cluster directly and doesn't need the MapReduce framework to run jobs (it has its own execution engine). Spark is distributed as three separate packages:
  • mapr-spark for Spark worker nodes
  • mapr-spark-master for Spark master nodes
  • mapr-spark-historyserver for the Spark History Server ( used to view jobs run earlier )


Note :- This post assumes you have the MapR core packages installed on the nodes, the nodes configured correctly, and warden stopped so Spark can be installed on the cluster.

1)  On one of the nodes, install the Spark master, Spark worker, and Spark History Server packages.

yum install mapr-spark mapr-spark-master mapr-spark-historyserver -y

2)  On the rest of the nodes, the Spark worker package alone is enough.

yum install mapr-spark -y

3)  We can verify the packages are installed correctly on all nodes at once via clush, as below.

[root@node1 ~]# clush -ab
Enter 'quit' to leave this interactive mode
Working with nodes: node[1-4]
clush> rpm -qa| grep spark
---------------
node[1,3-4] (3)
---------------
mapr-spark-1.2.1.201506091827-1.noarch
---------------
node2
---------------
mapr-spark-1.2.1.201506091827-1.noarch
mapr-spark-master-1.2.1.201506091827-1.noarch
mapr-spark-historyserver-1.2.1.201506091827-1.noarch
clush> quit


4)  Now run the configure.sh script on the master node so the node registers the newly installed Spark packages and they get listed under roles.

[root@node2 ~]# /opt/mapr/server/configure.sh -R
Configuring Hadoop-2.5.1 at /opt/mapr/hadoop/hadoop-2.5.1
Done configuring Hadoop
Node setup configuration:  cldb fileserver hivemetastore hiveserver2 spark-historyserver spark-master tasktracker zookeeper
Log can be found at:  /opt/mapr/logs/configure.log

5)  Now we can create a slaves file and add the hostnames of all the nodes in the cluster where Spark slaves/workers are installed.

[root@node2 ~]# vi /opt/mapr/spark/spark-1.2.1/conf/slaves
localhost
node1
node3
node4

6)  Now make sure passwordless SSH trust exists from the Spark master node to all worker nodes for the mapr user. The blog post below shows how to create the trust quickly. (  su - mapr )

http://abizeradenwala.blogspot.com/2015/07/creating-ssh-trust-quickly.html

Note :-  You can verify the trust by ssh-ing into the other hosts as the mapr user without being prompted for a password.
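A small verification sketch run from the master as root (worker hostnames taken from the slaves file above; BatchMode makes ssh fail instead of prompting):

for h in node1 node3 node4; do
    su - mapr -c "ssh -o BatchMode=yes $h hostname" || echo "trust missing for $h"
done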

7)  Now starting warden on the master node should bring up the Spark Master and History Server as well.

[root@node2 ~]# service mapr-warden start
Starting WARDEN, logging to /opt/mapr/logs/warden.log.
For diagnostics look at /opt/mapr/logs/ for createsystemvolumes.log, warden.log and configured services log files

Note:- The warden service on the rest of the nodes can be started as well; make sure the cluster is fully up.

8) Spark worker services on all slave machines can be started via the start-slaves.sh script.

[mapr@node2 ~]$ /opt/mapr/spark/spark-1.2.1/sbin/start-slaves.sh
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /opt/mapr/spark/spark-1.2.1/sbin/../logs/spark-mapr-org.apache.spark.deploy.worker.Worker-1-node2.mycluster.com.out
node1: starting org.apache.spark.deploy.worker.Worker, logging to /opt/mapr/spark/spark-1.2.1/logs/spark-mapr-org.apache.spark.deploy.worker.Worker-1-node1.mycluster.com.out
node3: starting org.apache.spark.deploy.worker.Worker, logging to /opt/mapr/spark/spark-1.2.1/logs/spark-mapr-org.apache.spark.deploy.worker.Worker-1-node3.mycluster.com.out
node4: starting org.apache.spark.deploy.worker.Worker, logging to /opt/mapr/spark/spark-1.2.1/logs/spark-mapr-org.apache.spark.deploy.worker.Worker-1-node4.mycluster.com.out


9)  Once the Spark master and all Spark worker nodes are up, we can run a sample Pi job to make sure the Spark cluster is functional.

[mapr@node2 logs]$ MASTER=spark://node2:7077  /opt/mapr/spark/spark-1.2.1/bin/run-example org.apache.spark.examples.SparkPi
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/spark/spark-1.2.1/lib/spark-assembly-1.2.1-hadoop2.5.1-mapr-1501.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/08/07 19:42:25 INFO spark.SparkContext: Spark configuration:
spark.app.name=Spark Pi
spark.eventLog.dir=maprfs:///apps/spark
spark.eventLog.enabled=true
spark.executor.extraClassPath=
spark.executor.memory=2g
spark.jars=file:/opt/mapr/spark/spark-1.2.1/lib/spark-examples-1.2.1-hadoop2.5.1-mapr-1501.jar
spark.logConf=true
spark.master=spark://node2:7077
spark.yarn.historyServer.address=http://node2:18080
15/08/07 19:42:25 INFO spark.SecurityManager: Changing view acls to: mapr
15/08/07 19:42:25 INFO spark.SecurityManager: Changing modify acls to: mapr
15/08/07 19:42:25 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mapr); users with modify permissions: Set(mapr)
15/08/07 19:42:26 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/08/07 19:42:26 INFO Remoting: Starting remoting
15/08/07 19:42:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@node2:33543]
15/08/07 19:42:26 INFO util.Utils: Successfully started service 'sparkDriver' on port 33543.
15/08/07 19:42:26 INFO spark.SparkEnv: Registering MapOutputTracker
15/08/07 19:42:26 INFO spark.SparkEnv: Registering BlockManagerMaster
15/08/07 19:42:26 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-54a795ed-25cf-44b1-80d5-e90e45e86e60/spark-eb8fdea4-9623-4a68-89a7-1b3e0314ccaa
15/08/07 19:42:26 INFO storage.MemoryStore: MemoryStore started with capacity 265.4 MB
15/08/07 19:42:27 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-ea8c7aa1-0ddd-47ff-9e8b-33f9d7d84ac6/spark-c735f7f9-a4f5-4a2d-af8f-6d4c1b770d59
15/08/07 19:42:27 INFO spark.HttpServer: Starting HTTP Server
15/08/07 19:42:27 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/08/07 19:42:27 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:37798
15/08/07 19:42:27 INFO util.Utils: Successfully started service 'HTTP file server' on port 37798.
15/08/07 19:42:27 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/08/07 19:42:27 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/08/07 19:42:27 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/08/07 19:42:27 INFO ui.SparkUI: Started SparkUI at http://node2:4040
15/08/07 19:42:28 INFO spark.SparkContext: Added JAR file:/opt/mapr/spark/spark-1.2.1/lib/spark-examples-1.2.1-hadoop2.5.1-mapr-1501.jar at http://10.10.70.107:37798/jars/spark-examples-1.2.1-hadoop2.5.1-mapr-1501.jar with timestamp 1439005348176
15/08/07 19:42:28 INFO client.AppClient$ClientActor: Connecting to master spark://node2:7077...
15/08/07 19:42:28 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150807194228-0000
15/08/07 19:42:28 INFO client.AppClient$ClientActor: Executor added: app-20150807194228-0000/0 on worker-20150807192803-node1-50432 (node1:50432) with 2 cores
15/08/07 19:42:28 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150807194228-0000/0 on hostPort node1:50432 with 2 cores, 2.0 GB RAM
15/08/07 19:42:28 INFO client.AppClient$ClientActor: Executor added: app-20150807194228-0000/1 on worker-20150807192805-node2-59112 (node2:59112) with 2 cores
15/08/07 19:42:28 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150807194228-0000/1 on hostPort node2:59112 with 2 cores, 2.0 GB RAM
15/08/07 19:42:28 INFO client.AppClient$ClientActor: Executor added: app-20150807194228-0000/2 on worker-20150807192808-node4-57599 (node4:57599) with 2 cores
15/08/07 19:42:28 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150807194228-0000/2 on hostPort node4:57599 with 2 cores, 2.0 GB RAM
15/08/07 19:42:28 INFO client.AppClient$ClientActor: Executor added: app-20150807194228-0000/3 on worker-20150807192729-node2-57531 (node2:57531) with 2 cores
15/08/07 19:42:28 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150807194228-0000/3 on hostPort node2:57531 with 2 cores, 2.0 GB RAM
15/08/07 19:42:28 INFO client.AppClient$ClientActor: Executor added: app-20150807194228-0000/4 on worker-20150807192805-node3-38257 (node3:38257) with 2 cores
15/08/07 19:42:28 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150807194228-0000/4 on hostPort node3:38257 with 2 cores, 2.0 GB RAM
15/08/07 19:42:28 INFO client.AppClient$ClientActor: Executor updated: app-20150807194228-0000/0 is now RUNNING
15/08/07 19:42:28 INFO client.AppClient$ClientActor: Executor updated: app-20150807194228-0000/1 is now RUNNING
15/08/07 19:42:28 INFO client.AppClient$ClientActor: Executor updated: app-20150807194228-0000/2 is now RUNNING
15/08/07 19:42:28 INFO client.AppClient$ClientActor: Executor updated: app-20150807194228-0000/0 is now LOADING
15/08/07 19:42:28 INFO client.AppClient$ClientActor: Executor updated: app-20150807194228-0000/3 is now RUNNING
15/08/07 19:42:28 INFO client.AppClient$ClientActor: Executor updated: app-20150807194228-0000/2 is now LOADING
15/08/07 19:42:28 INFO client.AppClient$ClientActor: Executor updated: app-20150807194228-0000/4 is now RUNNING
15/08/07 19:42:28 INFO client.AppClient$ClientActor: Executor updated: app-20150807194228-0000/4 is now LOADING
15/08/07 19:42:28 INFO client.AppClient$ClientActor: Executor updated: app-20150807194228-0000/1 is now LOADING
15/08/07 19:42:28 INFO client.AppClient$ClientActor: Executor updated: app-20150807194228-0000/3 is now LOADING
15/08/07 19:42:28 INFO netty.NettyBlockTransferService: Server created on 37575
15/08/07 19:42:28 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/08/07 19:42:28 INFO storage.BlockManagerMasterActor: Registering block manager node2:37575 with 265.4 MB RAM, BlockManagerId(<driver>, node2, 37575)
15/08/07 19:42:28 INFO storage.BlockManagerMaster: Registered BlockManager
15/08/07 19:42:29 INFO scheduler.EventLoggingListener: Logging events to maprfs:///apps/spark/app-20150807194228-0000
15/08/07 19:42:30 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/08/07 19:42:30 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:35
15/08/07 19:42:30 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 2 output partitions (allowLocal=false)
15/08/07 19:42:30 INFO scheduler.DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)
15/08/07 19:42:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/08/07 19:42:30 INFO scheduler.DAGScheduler: Missing parents: List()
15/08/07 19:42:30 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at SparkPi.scala:31), which has no missing parents
15/08/07 19:42:31 INFO storage.MemoryStore: ensureFreeSpace(1728) called with curMem=0, maxMem=278302556
15/08/07 19:42:31 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1728.0 B, free 265.4 MB)
15/08/07 19:42:31 INFO storage.MemoryStore: ensureFreeSpace(1126) called with curMem=1728, maxMem=278302556
15/08/07 19:42:31 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1126.0 B, free 265.4 MB)
15/08/07 19:42:31 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on node2:37575 (size: 1126.0 B, free: 265.4 MB)
15/08/07 19:42:31 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/08/07 19:42:31 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:838
15/08/07 19:42:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[1] at map at SparkPi.scala:31)
15/08/07 19:42:31 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/08/07 19:42:33 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node4:45758/user/Executor#-2015134222] with ID 2
15/08/07 19:42:33 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, node4, PROCESS_LOCAL, 1347 bytes)
15/08/07 19:42:33 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, node4, PROCESS_LOCAL, 1347 bytes)
15/08/07 19:42:33 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node1:33791/user/Executor#-776059221] with ID 0
15/08/07 19:42:33 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node3:35539/user/Executor#790771493] with ID 4
15/08/07 19:42:33 INFO storage.BlockManagerMasterActor: Registering block manager node4:54576 with 1060.3 MB RAM, BlockManagerId(2, node4, 54576)
15/08/07 19:42:33 INFO storage.BlockManagerMasterActor: Registering block manager node1:33323 with 1060.3 MB RAM, BlockManagerId(0, node1, 33323)
15/08/07 19:42:35 INFO storage.BlockManagerMasterActor: Registering block manager node3:56516 with 1060.3 MB RAM, BlockManagerId(4, node3, 56516)
15/08/07 19:42:37 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node2:51330/user/Executor#162860550] with ID 3
15/08/07 19:42:37 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node2:50174/user/Executor#1110281472] with ID 1
15/08/07 19:42:38 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on node4:54576 (size: 1126.0 B, free: 1060.3 MB)
15/08/07 19:42:39 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 5832 ms on node4 (1/2)
15/08/07 19:42:39 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 5815 ms on node4 (2/2)
15/08/07 19:42:39 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/08/07 19:42:39 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in 7.855 s
15/08/07 19:42:39 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:35, took 8.736218 s
Pi is roughly 3.13904
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job/json,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/json,null}
15/08/07 19:42:39 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs,null}
15/08/07 19:42:39 INFO ui.SparkUI: Stopped Spark web UI at http://node2:4040
15/08/07 19:42:39 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/08/07 19:42:39 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
15/08/07 19:42:39 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down
15/08/07 19:42:40 INFO storage.BlockManagerMasterActor: Registering block manager node2:49489 with 1060.3 MB RAM, BlockManagerId(1, node2, 49489)
15/08/07 19:42:40 INFO storage.BlockManagerMasterActor: Registering block manager node2:59803 with 1060.3 MB RAM, BlockManagerId(3, node2, 59803)
15/08/07 19:42:40 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/08/07 19:42:40 INFO storage.MemoryStore: MemoryStore cleared
15/08/07 19:42:40 INFO storage.BlockManager: BlockManager stopped
15/08/07 19:42:40 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/08/07 19:42:40 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/08/07 19:42:40 INFO spark.SparkContext: Successfully stopped SparkContext
15/08/07 19:42:40 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/08/07 19:42:40 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.


As seen above, the Pi job succeeds and reports Pi as roughly 3.13904, which confirms the installation was successful and Spark in standalone mode is configured correctly.
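As a follow-up sketch, the same master URL can also be given to an interactive shell to poke at the cluster (exit with :quit):

MASTER=spark://node2:7077 /opt/mapr/spark/spark-1.2.1/bin/spark-shell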

Friday, July 24, 2015

Installing Hive and Connecting to Hive services



Apache Hive™ is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

Note : This post assumes you have MapR core packages installed and setup as described in my previous post "Install MapR Cluster"

The post below describes the steps to install Hive, the Hive Metastore, and HiveServer2 (version 0.12) on the same node.

1) List all versions of the Hive packages available in the yum repo.
yum search --showduplicates hive

2) Install the Hive, Hive Metastore, and HiveServer2 packages.
yum install mapr-hive-0.12.201502021326-1 mapr-hiveserver2-0.12.201502021326-1 mapr-hivemetastore-0.12.201502021326-1  -y

3) Run configure.sh so that all the Hive packages get listed under the roles directory.
/opt/mapr/server/configure.sh -R
ls /opt/mapr/roles/

4) The metadata for Hive tables and partitions is stored in the Hive Metastore. By default, the Hive Metastore stores all Hive metadata in an embedded Apache Derby database in MapR-FS. Derby only allows one connection at a time; since we want multiple concurrent Hive sessions, we will use MySQL for the Hive Metastore.

Installing MYSQL

i) Install mysql server package from the configured yum repo.
yum install mysql-server

ii)  Start the mysql daemon.
service mysqld start

iii) Set the password for the root user to "password".
/usr/bin/mysqladmin -u root password password
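As a quick hedged sanity check that the root login works with the password just set:

mysql -u root -ppassword -e 'SHOW DATABASES;'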

5) Modify hive-site.xml and add the configuration below to the XML file.

vi /opt/mapr/hive/hive-0.12/conf/hive-site.xml
_________________________________________________________________
<configuration> 
 <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
</property>
 
 <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
 </property>
 
 <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
 </property>
 
 <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
    <description>password to use against metastore database</description>
 </property>
 
 <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
 </property>
_________________________________________________________________

6) Add "export METASTORE_PORT=9083" parameter to hive-env.sh

cp /opt/mapr/hive/hive-0.12/conf/hive-env.sh.template /opt/mapr/hive/hive-0.12/conf/hive-env.sh

vi hive-env.sh   ( Add below parameter )

export METASTORE_PORT=9083

7)  Now we can start the cluster.

service mapr-zookeeper start 
service mapr-zookeeper qstatus      ( verify status of zookeeper )
service mapr-warden start               

maprcli node services -name hivemeta -action restart -nodes `hostname`    ( if needed, restart the hivemeta service and check the Hive Metastore logs to verify it started correctly )

view /tmp/mapr/hive.log                                   ( logs related to metastore are logged here )

maprcli node services -name hs2 -action restart -nodes `hostname -f`    ( if needed, restart HS2 and check in hive.log that it started correctly as well )
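A small hedged check that both services are actually listening on their expected ports (9083 for the metastore as configured above; 10000 is HiveServer2's default):

netstat -tlnp | grep -E ':9083|:10000'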


Once all the services are up, there are two ways to connect: the Hive CLI or HiveServer2 ( beeline ).

i) HS1 / Hive CLI : HiveServer is an optional service that allows a remote client to submit requests to Hive, using a variety of programming languages, and retrieve results. HiveServer cannot handle concurrent requests from more than one client. Below are the commands to run while working interactively via the Hive CLI.

[root@311-HS2-1 ~]# hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.12/lib/hive-common-0.12-mapr-1501.jar!/hive-log4j.properties
hive> show tables;
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/hive/hive-0.12/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
OK
Time taken: 0.711 seconds
hive> 
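As a hedged sanity check from the same prompt (the table name is just an example), you can create, list, and drop a throwaway table:

hive> CREATE TABLE sanity_check (id INT, name STRING);
hive> SHOW TABLES;
hive> DROP TABLE sanity_check;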


ii) HS2 : HiveServer2 (HS2) is a server interface that enables remote clients to execute queries against Hive and retrieve the results. The current implementation, based on Thrift RPC, is an improved version of HiveServer and supports multi-client concurrency and authentication. The commands below show how to connect to HS2 via beeline.


[root@311-HS2-1 ~]# hive --service beeline
Beeline version 0.12-mapr-1501 by Apache Hive
beeline> !connect jdbc:hive2://127.0.0.1:10000/default
scan complete in 3ms
Connecting to jdbc:hive2://127.0.0.1:10000/default
Enter username for jdbc:hive2://127.0.0.1:10000/default: mapr
Enter password for jdbc:hive2://127.0.0.1:10000/default: ****
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/hive/hive-0.12/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Connected to: Hive (version 0.12-mapr-1501)
Driver: Hive (version 0.12-mapr-1501)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://127.0.0.1:10000/default> show tables;
[HiveQueryResultSet/next] 0
+-----------+
| tab_name  |
+-----------+
+-----------+
No rows selected (0.638 seconds)
0: jdbc:hive2://127.0.0.1:10000/default>





Monday, July 20, 2015

Installing Oozie and running Sample Job

                          Oozie Installation and Sample job walkthrough.


Apache Oozie™ is a workflow scheduler system to manage Apache Hadoop jobs.  Using Oozie, you can set up workflows that execute MapReduce jobs and coordinators that manage workflows.

This post assumes you have a MapR 4.1 cluster installed and configured, with the warden and ZooKeeper services stopped so Oozie can be installed and configured.

1) First, set up the correct ecosystem repo. Edit /etc/yum.repos.d/maprtech.repo and add the ecosystem link as seen below.


[maprtech]
name=MapR Technologies
baseurl=http://package.mapr.com/releases/v4.1.0/redhat/
enabled=1
gpgcheck=0
protect=1

[maprecosystem]
name=MapR Technologies
baseurl=http://package.mapr.com/releases/ecosystem-4.x/redhat
enabled=1
gpgcheck=0
protect=1

2) Install Oozie 


 yum install mapr-oozie

3) For non-secure clusters, add the following two properties to the core-site.xml

( For MR1 cluster /opt/mapr/hadoop/hadoop-0.20.2/conf/core-site.xml )


<configuration>
<property>
  <name>hadoop.proxyuser.mapr.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.mapr.groups</name>
  <value>*</value>
</property>
</configuration>


4) Start the cluster

service mapr-zookeeper start      

Now,  service mapr-zookeeper qstatus   ( to check status of ZK )

service mapr-warden start            

5) Export the Oozie URL to your environment with the following command:
export OOZIE_URL='http://<Oozie_node>:11000/oozie'
6) Change to the path below and check Oozie's status with the following command.    ( You should see the relevant messages below )

cd /opt/mapr/oozie/oozie-4.1.0/bin/


./oozie admin -status
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/oozie/oozie-4.1.0/lib/slf4j-simple-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
System mode: NORMAL

Above indicates normal operation

Now Enabling the Oozie Web UI

The Oozie web UI can display your job status, logs, and other related information. The oozie.war file must include the ExtJS library to enable the web UI. After installing Oozie, perform the following steps to add the ExtJS library to your oozie.war file.


1)  Download the extjs library under /root .

cd /root/
wget dev.sencha.com/deploy/ext-2.2.zip

2) If Oozie is running, shut it down:
maprcli node services -name oozie -action stop -nodes <space delimited list of nodes>
3)  Run the oozie-setup.sh script and specify the path to the extjs file.

cd /opt/mapr/oozie/oozie-4.1.0/
./bin/oozie-setup.sh prepare-war -extjs ~/ext-2.2.zip

4) Start Oozie.

maprcli node services -name oozie -action start -nodes <space delimited list of nodes>


Point your browser to http://<oozie_node>:11000/oozie (To review the web UI for Oozie)

Setup and Run Oozie Example .

After verifying the status of Oozie, we can set up and run the bundled example to get familiar with Oozie.

1)  Extract the oozie examples archive oozie-examples.tar.gz

cd /opt/mapr/oozie/oozie-4.1.0/
gunzip oozie-examples.tar.gz ; tar xvf ./oozie-examples.tar -C /opt/mapr/oozie/oozie-4.1.0/

2) Copy the examples to MapR-FS. 

 hadoop fs -put examples /user/root/examples

3) Copy the input data to MapR-FS.

hadoop fs -put examples/input-data maprfs:///user/root/input-data

4) Change permissions on the examples to make them accessible to all users.

hadoop fs -chmod -R 777 /user/root/examples
hadoop fs -chmod -R 777 /user/root/input-data

5) Run an example with the oozie job command as below.
First copy the map-reduce example folder to MapR-FS and then run the MapReduce job as shown.

hadoop fs -put /opt/mapr/oozie/oozie-4.1.0/examples/apps/map-reduce /user/root/examples/apps/

/opt/mapr/oozie/oozie-4.1.0/bin/oozie job -config /opt/mapr/oozie/oozie-4.1.0/examples/apps/map-reduce/job.properties -run


You can verify the job was successful in the Oozie UI or via the command below.

/opt/mapr/oozie/oozie-4.1.0/bin/oozie job -info <job id>
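As a related sketch, the recent workflow jobs and their overall status can also be listed from the CLI (OOZIE_URL assumed exported as earlier):

/opt/mapr/oozie/oozie-4.1.0/bin/oozie jobs -len 10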