Big Data / Systems / Cloud / AI : January 2018

Wednesday, January 31, 2018

Monitoring TEZ jobs on 5.x MapR Version

This post describes how to install Hive-on-Tez and how to set up monitoring for Tez jobs (manual steps).

Step I : Install / Configure Tez 0.8 on Hive 2.1

Note: This post assumes you already have an unsecured MapR 5.2.2 cluster set up with Hive 2.1 and Java 8.

1) Create the /apps/tez directory on MapR-FS.

To create, run the following commands:

hadoop fs -mkdir /apps
hadoop fs -mkdir /apps/tez

2) Set up the repo to download the Tez packages, then install the Tez package.

i) Repo location.

[root@node107rhel72 ~]# cat /etc/yum.repos.d/mapr_eco.repo
[MapR_Ecosystem]
name=MapR Ecosystem Components
baseurl=http://package.mapr.com/releases/MEP/MEP-3.0.2/redhat
gpgcheck=1
enabled=1
protected=1

ii) Tez package install.

[root@node107rhel72 ~]# yum install mapr-tez

3) Upload the Tez libraries to the tez directory on MapR-FS.

To upload, run the following commands:

hadoop fs -put /opt/mapr/tez/tez-0.8 /apps/tez
hadoop fs -chmod -R 755 /apps/tez

4) Verify the upload.

hadoop fs -ls /apps/tez/tez-0.8
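
Steps 1, 3 and 4 can be combined into one small helper. The following is only a sketch: `stage_tez` and the `HADOOP_CMD` override are hypothetical names introduced here for illustration, and it assumes `hadoop` is on the PATH. It uses `-mkdir -p` so that the parent /apps directory is created when missing.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MapR or Tez): stage the local Tez
# libraries in MapR-FS and list them. HADOOP_CMD is an assumed override hook.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

stage_tez() {
    src="$1"     # local Tez directory, e.g. /opt/mapr/tez/tez-0.8
    dest="$2"    # target directory on MapR-FS, e.g. /apps/tez
    # Each step runs only if the previous one succeeded.
    "$HADOOP_CMD" fs -mkdir -p "$dest" &&
    "$HADOOP_CMD" fs -put "$src" "$dest" &&
    "$HADOOP_CMD" fs -chmod -R 755 "$dest" &&
    "$HADOOP_CMD" fs -ls "$dest/$(basename "$src")"
}

# Usage (commented out so that sourcing this file has no side effects):
# stage_tez /opt/mapr/tez/tez-0.8 /apps/tez
```

Note that `-put` fails if the target directory already exists on MapR-FS, so in this sketch a second run stops at the upload step.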

5) Set the Tez environment variables. To set, open the /opt/mapr/hive/hive-2.1/conf/hive-env.sh file, add the following lines, and save the file:

```
export TEZ_CONF_DIR=/opt/mapr/tez/tez-0.8/conf
export TEZ_JARS=/opt/mapr/tez/tez-0.8/*:/opt/mapr/tez/tez-0.8/lib/*
export HADOOP_CLASSPATH=$TEZ_CONF_DIR:$TEZ_JARS:$HADOOP_CLASSPATH
```
Note: Repeat this step on each node where you want Hive on Tez to be configured (usually the edge node; since I am testing on a 1-node cluster, this post does the step only once).
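
Before restarting Hive it can help to confirm that the paths referenced in hive-env.sh actually exist. A minimal sketch, assuming the layout implied by TEZ_JARS (jars both in the Tez directory and in its lib/ subdirectory); `check_tez_env` is a hypothetical name, not a MapR or Tez tool.

```shell
#!/bin/sh
# Hypothetical sanity check for the paths used in hive-env.sh.
# check_tez_env is an illustrative name, not a MapR or Tez tool.
check_tez_env() {
    tez_home="$1"    # e.g. /opt/mapr/tez/tez-0.8
    if [ ! -d "$tez_home/conf" ]; then
        echo "missing conf directory: $tez_home/conf" >&2
        return 1
    fi
    # TEZ_JARS globs both the top-level directory and lib/, so each
    # is expected to contain at least one jar (assumption).
    for dir in "$tez_home" "$tez_home/lib"; do
        if ! ls "$dir"/*.jar >/dev/null 2>&1; then
            echo "no jars found in $dir" >&2
            return 1
        fi
    done
    echo "Tez paths look sane under $tez_home"
}

# Usage:
# check_tez_env /opt/mapr/tez/tez-0.8
```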

6) Configure Hive for Tez engine. To configure, open the /opt/mapr/hive/hive-2.1/conf/hive-site.xml file, add the following lines, and save the file.

<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>

Note: Repeat this step on each node where you want Hive on Tez to be configured (usually the edge node; since I am testing on a 1-node cluster, this post does the step only once).
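
To double-check what ended up in hive-site.xml, a small grep/sed helper can print the value of a property. This is a sketch under one assumption: the `<name>` and `<value>` elements sit on consecutive lines, as in the snippet above. `get_xml_property` is a hypothetical name, and `grep -A` is a GNU/BSD extension rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> of a Hadoop-style XML property.
# Assumes <name> and <value> sit on consecutive lines.
get_xml_property() {
    file="$1"
    name="$2"
    # -F treats the property name literally (dots are not wildcards);
    # -A1 also prints the line after the match, which holds the value.
    grep -F -A1 "<name>$name</name>" "$file" |
        sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Usage:
# get_xml_property /opt/mapr/hive/hive-2.1/conf/hive-site.xml hive.execution.engine
```

From inside the Hive shell, running `set hive.execution.engine;` should show the engine in effect for the session.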

Step II : Monitoring for Tez jobs ( Manual Install )

This topic describes how to configure the timeline server to use the Hive-on-Tez user interface.

1) Install the timeline server (the RPM is provided by MapR support).

rpm -ivh mapr-timelineserver-2.7.0.20171206190528.GA-1.x86_64.rpm

Note: Install the timeline server on a single node. The Hive-on-Tez user interface does not support High Availability (HA).

2) Install the mapr-patch below or a later one (get this from MapR support).

rpm -ivh mapr-patch-5.2.2.44680.GA-20180118212034.x86_64.rpm

3) Add the following entry to the /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/yarn-env.sh file (on each node):

export YARN_TIMELINESERVER_OPTS="${YARN_TIMELINESERVER_OPTS} ${MAPR_LOGIN_OPTS}"

4) Add the following entry to the /opt/mapr/hadoop/hadoop-2.7.0/bin/yarn file after line "elif [ "$COMMAND" = "timelineserver" ] ; then" (on each node):

CLASSPATH=${CLASSPATH}:$MAPR_HOME/lib/JPam-1.1.jar
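
When this edit has to be repeated on many nodes, it can be scripted. The sketch below is one possible way, not a MapR-provided tool: `add_jpam_classpath` is a hypothetical name, and it assumes the `elif` line in the yarn script matches the text quoted in step 4 exactly, including spacing. It keeps a `.bak` copy and does nothing if the line is already present.

```shell
#!/bin/sh
# Hypothetical helper: insert the JPam classpath line right after the
# timelineserver branch of the yarn launcher script. Safe to run twice.
add_jpam_classpath() {
    yarn_script="$1"
    marker='elif [ "$COMMAND" = "timelineserver" ] ; then'
    entry='CLASSPATH=${CLASSPATH}:$MAPR_HOME/lib/JPam-1.1.jar'

    # Nothing to do if the line was already added.
    if grep -qF "$entry" "$yarn_script"; then
        return 0
    fi
    # Refuse to edit if the expected branch is not present.
    if ! grep -qF "$marker" "$yarn_script"; then
        echo "marker line not found in $yarn_script" >&2
        return 1
    fi

    cp -p "$yarn_script" "$yarn_script.bak" || return 1
    # Print every line; after the marker line, also print the new entry.
    # Writing back with cat keeps the script's owner and execute bits.
    awk -v m="$marker" -v n="  $entry" \
        '{ print } index($0, m) { print n }' "$yarn_script.bak" > "$yarn_script.tmp" &&
        cat "$yarn_script.tmp" > "$yarn_script" &&
        rm -f "$yarn_script.tmp"
}

# Usage:
# add_jpam_classpath /opt/mapr/hadoop/hadoop-2.7.0/bin/yarn
```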

5) Edit the /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/yarn-site.xml file (on each node):

<property>
<description>Indicate to clients whether Timeline service is enabled or not.
If enabled, the TimelineClient library used by end-users will post entities
and events to the Timeline server.</description>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.hostname</name>
<value> <hostname> </value>
</property>
<property>
<description>The setting that controls whether yarn system metrics is
published on the timeline server or not by RM.</description>
<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.http-cross-origin.enabled</name>
<value>true</value>
</property>

6) Configure the Tomcat server. This step describes how to configure and manage the Tomcat server used by the Hive-on-Tez user interface.

i) Extract the Tomcat server ($TEZ_HOME is the Tez installation directory, /opt/mapr/tez/tez-0.8 in this setup)

cd $TEZ_HOME/tomcat/

sudo tar -zxvf tomcat.tar.gz -C $TEZ_HOME/tomcat

ii) Change the permissions for the tomcat directory to the user who will be running the Tomcat server:

sudo chown -R <$USER>:<$USER_GROUP> $TEZ_HOME/tomcat

iii) Configure the Timeline Server base URL and Resource Manager web URL in the tez-ui configs.js file

Replace TIME_LINE_BASE_URL with the real URL, e.g. 'http://10.10.70.107:8188'

Replace RM_WEB_URL with the real URL, e.g. 'http://10.10.70.107:8088'

[root@node107rhel72 ~]# grep -i url /opt/mapr/tez/tez-0.8/tomcat/apache-tomcat-9.0.1/webapps/tez-ui/scripts/configs.js

timelineBaseUrl: 'http://10.10.70.107:8188',

RMWebUrl: 'http://10.10.70.107:8088',

[root@node107rhel72 ~]#

Note: The timelineBaseUrl maps to the YARN Timeline Server, and the RMWebUrl maps to the YARN Resource Manager.
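
The two replacements can also be done with sed instead of an editor. A minimal sketch, assuming configs.js still contains the literal tokens TIME_LINE_BASE_URL and RM_WEB_URL; `set_tez_ui_urls` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: fill in the two URL placeholders in the Tez UI configs.js.
# Assumes the file still contains the literal tokens TIME_LINE_BASE_URL
# and RM_WEB_URL.
set_tez_ui_urls() {
    configs_js="$1"
    timeline_url="$2"   # e.g. http://10.10.70.107:8188
    rm_url="$3"         # e.g. http://10.10.70.107:8088
    # '|' is used as the sed delimiter because the URLs contain '/'.
    sed -e "s|TIME_LINE_BASE_URL|$timeline_url|" \
        -e "s|RM_WEB_URL|$rm_url|" "$configs_js" > "$configs_js.tmp" &&
        cat "$configs_js.tmp" > "$configs_js" &&
        rm -f "$configs_js.tmp"
}

# Usage:
# set_tez_ui_urls \
#     /opt/mapr/tez/tez-0.8/tomcat/apache-tomcat-9.0.1/webapps/tez-ui/scripts/configs.js \
#     http://10.10.70.107:8188 http://10.10.70.107:8088
```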

iv) Now restart the Tomcat server:

To stop the Tomcat server, run this script:

$TEZ_HOME/tomcat/apache-tomcat-<version>/bin/shutdown.sh

To start the Tomcat server, run this script:

$TEZ_HOME/tomcat/apache-tomcat-<version>/bin/startup.sh

7) Integrating the Hive-on-Tez User Interface with Tez

Perform these actions on each of the nodes where you have Hive-on-Tez configured.

i) Add the following entry to the /opt/mapr/tez/tez-<version>/conf/tez-site.xml file, replacing <hostname>:<port> with the real host name. Use 9383 for the port. 9383 is the default Tomcat port for the Hive-on-Tez user interface.

<property>
<description>Enable Tez to use the Timeline Server for History Logging</description>
<name>tez.history.logging.service.class</name>
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>

<property>
<description>URL for where the Tez UI is hosted</description>
<name>tez.tez-ui.history-url.base</name>
<value>http://<hostname>:<port>/tez-ui/</value>
</property>

8) Ideally warden should be down while making these configuration changes. If the cluster is already up and you are only adding the YARN timeline service on one node, the services below need to be restarted.

Restart the resource manager:

maprcli node services -name resourcemanager -action restart -nodes <hostname>

Start the timeline server:

maprcli node services -name timelineserver -action start -nodes <nodename>

Validation : Run a test job.

hive> create table testtez (a1 int);
OK
Time taken: 1.001 seconds

hive> insert into testtez values (1);
Query ID = mapr_20180127013145_0aa28027-ceeb-4b85-9118-ed82e431bdb4
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1517043976772_0001)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 8.98 s     
----------------------------------------------------------------------------------------------
Loading data to table default.testtez
OK
Time taken: 13.09 seconds
hive> 

Application Timeline server :

http://10.10.70.107:8188/applicationhistory

Tez UI:

http://10.10.70.107:9383/tez-ui/
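
Both pages can also be checked from the command line. The following is a sketch that assumes `curl` is installed; `http_status` and `check_endpoints` are hypothetical names, and the URLs are the ones from this test cluster.

```shell
#!/bin/sh
# Hypothetical smoke test: report the HTTP status of each UI endpoint.
# Requires curl; function names are illustrative only.
http_status() {
    # -s silences progress, -o discards the body,
    # -w prints only the numeric status code.
    curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

check_endpoints() {
    failed=0
    for url in "$@"; do
        code="$(http_status "$url")"
        echo "$code $url"
        [ "$code" = "200" ] || failed=1
    done
    return "$failed"
}

# Usage:
# check_endpoints http://10.10.70.107:8188/applicationhistory \
#                 http://10.10.70.107:9383/tez-ui/
```

The function returns non-zero if any endpoint answers with something other than 200, which makes it easy to call from a larger health-check script.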

Wednesday, January 3, 2018

Hourly Logrotate

In an earlier post (System Stats) I recommended running basic Linux tools at a regular interval to collect stats that can be reviewed later if an incident calls for system-level details.
The obvious issue with collecting this information continuously is that the local disk can fill up. Log rotation handles this: retain only the last few hours on the local FS and move older logs to MFS or some other distributed storage. This post shows how to use logrotate to rotate all /opt/mapr/logs/*.out logs collected by the system stats scripts every hour.

1) Create a "systemstat" file under "/etc/logrotate.d" with the configuration below.

[root@node107rhel72 logrotate.d]# cat systemstat
/opt/mapr/logs/*.out {
    su mapr mapr
    create 777 mapr mapr
    size 10k
    daily
    rotate 10
    compress
    delaycompress
    copytruncate
}

Note: Make sure all the system stat logs (/opt/mapr/logs/*.out) are owned by mapr:mapr.
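
The copytruncate directive matters here because the stat collectors keep their output files open. Roughly, logrotate copies the current contents aside and then truncates the original in place instead of renaming it. The sketch below only illustrates that idea in plain shell (`rotate_copytruncate` is a hypothetical name); logrotate does this internally.

```shell
#!/bin/sh
# Illustration only: what copytruncate amounts to, in plain shell.
# logrotate does this internally; rotate_copytruncate is a hypothetical name.
rotate_copytruncate() {
    log="$1"
    cp -p "$log" "$log.1" &&   # 1. copy the current contents aside
        : > "$log"             # 2. truncate the original in place
}

# Usage:
# rotate_copytruncate /opt/mapr/logs/vmstat.node107rhel72.out
```

Two consequences follow from the logrotate documentation: lines written between the copy and the truncate can be lost, and the create directive has no effect when copytruncate is used, because the original file stays in place.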

2) By default logrotate is invoked only once a day from /etc/cron.daily, so to rotate every hour we can run it from cron.hourly.

i) Copy the logrotate cron script to cron.hourly.

cp /etc/cron.daily/logrotate /etc/cron.hourly/

ii) Make sure the script has execute permission.

[root@node107rhel72 cron.hourly]# ls -l logrotate 

-rwxr-xr-x. 1 root root 180 Jan  3 19:25 logrotate

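iii) Finally, modify the hourly logrotate script to run "/etc/logrotate.d/systemstat". The -f flag forces a rotation on every run, regardless of the size and daily criteria in the config.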

[root@node107rhel72 cron.hourly]# cat logrotate
#!/bin/sh

/usr/sbin/logrotate -f /etc/logrotate.d/systemstat
EXITVALUE=$?
if [ $EXITVALUE != 0 ]; then
    /usr/bin/logger -t logrotate "ALERT exited abnormally with [$EXITVALUE]"
fi
exit 0
[root@node107rhel72 cron.hourly]# 

Note: If you see the error below in the messages file, it could be caused by SELinux, which might have to be disabled.

logrotate: ALERT exited abnormally with [1]

Verify :

Every hour when the cron job runs, the entries below are logged in "/var/log/cron", which confirms cron is running hourly as expected.

Jan  4 01:01:26 node107rhel72 CROND[8702]: (root) CMD (run-parts /etc/cron.hourly)
Jan  4 01:01:26 node107rhel72 run-parts(/etc/cron.hourly)[8702]: starting 0anacron
Jan  4 01:01:26 node107rhel72 anacron[8711]: Anacron started on 2018-01-04
Jan  4 01:01:26 node107rhel72 anacron[8711]: Normal exit (0 jobs run)
Jan  4 01:01:26 node107rhel72 run-parts(/etc/cron.hourly)[8713]: finished 0anacron
Jan  4 01:01:26 node107rhel72 run-parts(/etc/cron.hourly)[8702]: starting logrotate
Jan  4 01:01:26 node107rhel72 run-parts(/etc/cron.hourly)[8727]: finished logrotate
Jan  4 02:01:01 node107rhel72 CROND[15773]: (root) CMD (run-parts /etc/cron.hourly)
Jan  4 02:01:01 node107rhel72 run-parts(/etc/cron.hourly)[15773]: starting 0anacron
Jan  4 02:01:01 node107rhel72 anacron[15782]: Anacron started on 2018-01-04
Jan  4 02:01:01 node107rhel72 anacron[15782]: Normal exit (0 jobs run)
Jan  4 02:01:01 node107rhel72 run-parts(/etc/cron.hourly)[15784]: finished 0anacron
Jan  4 02:01:01 node107rhel72 run-parts(/etc/cron.hourly)[15773]: starting logrotate
Jan  4 02:01:01 node107rhel72 run-parts(/etc/cron.hourly)[15817]: finished logrotate

Now I change the system date a couple of times to make sure the logs are rotated and compressed.

[root@node107rhel72 ~]# date -s "4 Jan 2018 00:00:50"
Thu Jan  4 00:00:50 PST 2018
[root@node107rhel72 ~]# date -s "4 Jan 2018 01:00:50"
Thu Jan  4 01:00:50 PST 2018
[root@node107rhel72 ~]# date -s "4 Jan 2018 02:00:50"
Thu Jan  4 02:00:50 PST 2018

[root@node107rhel72 ~]# ls -ltr /opt/mapr/logs/| grep vmstat.node107rhel72
-rw-r--r--. 1 mapr mapr      4096 Jan  3 23:01 vmstat.node107rhel72.out.4.gz
-rw-r--r--. 1 mapr mapr      8585 Jan  4 00:01 vmstat.node107rhel72.out.3.gz
-rw-r--r--. 1 mapr mapr      8192 Jan  4 01:01 vmstat.node107rhel72.out.2.gz
-rw-r--r--. 1 mapr mapr      4096 Jan  4 02:01 vmstat.node107rhel72.out.1
-rw-r--r--. 1 mapr mapr      4096 Jan  4 02:01 vmstat.node107rhel72.out
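
The introduction of this post mentions moving older logs to MFS, which the steps do not cover. One possible way is sketched below, under several assumptions: `archive_old_logs`, the `HADOOP_CMD` override and the destination path are hypothetical, the destination directory must already exist on MapR-FS, and `-maxdepth` and `-mmin` are GNU/BSD find extensions. A timestamp is prepended to the destination name because logrotate reuses the numbered file names.

```shell
#!/bin/sh
# Hypothetical sketch: move compressed rotations older than a cutoff
# into MapR-FS. Names and paths here are assumptions, not MapR tooling.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

archive_old_logs() {
    log_dir="$1"      # e.g. /opt/mapr/logs
    dest="$2"         # MapR-FS directory, e.g. /systemstats (hypothetical)
    min_age="$3"      # minimum age in minutes, e.g. 240 for four hours
    stamp="$(date +%Y%m%d%H%M%S)"
    # -mmin +N matches files last modified more than N minutes ago.
    find "$log_dir" -maxdepth 1 -name '*.out.*.gz' -mmin "+$min_age" |
    while IFS= read -r f; do
        # moveFromLocal copies the file and then deletes the local copy.
        "$HADOOP_CMD" fs -moveFromLocal "$f" "$dest/$stamp.$(basename "$f")" ||
            echo "failed to archive $f" >&2
    done
}

# Usage:
# archive_old_logs /opt/mapr/logs /systemstats 240
```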

System Stats

System stats provide a quick snapshot of the CPU, memory, network and IO usage on a node. Attached are three scripts. SystemStats.sh gathers this information at a regular interval and captures the output into files under /opt/mapr/logs/, and SystemStatsStop.sh stops the collection. It is useful to have the system stats available when investigating system-related issues, to narrow down quickly whether a resource constraint contributed to the issue.

The SystemStats.sh script can be started in the background as follows:

nohup ./SystemStats.sh > /opt/mapr/logs/SystemStats.out 2>/opt/mapr/logs/SystemStats.err &

Start the SystemStats.sh script on node XXX and let it run for X hrs, then run SystemStatsStop.sh to stop all of the data collection commands.

To collect the output files as well as the MapR logs into a single .tgz you can run "tar -czf /tmp/diagnostics.$HOSTNAME.tgz /opt/mapr/logs".

Finally, to delete all the collected logs, run "sh CleanUpStats.sh".

SystemStats.sh

SystemStatsStop.sh

CleanUpStats.sh
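
For readers without access to the attachments, the general shape of such a collector can be sketched as follows. This is NOT the attached SystemStats.sh: the tool selection (vmstat, iostat), the 10-second interval and the pid-file handling are all assumptions, and `stat_log`, `start_collectors` and `stop_collectors` are hypothetical names. Only the vmstat.<hostname>.out naming follows the file names seen in the logrotate post.

```shell
#!/bin/sh
# Hypothetical sketch of a collector in the spirit of SystemStats.sh.
# This is NOT the attached script; tools and intervals are assumptions.
LOG_DIR="${LOG_DIR:-/opt/mapr/logs}"
HOST="$(hostname)"

# Build the output path as <tool>.<hostname>.out under LOG_DIR.
stat_log() {
    echo "$LOG_DIR/$1.$HOST.out"
}

start_collectors() {
    # Each tool samples every 10 seconds and appends to its own file,
    # which keeps working across copytruncate rotations.
    nohup vmstat 10 >> "$(stat_log vmstat)" 2>&1 &
    echo $! > "$LOG_DIR/vmstat.$HOST.pid"
    nohup iostat -x 10 >> "$(stat_log iostat)" 2>&1 &
    echo $! > "$LOG_DIR/iostat.$HOST.pid"
}

stop_collectors() {
    for tool in vmstat iostat; do
        pid_file="$LOG_DIR/$tool.$HOST.pid"
        [ -f "$pid_file" ] || continue
        kill "$(cat "$pid_file")" 2>/dev/null || true
        rm -f "$pid_file"
    done
}

# Usage:
# start_collectors    # begin sampling in the background
# stop_collectors     # stop the samplers started above
```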