Monday, September 21, 2015

Performance Issue / Hang issue


- First, use vmstat to check system resource utilization and see which resource is scarce while the hang is observed. If the cause is a memory constraint or the system is swapping, it is usually easy to address by adding more memory to the server or by capping processes so they use only a dedicated amount of memory. The blog post below explains how to identify system resource constraints step by step.

http://abizeradenwala.blogspot.com/2015/07/resource-utilization-to-quickly.html
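As a rough sketch of the vmstat check above, the helper below flags samples where the system is swapping. The function name flag_swapping is made up for illustration, and the field positions assume the default Linux vmstat layout, where si and so are the 7th and 8th columns.

```shell
# Hypothetical helper: read vmstat output on stdin and report samples
# where pages are being swapped in (si) or out (so).
# Assumes the default Linux vmstat layout: si is field 7, so is field 8.
flag_swapping() {
    # Header lines start with "procs" or "r", so only lines whose first
    # field is a number are treated as samples.
    awk '$1 ~ /^[0-9]+$/ && ($7 > 0 || $8 > 0) {
        print "swapping: si=" $7 " so=" $8
    }'
}

# Example usage (12 samples, 5 seconds apart):
#   vmstat 5 12 | flag_swapping
```

No output means no swap activity was seen in the sampled window, which points away from memory pressure as the cause.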

- If a particular process turns out to be the cause, start with htop to check whether the processes hogging the most resources are running multiple threads internally. htop is usually not installed by default and might need the package below installed.

htop-1.0.1-2.el6.x86_64

In htop, press t to get a nested tree of threads, and look specifically at the PID that appears to be hogging the most resources.

Alternatively, to list all the processes and threads currently running on the system, run the ps command below. Running it from a cron job every 5 minutes while reproducing the hang can show whether a huge number of threads have been spawned internally by Java.

ps -eLf | grep 21917

UID        PID  PPID   LWP  C NLWP STIME TTY          TIME CMD
mapr     21887     1 21917  0   53 Sep11          00:01:05 /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java

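  -L  Show threads, possibly with LWP (light-weight process, i.e. the thread ID) and NLWP (number of light-weight processes, i.e. the thread count of the process) columns. In the output above, process 21887 has 53 threads.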
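To watch the thread count of a single process over time, the ps output can be reduced to one number. This is only a sketch: the function name count_threads and the log path in the cron line are made up for illustration, and the PID column is assumed to be the second field, as in the ps -eLf output above.

```shell
# Hypothetical helper: count the threads (LWP rows) belonging to one PID
# in `ps -eLf` output read from stdin. PID is the 2nd column of that output.
count_threads() {
    awk -v pid="$1" '$2 == pid { n++ } END { print n + 0 }'
}

# Example usage:
#   ps -eLf | count_threads 21887
#
# Example crontab entry to capture the full thread list every 5 minutes
# (the log path is an assumption, pick any writable location):
#   */5 * * * * ps -eLf >> /tmp/ps_threads.log 2>&1
```

A count that keeps climbing between samples while the hang is being reproduced is a hint that threads are being created faster than they finish.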

- If a process is continuously hogging CPU, it suggests that some part of the code is not using the CPU effectively, or that some threads are stuck on the CPU and not making progress. Run "kill -3" followed by the process ID every second so that the Java process writes a thread dump to its standard output, then review the thread dumps carefully with the developers to identify any issue in the code. The dumps will usually show which thread is slow or not making significant progress during the timeframe when the issue was observed, which helps developers put a fix in the code to avoid the potential hang.
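The repeated kill -3 can be wrapped in a small loop. The sketch below uses a made-up function name, take_thread_dumps, and assumes the target is a JVM: a Java process prints a thread dump on signal 3 (SIGQUIT) and keeps running, while most other programs are terminated by it.

```shell
# Hypothetical helper: send signal 3 (SIGQUIT) to a process several times.
#   $1 = PID of the Java process
#   $2 = number of dumps to request (default 5)
#   $3 = seconds to wait between dumps (default 1)
# Only use this on a JVM; SIGQUIT terminates most other programs.
take_thread_dumps() {
    pid=$1
    count=${2:-5}
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        # Stop early if the signal cannot be delivered (e.g. process is gone).
        kill -3 "$pid" || return 1
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example usage (10 dumps, 1 second apart):
#   take_thread_dumps 21887 10 1
```

The dumps land wherever the JVM's standard output is redirected, so check the process's console log rather than the terminal the loop was started from.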
