Friday, July 3, 2015

System Resource utilization


  To get a quick overview of a system, you need a tool that gives you stats you can read to work out which resource is causing the bottleneck. vmstat is a good tool for this: it reports information about processes, memory, swap, block I/O, system activity, and CPU activity in just one line. There are other tools that report system utilization stats as well, but I prefer vmstat because its output is easy to read and works well as a preliminary check to help identify system bottlenecks. To gain more insight into a suspected issue later on, a different kind of tool is required, i.e. one capable of more in-depth data collection for analysis.
  In this post we will talk only about vmstat and what each column represents, so that you can determine whether an issue is due to high CPU utilization, disk I/O, system swapping, etc.


Here is the output of the vmstat command from my test node:

$ vmstat 1 3            (here I get stats every 1 second, 3 times only)



[root@ip-10-128-160-140 ~]# vmstat 1 3
procs ------------memory------------ ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd     free   buff   cache   si   so    bi    bo   in   cs us sy  id wa st
 0  0      0 23940396 178804 5127788    0    0     0     5    3    6  0  0 100  0  0
 0  0      0 23940396 178804 5127788    0    0     0     0  161  278  0  0 100  0  0
 0  0      0 23940396 178804 5127788    0    0     0    32  374  329  1  1  98  0  0
[root@ip-10-128-160-140 ~]#
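
If you want to work with these numbers in a script rather than read them by eye, the output is easy to split into named fields. Below is a minimal Python sketch, assuming the default layout shown above (one category line, one column-name line, then data rows). The function name parse_vmstat is my own and not part of vmstat, and the sample text is simply the output from my test node pasted in.

```python
# Sample vmstat output, copied from the test node run shown in this post.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 23940396 178804 5127788    0    0     0     5    3    6  0  0 100  0  0
 0  0      0 23940396 178804 5127788    0    0     0     0  161  278  0  0 100  0  0
 0  0      0 23940396 178804 5127788    0    0     0    32  374  329  1  1 98  0  0
"""


def parse_vmstat(text):
    """Return one dict per data row, keyed by the column names in the header.

    Hypothetical helper: assumes the default layout (category line,
    column-name line, then whitespace-separated integer rows).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The second non-empty line holds the column names (r, b, swpd, ...).
    columns = lines[1].split()
    rows = []
    for line in lines[2:]:
        values = [int(v) for v in line.split()]
        rows.append(dict(zip(columns, values)))
    return rows


samples = parse_vmstat(SAMPLE)
for s in samples:
    # Print a few columns per sample: run queue, free memory, blocks out, idle %.
    print(s["r"], s["free"], s["bo"], s["id"])
```

To feed it live data instead of the pasted sample, the text could come from subprocess.run(["vmstat", "1", "3"], capture_output=True, text=True).stdout on a machine where vmstat is installed.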

  

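The first line lists six different categories for which stats will be displayed. The lines that follow give the data you need to work out whether you are running into a system-level bottleneck. Note that the first data line reports averages since the last reboot, while each later line covers only the preceding sampling interval. The memory and swap figures are reported in "KB" (1024 bytes) by default; the io columns are in blocks per second and the cpu columns are percentages.

  The info below is from "man vmstat" and is self-explanatory; it gives the details of every column for which stats are collected.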



FIELD DESCRIPTION FOR VM MODE
   Procs
       r: The number of processes waiting for run time.
       b: The number of processes in uninterruptible sleep.

   Memory
       swpd: the amount of virtual memory used.
       free: the amount of idle memory.
       buff: the amount of memory used as buffers.
       cache: the amount of memory used as cache.
       inact: the amount of inactive memory. (-a option)
       active: the amount of active memory. (-a option)

   Swap
       si: Amount of memory swapped in from disk (/s).
       so: Amount of memory swapped to disk (/s).

   IO
       bi: Blocks received from a block device (blocks/s).
       bo: Blocks sent to a block device (blocks/s).

   System
       in: The number of interrupts per second, including the clock.
       cs: The number of context switches per second.

   CPU
       These are percentages of total CPU time.
       us: Time spent running non-kernel code. (user time, including nice time)
       sy: Time spent running kernel code. (system time)
       id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
       wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
       st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
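
To make the link between these columns and the three suspects (CPU, swapping, disk I/O) concrete, here is a rough Python sketch of the kind of first-pass check I do in my head when reading a vmstat line. The function name diagnose and every threshold in it are my own illustrative assumptions, not values from the man page, and the "busy" sample is made up for the example.

```python
def diagnose(sample, cpu_count=1, wa_threshold=20, idle_threshold=10):
    """Return a list of hints for one vmstat sample (dict of column -> int).

    Hypothetical helper; all thresholds are illustrative assumptions.
    """
    hints = []
    # More runnable processes than CPUs with almost no idle time
    # suggests CPU pressure.
    if sample["r"] > cpu_count and sample["id"] < idle_threshold:
        hints.append("cpu")
    # Any swap-in or swap-out activity suggests memory pressure.
    if sample["si"] > 0 or sample["so"] > 0:
        hints.append("swap")
    # Processes in uninterruptible sleep together with high I/O wait
    # suggests a disk bottleneck.
    if sample["b"] > 0 and sample["wa"] >= wa_threshold:
        hints.append("disk-io")
    return hints


# Last row from the test node output earlier in this post (idle system).
idle = {"r": 0, "b": 0, "si": 0, "so": 0, "id": 98, "wa": 0}
# Made-up values, purely to show what a loaded system might look like.
busy = {"r": 9, "b": 3, "si": 120, "so": 340, "id": 2, "wa": 45}

print(diagnose(idle, cpu_count=4))
print(diagnose(busy, cpu_count=4))
```

A single sample proves very little, so treat hints like these as a pointer to where to look next rather than a verdict; the values worth acting on are the ones that persist across many intervals.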


  On systems with a lot of memory it's worth running the command with the "-S" option and specifying "M", which reports the memory stats in "MB" (1048576 bytes) and is more human readable.



[root@ip-10-128-160-140 ~]# vmstat -S M 1 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0  23386    174   5000    0    0     0     5    3    6  0  0 100  0  0
 0  0      0  23386    174   5000    0    0     0     0  262  299  0  0 100  0  0
 0  0      0  23386    174   5000    0    0     0     0  140  256  0  0 100  0  0
[root@ip-10-128-160-140 ~]#
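
If you already have numbers captured in the default unit, the conversion that "-S" performs is simple arithmetic. The Python sketch below uses the four unit sizes listed in the man page for "-S" (k, K, m, M); the helper name convert_kib is my own, and I am assuming the result is truncated to a whole number rather than rounded.

```python
# Unit sizes in bytes for vmstat's -S option, as listed in the man page:
# k = 1000, K = 1024, m = 1000000, M = 1048576.
UNIT_BYTES = {"k": 1000, "K": 1024, "m": 1000000, "M": 1048576}


def convert_kib(value_kib, unit):
    """Convert a default-unit (1024-byte) memory figure to the given -S unit.

    Hypothetical helper; truncation (not rounding) is an assumption.
    """
    return value_kib * 1024 // UNIT_BYTES[unit]


# Memory figures from the first (default unit) run on the test node.
for name, kib in [("free", 23940396), ("buff", 178804), ("cache", 5127788)]:
    print(name, convert_kib(kib, "M"))
```

For the figures from my first run this gives 23379, 174 and 5007. The "-S M" run above shows slightly different free and cache values because it was taken at a different moment.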

  
With the above info a system administrator can identify system bottlenecks, or at least get a data point on where to dig deeper to get to the root cause of the performance problem they are troubleshooting. I plan to write another post with examples where I will manually simulate system bottlenecks and run vmstat in parallel to get data points that point to the problem.

