Thursday, January 8, 2009

System monitoring using sar

sar (System Activity Report) is very useful for troubleshooting and pinpointing issues related to system performance.

Although it gathers useful data about system performance, the sar command itself adds to the system load, so a high sampling frequency can exacerbate a pre-existing performance problem.


Step 1 - Collecting data using sar:
sar comes with almost all Unix systems; however, you have to enable it.

# crontab -e

# Collect measurements at 10-minute intervals
0,10,20,30,40,50 * * * * /usr/lib/sa/sa1
# Create daily reports and purge old files
0 0 * * * /usr/lib/sa/sa2 -A

The crontab above has two entries.
The first command, sa1, calls sadc to collect the performance data into a binary log file, and it runs every 10 minutes.

The second command, sa2, dumps the data from the binary log file into a text file and also deletes files older than 7 days.
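
Before waiting for data to show up, it can help to confirm that both entries are really active and not commented out. The following is a minimal sketch, not part of any sar package: has_sar_cron is a hypothetical helper name, and it only checks crontab text passed on standard input for the two script paths used above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the crontab text on stdin contains
# active (uncommented) entries for both sa1 and sa2.
has_sar_cron() {
    # Drop comment lines so that disabled entries do not count.
    active=$(grep -v '^[[:space:]]*#')
    echo "$active" | grep -q '/usr/lib/sa/sa1' &&
        echo "$active" | grep -q '/usr/lib/sa/sa2'
}

# Typical use on a live system (not run here):
#   crontab -l | has_sar_cron && echo "sar collection is enabled"
```

The check is deliberately loose: it looks for the script paths only and does not validate the schedule fields.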

Step 2 - Extracting useful information:

Now that data is being collected, we have to extract the information relevant to what we want to find out about system performance at a specific time.

sar data is collected in /var/adm/sa (you can verify this path in the /usr/lib/sa/sa1 script).

You will see two kinds of files in that directory, saXX and sarXX:
saXX - binary file
sarXX - text file

XX represents the day of the month.

You can also modify /usr/lib/sa/sa1 to change this format.
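
Since the file name depends only on the day of the month, the path for a given day can be built in a script. This is a small sketch under that assumption; sa_file is a hypothetical helper name, and the default directory is the /var/adm/sa path mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the binary sar data file for a
# given day of the month. The second argument overrides the directory.
sa_file() {
    # Strip one leading zero so that "08" is not read as an octal number.
    day=${1#0}
    dir=${2:-/var/adm/sa}
    # Zero-pad to two digits to match the saXX naming.
    printf '%s/sa%02d\n' "$dir" "$day"
}

# Typical use on a live system (not run here):
#   sar -u -f "$(sa_file 8)"
```

If sa1 has been modified to use a different naming format, the printf pattern would need to change to match.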

When you run sar with no options, it reads the current day's saXX file.

Now we need to narrow the report down to a particular time period. To do that, we point sar at the data file covering that period with the -f option.

For example:
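
Besides picking the day with -f, sar also accepts -s and -e to set the start and end time of the report in hh[:mm[:ss]] form. The sketch below only prints the command it would run, so it is safe to try anywhere; sar_window_cmd is a hypothetical helper name, and the file and times are just sample values.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not run) a sar command restricted
# to a time window inside one day's data file.
# -s and -e are the sar flags for report start and end time.
sar_window_cmd() {
    file=$1
    start=$2
    end=$3
    shift 3
    # Any remaining arguments (for example -u or -d) are passed through.
    echo "sar -f $file -s $start -e $end $*"
}

sar_window_cmd /var/adm/sa/sa08 00:30 01:00 -u
```

Printing the command first makes it easy to check the window before running it against a large data file.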
sar -f /var/adm/sa/sa08 -P ALL <-- All processors utilization
sar -f /var/adm/sa/sa08 -u -P 0,1 <-- Only 0 and 1 Processors
sar -f /var/adm/sa/sa08 -k <-- kernel activity
sar -f /var/adm/sa/sa08 -u <-- cpu utilization

Output:
AIX ibm66p2 3 5 00C9B0704C00 01/08/09

System configuration: lcpu=2 mode=Capped

00:01:09   %usr   %sys   %wio   %idle   physc
00:11:09      1      1      0      98    1.00
00:21:09      1      1      0      99    1.00
00:31:09      1      1      0      99    1.00
00:41:09      1      0      0      99    1.00
00:51:09      1      0      0      99    1.00
01:05:54      1      1      0      99    1.00
01:15:54      1      1      0      98    1.00

Output column meanings are at the bottom of this page.
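
On a busy day the CPU report can be long, so it may be handy to print only the intervals where the CPU was short on idle time. This is a rough sketch that assumes the column order shown above (time, %usr, %sys, %wio, %idle, physc); low_idle is a hypothetical helper name and the default threshold of 20 is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: read "sar -u" style text on stdin and print the
# intervals whose %idle (field 5) is below the given threshold.
low_idle() {
    threshold=${1:-20}
    awk -v t="$threshold" '
        # Data rows start with hh:mm:ss and have a numeric fifth field;
        # the header row is skipped because its fifth field is "%idle".
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ &&
        $5 ~ /^[0-9.]+$/ && $5 + 0 < t {
            print $1, "idle=" $5
        }'
}

# Typical use on a live system (not run here):
#   sar -u -f /var/adm/sa/sa08 | low_idle 20
```

With the sample output above nothing would be printed, since %idle never drops below 98.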

sar -f /var/adm/sa/sa08 -d <-- disk IO utilization

Output:
AIX ibm66p2 3 5 00C9B0704C00 01/08/09

System configuration: lcpu=2 drives=4 mode=Capped

00:01:09   device   %busy   avque   r+w/s   Kbs/s   avwait   avserv
00:11:09   hdisk0       0     0.0       1       4      1.5     11.0
           hdisk1       0     0.0       1       4      1.6     11.2
           hdisk2       0     0.0       0       1      0.3      9.2
           hdisk3       0     0.0       0       0      0.0      6.3
00:21:09   hdisk0       0     0.0       1       4      1.1     10.9
           hdisk1       0     0.0       1       4      1.0     10.9
           hdisk2       0     0.0       0       1      0.0      7.4
           hdisk3       0     0.0       0       0      0.0      0.0
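
The disk report is harder to filter because only the first row of each interval carries the timestamp. The sketch below carries the last seen timestamp forward and prints the disks whose avserv exceeds a limit. It assumes the column layout shown above; slow_disks is a hypothetical helper name and the default limit of 10 is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: read "sar -d" style text on stdin and print
# timestamp, device and avserv for rows whose avserv is above the limit.
slow_disks() {
    limit=${1:-10}
    awk -v limit="$limit" '
        {
            off = 0
            # Rows that begin with hh:mm:ss set the timestamp for the
            # rows below them and have one extra leading field.
            if ($1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/) { ts = $1; off = 1 }
            dev = $(1 + off)
            avserv = $(7 + off)
            # Header and banner lines fail the field-count or numeric test.
            if (NF == 7 + off && avserv ~ /^[0-9.]+$/ && avserv + 0 > limit)
                print ts, dev, "avserv=" avserv
        }'
}

# Typical use on a live system (not run here):
#   sar -d -f /var/adm/sa/sa08 | slow_disks 10
```

Because it splits on whitespace, the filter does not depend on how the columns are aligned.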


Also, I have seen that Red Hat and Solaris have a sar option to monitor memory usage:
sar -f /var/adm/sa/sa08 -r

That memory usage report is not available in AIX 5.3.

However, you can use the AIX topasout command, which I have found very useful in most cases.

Try reading the recording files in the /etc/perf/daily/ directory using topasout:
topasout -R detailed -mem -i 5 -b 1000 -e 2300 /etc/perf/daily/xmwlm.090108
topasout -R summary -mem -i 5 -b 1000 -e 2300 /etc/perf/daily/xmwlm.090108

topasout post-processes xmwlm recordings, also called local recordings, and topas recordings, also called CEC (Central Electronics Complex) recordings.

------------------------------------------------------------------------------------
Output column meanings:
%usr: The percentage of time the CPU is spending on user processes, such as applications, shell scripts, or interacting with the user.

%sys: The percentage of time the CPU is spending executing kernel tasks.

%wio: The percentage of time the CPU is waiting for input or output from a block device, such as a disk.

%idle: The percentage of time the CPU is idle, with no work to run and no outstanding disk I/O.

kexit/s: Reports the number of kernel processes terminating per second.

kproc-ov/s: Reports the number of times kernel processes could not be created because of enforcement of the process threshold limit.

ksched/s: Reports the number of kernel processes assigned to tasks per second.
