Thursday, January 22, 2009

Linux memory utilization and disk caching

People often come to me and say "we are running out of memory" or "I'm not sure what is using all the memory on this box", especially on a Linux box.

I think it confuses people even more when they run the top command and see that 90% of the system memory is in use.

Here are some common misinterpretations of the results we see in the top command.
Snapshot of top output:
top - 10:20:43 up 12 days, 23:08, 2 users, load average: 0.06, 0.21, 0.18
Tasks: 90 total, 1 running, 89 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.8% us, 0.1% sy, 0.0% ni, 98.8% id, 0.1% wa, 0.1% hi, 0.1% si
Mem: 16414396k total, 10941732k used, 5472664k free, 261608k buffers
Swap: 8388600k total, 0k used, 8388600k free, 7092524k cached
21900 app_user 16 0 5976m 2.0g 80m S 22 12.5 931:56.60 WebSphere
21964 app_user 16 0 785m 675m 4236 S 16 4.2 107:04.09 mysqld

Total memory: 16GB
Used memory: 10GB+
Free memory: 5GB+

In the above example, not all of the 10GB in use out of 16GB is consumed by applications.
The figure also includes the memory that the kernel uses for buffers and disk block caching.

Here's an explanation of caching and how it works, from one Linux guru:
Suppose that you run "ls /". What the kernel does then (after some processing), is that it requests the hard disk sectors that make up the root directory, and then it makes the ls process sleep until those sectors are received. While the hard drive is looking up those sectors, the system does some other things that don't require hard drive access. The hard drive then puts the sectors that it found in your RAM, and reports back to the kernel that the operation completed. Then the kernel continues with the ls process, which outputs what you wanted to see. After that, the kernel has two choices:

1. It could discard the sectors loaded from the hard drive.
2. It could keep the sectors in memory.

Now, if we look carefully at option number 1, which at the first glance might seem like a good thing, since it apparently makes that memory usable again, we see that it would require those sectors to be loaded once again whenever a process wants to look in the root directory. Since many files that are opened are opened with absolute path names, there are many root directory accesses in the system. If the sectors would have to be loaded each time a process accesses the root directory, the system would be slow, since the hard drive is slow. Therefore, the kernel keeps the sectors in memory. The memory used this way is what you see in the "cached" entry. That allows for extremely fast disk access, since all sectors that have once been accessed don't have to be reloaded from the physical disk.

Now, you might think that that's an extreme waste of memory, but if you do, you're very wrong. You see, the block caches are low-priority pages, which essentially means that they are free to be used by processes, should they need the memory. Every time a process needs an extra memory page and there isn't already one free, a page used for block buffering is freed by the kernel and put into the process's address space for the process to use in whatever way it sees fit. Together with some intelligently designed algorithms for choosing which cache page to free, this all allows for really fast disk access.
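To make the idea concrete, here is a toy model of a block cache in Python. It is only a sketch: the class name ToyBlockCache, its methods, and the plain least-recently-used eviction are my own assumptions for illustration, and the real kernel uses more elaborate page replacement logic.

```python
from collections import OrderedDict


class ToyBlockCache:
    """Toy disk block cache with least-recently-used eviction (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity      # number of blocks the cache may hold
        self.blocks = OrderedDict()   # sector -> data, oldest entry first
        self.disk_reads = 0           # how often we had to go to the "disk"

    def read(self, sector):
        if sector in self.blocks:
            # Cache hit: no disk access, just mark the block as recently used.
            self.blocks.move_to_end(sector)
            return self.blocks[sector]
        # Cache miss: pretend to read the sector from the slow disk.
        self.disk_reads += 1
        data = f"data-for-sector-{sector}"
        self.blocks[sector] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data

    def reclaim(self, pages):
        """Hand up to `pages` cached blocks back for use by a process."""
        freed = 0
        while self.blocks and freed < pages:
            self.blocks.popitem(last=False)
            freed += 1
        self.capacity -= freed  # that memory now belongs to the process
        return freed


cache = ToyBlockCache(capacity=3)
for sector in (1, 1, 1, 2):
    cache.read(sector)
print("disk reads:", cache.disk_reads)
print("pages given back to a process:", cache.reclaim(1))
```

Repeated reads of the same sector only hit the disk once, and cached blocks are given up as soon as a process asks for memory, which is why a large "cached" figure is not a sign of memory pressure.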

Now that we have figured out that the 10GB in use also includes kernel caching, the question remains: what is the actual application memory usage?

Try the free command (the output below is from free -g -t):
free -m -t
free -m -s 5
free -g -t
                    total       used       free     shared    buffers     cached
Mem:                   15         10          5          0          0          6
-/+ buffers/cache:                 3         12
Swap:                   7          0          7
Total:                 23         10         13

We still see a total of 15GB (16GB rounded down by free -g) and 10GB in use.

You will notice that the top row 'used' value (10) tends to creep up toward the top row 'total' value (15) over time, since Linux likes to use any spare memory to cache disk blocks. That cache is shown in the top row 'cached' value (6).

The key figure to look at is the 'used' value in the buffers/cache row (3). This is how much memory your applications are currently using. For best performance, this number should be less than your total memory (15). To prevent out-of-memory errors, it needs to be less than the sum of total memory (15) and swap space (7).

If you wish to quickly see how much memory is really free, look at the 'free' value in the buffers/cache row (12):
Total memory: 15
Actually used: 3
Free memory: 15 - 3 = 12
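The same arithmetic can be done with the exact kilobyte figures from the top snapshot earlier in this post. This is a minimal sketch; the function name memory_breakdown is made up for illustration, and it assumes the classic layout where 'used' includes buffers and cache.

```python
KB_PER_GB = 1024 * 1024


def memory_breakdown(used_kb, free_kb, buffers_kb, cached_kb):
    """Return (application memory, memory available to applications) in kB."""
    # Buffers and cache are counted in 'used' but can be reclaimed on demand.
    app_used_kb = used_kb - buffers_kb - cached_kb
    available_kb = free_kb + buffers_kb + cached_kb
    return app_used_kb, available_kb


# Figures from the Mem:/Swap: lines of the top snapshot above.
total_kb = 16414396
app_used, available = memory_breakdown(
    used_kb=10941732, free_kb=5472664, buffers_kb=261608, cached_kb=7092524
)

print(f"applications: {app_used / KB_PER_GB:.1f} GB")   # 3.4 GB
print(f"available:    {available / KB_PER_GB:.1f} GB")  # 12.2 GB
```

free -g truncates these to whole gigabytes, which is why it shows 3 and 12 in the buffers/cache row.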

OK, now we know that the applications are actually using only about 3GB out of 16GB, rather than the 10GB reported by top. Next we may want to find out which processes are using that 3GB.

Try the ps command:
ps aux
ps aux | awk '{print $4"\t"$1"\t"$11}' | sort -nr

Snapshot of ps:
%MEM    USER        COMMAND
12.6    app_user    startServer.sh
4.2     app_user    /data/App_Server/mysql/bin/mysqld

To see only the memory resources occupied by each category of processes, such as Apache httpd, MySQL mysqld or Java, use the following command:

ps aux | awk 'NR>1 {mem[$11] += $4} END {for (cmd in mem) print mem[cmd]"\t"cmd}' | sort -nr

Now, if we calculate again: startServer.sh is using 12.6% of 16GB, which is about 2.0GB.
Similarly, mysqld is using about 4.2% of 16GB, which is about 670MB. Both figures match the resident sizes shown by the top command above.
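The conversion from %MEM to an absolute size can be sketched in a few lines of Python, using the exact total from the top snapshot (16414396k) rather than a rounded 16GB. The helper name percent_to_mb is an assumption for illustration.

```python
TOTAL_KB = 16414396  # "Mem: ... total" from the top snapshot above


def percent_to_mb(percent_mem, total_kb=TOTAL_KB):
    """Convert a ps/top %MEM value into megabytes of resident memory."""
    return percent_mem / 100 * total_kb / 1024


# %MEM values taken from the ps snapshot above.
for command, percent in [("startServer.sh", 12.6), ("mysqld", 4.2)]:
    print(f"{command}: about {percent_to_mb(percent):.0f} MB")
```

Because %MEM is rounded to one decimal place, the result only approximates the RES column of top.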

Thursday, January 8, 2009

System monitoring using sar

sar (system activity report) is very useful for troubleshooting and pinpointing issues related to system performance.

Although it gathers useful data about system performance, the sar command itself adds to the system load, which can exacerbate a pre-existing performance problem if the sampling frequency is high.


Step 1 - Collecting data using sar:
sar comes with almost all Unix systems; however, you have to enable it.

#crontab -e

# Collect measurements at 10-minute intervals
0,10,20,30,40,50 * * * * /usr/lib/sa/sa1
# Create daily reports and purge old files
0 0 * * * /usr/lib/sa/sa2 -A

In the above crontab there are two commands.
The first command, sa1, calls sadc to collect the performance data into a binary log file. It runs every 10 minutes.

The second command, sa2, dumps the data from the binary log file into a text file and also deletes files older than 7 days.

Step 2 - Extracting useful information:

Now that data is being collected, we have to extract the information that tells us how the system was performing at a particular time.

sar data is collected in /var/adm/sa (you can verify this path in the /usr/lib/sa/sa1 script).

You will see two kinds of files in that directory, saXX and sarXX:
saXX - binary file
sarXX - text file

XX - represents the day of the month.

You can also modify /usr/lib/sa/sa1 to change this format.

When you run just sar, it reads the most recent saXX file, which is the one for the current day.

To look at a particular period instead, point sar at the file for that day with the -f option.

For example:
sar -f /var/adm/sa/sa08 -P ALL      <-- utilization of all processors
sar -f /var/adm/sa/sa08 -u -P 0,1   <-- processors 0 and 1 only
sar -f /var/adm/sa/sa08 -k          <-- kernel process activity
sar -f /var/adm/sa/sa08 -u          <-- CPU utilization

Output:
AIX ibm66p2 3 5 00C9B0704C00 01/08/09

System configuration: lcpu=2 mode=Capped

00:01:09    %usr    %sys    %wio   %idle   physc
00:11:09       1       1       0      98    1.00
00:21:09       1       1       0      99    1.00
00:31:09       1       1       0      99    1.00
00:41:09       1       0       0      99    1.00
00:51:09       1       0       0      99    1.00
01:05:54       1       1       0      99    1.00
01:15:54       1       1       0      98    1.00

Output column meanings are at the bottom of this page.
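If you want to pick out the busy intervals without reading the report by eye, the text output can be post-processed. Below is a rough Python sketch that parses the CPU table shown above and lists the intervals where %idle is at or below a threshold. The function names parse_sar_cpu and busy_intervals are hypothetical, and the parser assumes the header row and data rows look exactly like the sample; a real report may also contain banner and Average lines that you would need to strip first.

```python
SAMPLE = """\
00:01:09 %usr %sys %wio %idle physc
00:11:09 1 1 0 98 1.00
00:21:09 1 1 0 99 1.00
00:31:09 1 1 0 99 1.00
00:41:09 1 0 0 99 1.00
00:51:09 1 0 0 99 1.00
01:05:54 1 1 0 99 1.00
01:15:54 1 1 0 98 1.00
"""


def parse_sar_cpu(text):
    """Turn 'sar -u' text (header row + data rows) into a list of dicts."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    columns = lines[0][1:]  # skip the timestamp of the header row
    rows = []
    for fields in lines[1:]:
        row = {"time": fields[0]}
        for name, value in zip(columns, fields[1:]):
            row[name] = float(value)
        rows.append(row)
    return rows


def busy_intervals(rows, max_idle):
    """Return the timestamps of intervals whose %idle is <= max_idle."""
    return [row["time"] for row in rows if row["%idle"] <= max_idle]


rows = parse_sar_cpu(SAMPLE)
print(busy_intervals(rows, max_idle=98))
```

On this sample the box is almost idle throughout, so a threshold of 98 only flags the two intervals that dipped to 98% idle.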

sar -f /var/adm/sa/sa08 -d <-- disk IO utilization

Output:
AIX ibm66p2 3 5 00C9B0704C00 01/08/09

System configuration: lcpu=2 drives=4 mode=Capped

00:01:09     device    %busy    avque    r+w/s    Kbs/s   avwait   avserv
00:11:09     hdisk0        0      0.0        1        4      1.5     11.0
             hdisk1        0      0.0        1        4      1.6     11.2
             hdisk2        0      0.0        0        1      0.3      9.2
             hdisk3        0      0.0        0        0      0.0      6.3
00:21:09     hdisk0        0      0.0        1        4      1.1     10.9
             hdisk1        0      0.0        1        4      1.0     10.9
             hdisk2        0      0.0        0        1      0.0      7.4
             hdisk3        0      0.0        0        0      0.0      0.0


I have also seen an option in Red Hat and Solaris to monitor memory usage:
sar -f /var/adm/sa/sa08 -r

That memory report is not available in AIX 5.3.

However, you can use the AIX topasout command, which I have found very useful in most cases.

Try reading the recording files in the /etc/perf/daily/ directory using topasout:
topasout -R detailed -mem -i 5 -b 1000 -e 2300 /etc/perf/daily/xmwlm.090108
topasout -R summary -mem -i 5 -b 1000 -e 2300 /etc/perf/daily/xmwlm.090108

topasout post-processes xmwlm recordings, which are also called local recordings,
and topas recordings, which are also called CEC (central electronic complex) recordings.

------------------------------------------------------------------------------------
Output column meanings:
%usr: The percentage of time the CPU is spending on user processes, such as applications, shell scripts, or interacting with the user.

%sys: The percentage of time the CPU is spending executing kernel tasks.

%wio: The percentage of time the CPU is waiting for input or output from a block device, such as a disk.

%idle: The percentage of time the CPU isn't doing anything useful.

kexit/s: The number of kernel processes terminating per second.

kproc-ov/s: The number of times per second that kernel processes could not be created because the process threshold limit was enforced.

ksched/s: The number of kernel processes assigned to tasks per second.