[SOLVED] 6.2: Where has all the memory gone?

ChairmanKaga
Posts: 6
Joined: 2010/02/02 18:35:17

Post by ChairmanKaga » 2012/01/06 19:16:16

On a fresh 6.2 install, I keep running into out-of-memory errors after about 24 hours of uptime.

[code]
             total       used       free     shared    buffers     cached
Mem:       1030888     753000     277888          0       2572      12588
-/+ buffers/cache:     [b][u]737840[/u][/b]     293048
Swap:            0          0          0
[/code]

This is basically a minimal install at the moment (I'm building it into a development system for myself) -- there is virtually [i]nothing[/i] running on this machine except for a couple of daemons I'm testing (Kerberos is the main one). It starts out fine, but then overnight (when even less is going on) the used figure on the -/+ buffers/cache line mysteriously balloons to what you see above (or higher). Nothing seems to remedy this, and since I can't run anything besides basic shell commands, I have to reboot it almost every morning.

There was [url=https://www.centos.org/modules/newbb/viewtopic.php?topic_id=32988]a previous thread[/url] with a similar problem, but none of the remedies in that thread have worked. "echo 3 >/proc/sys/vm/drop_caches" only regains about 30M or so. dentry does not appear to be the issue in this case:

[code]
Active / Total Objects (% used) : 874616 / 888699 (98.4%)
Active / Total Slabs (% used) : 8006 / 8006 (100.0%)
Active / Total Caches (% used) : 94 / 182 (51.6%)
Active / Total Size (% used) : 27158.44K / 28482.08K (95.4%)
Minimum / Average / Maximum Object : 0.01K / 0.03K / 4096.00K

  OBJS ACTIVE  USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
333576 333173  99%    0.03K  2952      113     11808K size-32
480704 480587  99%    0.02K  2368      203      9472K avtab_node
  5742   5607  97%    0.34K   522       11      2088K inode_cache
  8903   7399  83%    0.13K   307       29      1228K dentry
   372    367  98%    2.00K   186        2       744K size-2048
  9100   9072  99%    0.07K   182       50       728K sysfs_dir_cache
   114    114 100%    3.00K    57        2       456K biovec-256
  9492   7365  77%    0.04K   113       84       452K selinux_inode_security
  6834   1059  15%    0.05K   102       67       408K buffer_head
   576    380  65%    0.63K    96        6       384K ext4_inode_cache
[/code]

Like I said, there is virtually nothing running on this box. Definitely not enough to be consuming 70% of the RAM.

[code]
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 0.0 0.0 0 0 ? S Jan04 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S Jan04 0:00 \_ [migration/0]
root 4 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S Jan04 0:00 \_ [migration/0]
root 6 0.0 0.0 0 0 ? S Jan04 0:00 \_ [watchdog/0]
root 7 0.0 0.0 0 0 ? S Jan04 0:00 \_ [events/0]
root 8 0.0 0.0 0 0 ? S Jan04 0:00 \_ [cpuset]
root 9 0.0 0.0 0 0 ? S Jan04 0:00 \_ [khelper]
root 10 0.0 0.0 0 0 ? S Jan04 0:00 \_ [netns]
root 11 0.0 0.0 0 0 ? S Jan04 0:00 \_ [async/mgr]
root 12 0.0 0.0 0 0 ? S Jan04 0:00 \_ [pm]
root 13 0.0 0.0 0 0 ? S Jan04 0:00 \_ [sync_supers]
root 14 0.0 0.0 0 0 ? S Jan04 0:00 \_ [bdi-default]
root 15 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kintegrityd/0]
root 16 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kblockd/0]
root 17 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kacpid]
root 18 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kacpi_notify]
root 19 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kacpi_hotplug]
root 20 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ata/0]
root 21 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ata_aux]
root 22 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ksuspend_usbd]
root 23 0.0 0.0 0 0 ? S Jan04 0:00 \_ [khubd]
root 24 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kseriod]
root 25 0.0 0.0 0 0 ? S Jan04 0:00 \_ [md/0]
root 26 0.0 0.0 0 0 ? S Jan04 0:00 \_ [md_misc/0]
root 27 0.0 0.0 0 0 ? S Jan04 0:00 \_ [khungtaskd]
root 28 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kswapd0]
root 29 0.0 0.0 0 0 ? SN Jan04 0:00 \_ [ksmd]
root 30 0.0 0.0 0 0 ? S Jan04 0:00 \_ [aio/0]
root 31 0.0 0.0 0 0 ? S Jan04 0:00 \_ [crypto/0]
root 36 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kthrotld/0]
root 37 0.0 0.0 0 0 ? S Jan04 0:00 \_ [pciehpd]
root 39 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kpsmoused]
root 40 0.0 0.0 0 0 ? S Jan04 0:00 \_ [usbhid_resumer]
root 70 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kstriped]
root 270 0.0 0.0 0 0 ? S Jan04 0:00 \_ [scsi_eh_0]
root 271 0.0 0.0 0 0 ? S Jan04 0:00 \_ [scsi_eh_1]
root 280 0.0 0.0 0 0 ? S Jan04 0:00 \_ [scsi_eh_2]
root 281 0.0 0.0 0 0 ? S Jan04 0:00 \_ [vmw_pvscsi_wq_2]
root 333 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kdmflush]
root 347 0.0 0.0 0 0 ? S Jan04 0:00 \_ [jbd2/dm-0-8]
root 348 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ext4-dio-unwrit]
root 607 0.0 0.0 0 0 ? S Jan04 0:06 \_ [vmmemctl]
root 719 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kdmflush]
root 722 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kdmflush]
root 725 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kdmflush]
root 728 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kdmflush]
root 729 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kdmflush]
root 731 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kdmflush]
root 782 0.0 0.0 0 0 ? S Jan04 0:00 \_ [flush-253:3]
root 803 0.0 0.0 0 0 ? S Jan04 0:00 \_ [jbd2/sda1-8]
root 804 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ext4-dio-unwrit]
root 805 0.0 0.0 0 0 ? S Jan04 0:00 \_ [jbd2/dm-2-8]
root 806 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ext4-dio-unwrit]
root 807 0.0 0.0 0 0 ? S Jan04 0:00 \_ [jbd2/dm-4-8]
root 808 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ext4-dio-unwrit]
root 809 0.0 0.0 0 0 ? S Jan04 0:00 \_ [jbd2/dm-1-8]
root 810 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ext4-dio-unwrit]
root 811 0.0 0.0 0 0 ? S Jan04 0:00 \_ [jbd2/dm-3-8]
root 812 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ext4-dio-unwrit]
root 813 0.0 0.0 0 0 ? S Jan04 0:00 \_ [jbd2/dm-5-8]
root 814 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ext4-dio-unwrit]
root 815 0.0 0.0 0 0 ? S Jan04 0:00 \_ [jbd2/dm-6-8]
root 816 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ext4-dio-unwrit]
root 850 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kauditd]
root 1 0.0 0.1 2864 1280 ? Ss Jan04 0:02 /sbin/init
root 422 0.0 0.1 2884 1128 ? S<s Jan04 0:00 /sbin/udevd -d
root 1248 0.0 0.1 3400 1784 ? S< Jan04 0:00 \_ /sbin/udevd -d
root 1249 0.0 0.1 3404 1784 ? S< Jan04 0:00 \_ /sbin/udevd -d
root 1072 0.0 0.0 12904 764 ? S<sl Jan04 0:00 auditd
root 1090 0.0 0.0 2028 516 ? Ss Jan04 0:00 /sbin/portreserve
root 1097 0.0 0.1 35536 1360 ? Sl Jan04 0:02 /sbin/rsyslogd -i /var/run/syslogd.pid -c 4
root 1131 0.0 0.0 8452 944 ? Ss Jan04 0:00 /usr/sbin/sshd
root 1279 0.0 0.3 11396 3280 ? S Jan04 0:01 \_ sshd: ck [priv]
ck 1282 0.0 0.1 11512 1772 ? S Jan04 0:04 | \_ sshd: ck@pts/1
ck 1283 0.0 0.1 5228 1860 pts/1 Ss Jan04 0:01 | \_ -bash
root 18201 0.0 0.1 5388 1476 pts/1 S 12:43 0:00 | \_ su -
root 18205 0.1 0.1 5096 1680 pts/1 S 12:43 0:00 | \_ -bash
root 18290 0.0 0.0 4872 1020 pts/1 R+ 12:53 0:00 | \_ ps auwfx
root 15244 0.0 0.3 11396 3276 ? S Jan04 0:01 \_ sshd: ck [priv]
ck 15247 0.0 0.1 11544 1744 ? S Jan04 0:04 \_ sshd: ck@pts/0
ck 15248 0.0 0.1 5228 1848 pts/0 Ss+ Jan04 0:01 \_ -bash
root 1207 0.0 0.2 12420 2464 ? Ss Jan04 0:00 /usr/libexec/postfix/master
postfix 1216 0.0 0.2 12564 2368 ? S Jan04 0:00 \_ qmgr -l -t fifo -u
postfix 18179 0.0 0.2 12496 2440 ? S 12:01 0:00 \_ pickup -l -t fifo -u
root 1217 0.0 0.1 5888 1268 ? Ss Jan04 0:02 crond
root 1226 0.0 0.2 22884 2920 ? Ss Jan04 0:01 smbd -D
root 1230 0.0 0.1 22884 1396 ? S Jan04 0:00 \_ smbd -D
root 1242 0.0 0.0 1980 508 tty2 Ss+ Jan04 0:00 /sbin/mingetty /dev/tty2
root 1244 0.0 0.0 1980 504 tty3 Ss+ Jan04 0:00 /sbin/mingetty /dev/tty3
root 1246 0.0 0.0 1980 508 tty4 Ss+ Jan04 0:00 /sbin/mingetty /dev/tty4
root 1250 0.0 0.0 1980 512 tty5 Ss+ Jan04 0:00 /sbin/mingetty /dev/tty5
root 1252 0.0 0.0 1980 512 tty6 Ss+ Jan04 0:00 /sbin/mingetty /dev/tty6
root 1532 0.0 0.4 7532 4140 ? Ss Jan04 0:00 /usr/sbin/kadmind -P /var/run/kadmind.pid
root 1547 0.0 0.7 10312 7496 ? Ss Jan04 0:02 /usr/sbin/krb5kdc -P /var/run/krb5kdc.pid
[/code]

This is a test/dev system, but suffice it to say this is making me a bit nervous to attempt any sort of production deployment.

So why does the kernel seem to be eating all the RAM?

[code]
Linux dev-box 2.6.32-220.2.1.el6.i686 #1 SMP Thu Dec 22 18:50:52 GMT 2011 i686 i686 i386 GNU/Linux
[/code]

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Post by pschaff » 2012/01/07 03:57:28

Are you actually having problems such as the OOM killer firing, or are you just seeing memory used for buffers? If the latter, then don't worry about it. Unused memory is wasted memory. You haven't got much and might best let the kernel manage the scarce resource.

We do keep seeing mysterious problems with kernel-2.6.32-220.2.1.el6. If having actual problems you might try falling back to 2.6.32-220.el6, or if that still has problems, kernel-2.6.32-131.12.1.el6.

You say you are testing. Are you running only standard packages, or some test versions? If the latter, perhaps you have a memory leak.

Adding some swap, or better yet some RAM, might also be worth a try.

pza81
Posts: 33
Joined: 2007/07/10 08:02:35

Post by pza81 » 2012/01/07 07:29:54

Yes, add some swap. Or, up your overcommit ratio so that it allows for 100% of memory rather than the default 50%.
echo 100 > /proc/sys/vm/overcommit_ratio

Can you post the output of /proc/meminfo?
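
As a rough sketch of what that ratio feeds into: the kernel reports a CommitLimit of SwapTotal plus overcommit_ratio percent of RAM, and as far as I know the limit is only enforced when vm.overcommit_memory is set to 2 (strict accounting). The function name below is made up for illustration, huge pages are ignored as a simplifying assumption, and the 1030888 kB total is the figure from the free output in the first post.

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio):
    """Approximate CommitLimit in kB.

    Assumption: no huge pages are reserved, so all of RAM counts.
    """
    return swap_total_kb + mem_total_kb * overcommit_ratio // 100


if __name__ == "__main__":
    total_kb = 1030888  # "Mem: total" from the first post, in kB
    for ratio in (50, 100):
        print(ratio, commit_limit_kb(total_kb, 0, ratio))
```

With no swap, the default ratio of 50 works out to 515444 kB on this box, and a ratio of 100 to the full 1030888 kB.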

Post by ChairmanKaga » 2012/01/09 17:52:43

[quote]
Are you actually having problems such as the OOM killer firing, or are you just seeing memory used for buffers? If the latter, then don't worry about it. Unused memory is wasted memory. You haven't got much and might best let the kernel manage the scarce resource.
[/quote]

No, we're actually getting a lot of programs dying with ENOMEM (took forever to get our application packages built because rpmbuild kept choking).

I know about Linux's buffering behavior and already discounted that :)

[quote]
We do keep seeing mysterious problems with kernel-2.6.32-220.2.1.el6. If having actual problems you might try falling back to 2.6.32-220.el6, or if that still has problems, kernel-2.6.32-131.12.1.el6.
[/quote]

May try that today.

[quote]
You say you are testing. Are you running only standard packages, or some test versions? If the latter, perhaps you have a memory leak.
[/quote]

Right now it's just standard packages. Haven't even gotten to the point where I can really test our internal code on it because of this issue.

[quote]
Yes, add some swap.
[/quote]

We're trying to deliberately avoid swap. Our final physical box buildout will have 8GB minimum, but more importantly, we're working on PCI DSS compliance and trying to mitigate the risk of having cardholder data written to disk because of unexpected VMM behavior.

I suppose I could bump this test VM to 2GB, but that feels like it's just masking the problem. We've got old CentOS 4.x boxes in here (yes, we're upgrading them) that are perfectly content with 256-512MB. Hell, I don't recall having a problem like this in testing on 6.0/6.1, so I'm inclined to think that this is a new kernel bug.

[quote]
Can you post the output of /proc/meminfo?
[/quote]

Here's what I walked into this morning:

[code]
             total       used       free     shared    buffers     cached
Mem:       1030888     867828     163060          0      47160      65892
-/+ buffers/cache:     754776     276112
Swap:            0          0          0

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 0.0 0.0 0 0 ? S Jan04 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S Jan04 0:00 \_ [migration/0]
root 4 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S Jan04 0:00 \_ [migration/0]
root 6 0.0 0.0 0 0 ? S Jan04 0:00 \_ [watchdog/0]
root 7 0.0 0.0 0 0 ? S Jan04 0:00 \_ [events/0]
root 8 0.0 0.0 0 0 ? S Jan04 0:00 \_ [cpuset]
root 9 0.0 0.0 0 0 ? S Jan04 0:00 \_ [khelper]
root 10 0.0 0.0 0 0 ? S Jan04 0:00 \_ [netns]
root 11 0.0 0.0 0 0 ? S Jan04 0:00 \_ [async/mgr]
root 12 0.0 0.0 0 0 ? S Jan04 0:00 \_ [pm]
root 13 0.0 0.0 0 0 ? S Jan04 0:00 \_ [sync_supers]
root 14 0.0 0.0 0 0 ? S Jan04 0:00 \_ [bdi-default]
root 15 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kintegrityd/0]
root 16 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kblockd/0]
root 17 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kacpid]
root 18 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kacpi_notify]
root 19 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kacpi_hotplug]
root 20 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ata/0]
root 21 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ata_aux]
root 22 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ksuspend_usbd]
root 23 0.0 0.0 0 0 ? S Jan04 0:00 \_ [khubd]
root 24 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kseriod]
root 25 0.0 0.0 0 0 ? S Jan04 0:00 \_ [md/0]
root 26 0.0 0.0 0 0 ? S Jan04 0:00 \_ [md_misc/0]
root 27 0.0 0.0 0 0 ? S Jan04 0:00 \_ [khungtaskd]
root 28 0.0 0.0 0 0 ? S Jan04 0:03 \_ [kswapd0]
root 29 0.0 0.0 0 0 ? SN Jan04 0:00 \_ [ksmd]
root 30 0.0 0.0 0 0 ? S Jan04 0:00 \_ [aio/0]
root 31 0.0 0.0 0 0 ? S Jan04 0:00 \_ [crypto/0]
root 36 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kthrotld/0]
root 37 0.0 0.0 0 0 ? S Jan04 0:00 \_ [pciehpd]
root 39 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kpsmoused]
root 40 0.0 0.0 0 0 ? S Jan04 0:00 \_ [usbhid_resumer]
root 70 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kstriped]
root 270 0.0 0.0 0 0 ? S Jan04 0:00 \_ [scsi_eh_0]
root 271 0.0 0.0 0 0 ? S Jan04 0:00 \_ [scsi_eh_1]
root 280 0.0 0.0 0 0 ? S Jan04 0:00 \_ [scsi_eh_2]
root 281 0.0 0.0 0 0 ? S Jan04 0:00 \_ [vmw_pvscsi_wq_2]
root 333 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kdmflush]
root 347 0.0 0.0 0 0 ? S Jan04 0:00 \_ [jbd2/dm-0-8]
root 348 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ext4-dio-unwrit]
root 607 0.0 0.0 0 0 ? S Jan04 0:08 \_ [vmmemctl]
root 719 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kdmflush]
root 722 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kdmflush]
root 725 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kdmflush]
root 728 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kdmflush]
root 729 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kdmflush]
root 731 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kdmflush]
root 782 0.0 0.0 0 0 ? S Jan04 0:00 \_ [flush-253:3]
root 803 0.0 0.0 0 0 ? S Jan04 0:00 \_ [jbd2/sda1-8]
root 804 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ext4-dio-unwrit]
root 805 0.0 0.0 0 0 ? S Jan04 0:00 \_ [jbd2/dm-2-8]
root 806 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ext4-dio-unwrit]
root 807 0.0 0.0 0 0 ? S Jan04 0:00 \_ [jbd2/dm-4-8]
root 808 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ext4-dio-unwrit]
root 809 0.0 0.0 0 0 ? S Jan04 0:00 \_ [jbd2/dm-1-8]
root 810 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ext4-dio-unwrit]
root 811 0.0 0.0 0 0 ? S Jan04 0:06 \_ [jbd2/dm-3-8]
root 812 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ext4-dio-unwrit]
root 813 0.0 0.0 0 0 ? S Jan04 0:01 \_ [jbd2/dm-5-8]
root 814 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ext4-dio-unwrit]
root 815 0.0 0.0 0 0 ? S Jan04 0:00 \_ [jbd2/dm-6-8]
root 816 0.0 0.0 0 0 ? S Jan04 0:00 \_ [ext4-dio-unwrit]
root 850 0.0 0.0 0 0 ? S Jan04 0:00 \_ [kauditd]
root 18912 0.0 0.0 0 0 ? S Jan06 0:00 \_ [bluetooth]
root 21981 0.0 0.0 0 0 ? S 11:44 0:00 \_ [flush-253:6]
root 1 0.0 0.1 2864 1152 ? Ss Jan04 0:02 /sbin/init
root 422 0.0 0.1 2884 1112 ? S<s Jan04 0:00 /sbin/udevd -d
root 1248 0.0 0.1 3400 1784 ? S< Jan04 0:00 \_ /sbin/udevd -d
root 1249 0.0 0.1 3404 1760 ? S< Jan04 0:00 \_ /sbin/udevd -d
root 1072 0.0 0.0 12904 768 ? S<sl Jan04 0:00 auditd
root 1090 0.0 0.0 2028 516 ? Ss Jan04 0:00 /sbin/portreserve
root 1097 0.0 0.1 35536 1428 ? Sl Jan04 0:08 /sbin/rsyslogd -i /var/run/syslogd.pid -c 4
root 1131 0.0 0.0 8452 944 ? Ss Jan04 0:00 /usr/sbin/sshd
root 1279 0.0 0.3 11396 3292 ? S Jan04 0:02 \_ sshd: ck [priv]
ck 1282 0.0 0.1 11512 1792 ? S Jan04 0:10 | \_ sshd: ck@pts/1
ck 1283 0.0 0.1 5228 1860 pts/1 Ss Jan04 0:01 | \_ -bash
ck 21983 0.0 0.0 4872 1024 pts/1 R+ 11:46 0:00 | \_ ps auwfx
root 15244 0.0 0.3 11396 3288 ? S Jan04 0:02 \_ sshd: ck [priv]
ck 15247 0.0 0.1 11544 1744 ? S Jan04 0:10 | \_ sshd: ck@pts/0
ck 15248 0.0 0.1 5228 1848 pts/0 Ss Jan04 0:01 | \_ -bash
ck 19116 0.0 0.5 14268 5288 pts/0 S+ Jan06 0:02 | \_ mysql -h testdb01 -p
root 18775 0.0 0.3 11396 3280 ? S Jan06 0:01 \_ sshd: ck [priv]
ck 18779 0.0 0.1 11512 1756 ? S Jan06 0:05 \_ sshd: ck@pts/2
ck 18780 0.0 0.1 5228 1784 pts/2 Ss Jan06 0:00 \_ -bash
root 19145 0.0 0.1 5388 1472 pts/2 S Jan06 0:00 \_ su -
root 19150 0.0 0.1 5200 1704 pts/2 S+ Jan06 0:00 \_ -bash
root 1207 0.0 0.2 12420 2464 ? Ss Jan04 0:01 /usr/libexec/postfix/master
postfix 1216 0.0 0.2 12564 2368 ? S Jan04 0:00 \_ qmgr -l -t fifo -u
postfix 21980 0.0 0.2 12496 2440 ? S 11:42 0:00 \_ pickup -l -t fifo -u
root 1217 0.0 0.1 5888 1268 ? Ss Jan04 0:04 crond
root 1226 0.0 0.2 22884 2512 ? Ss Jan04 0:04 smbd -D
root 1230 0.0 0.1 22884 1376 ? S Jan04 0:00 \_ smbd -D
root 19109 0.0 0.4 23436 4216 ? S Jan06 0:06 \_ smbd -D
root 1242 0.0 0.0 1980 508 tty2 Ss+ Jan04 0:00 /sbin/mingetty /dev/tty2
root 1244 0.0 0.0 1980 504 tty3 Ss+ Jan04 0:00 /sbin/mingetty /dev/tty3
root 1246 0.0 0.0 1980 508 tty4 Ss+ Jan04 0:00 /sbin/mingetty /dev/tty4
root 1250 0.0 0.0 1980 512 tty5 Ss+ Jan04 0:00 /sbin/mingetty /dev/tty5
root 1252 0.0 0.0 1980 512 tty6 Ss+ Jan04 0:00 /sbin/mingetty /dev/tty6
root 1532 0.0 0.4 7532 5032 ? Ss Jan04 0:00 /usr/sbin/kadmind -P /var/run/kadmind.pid
root 1547 0.0 0.7 10312 7492 ? Ss Jan04 0:04 /usr/sbin/krb5kdc -P /var/run/krb5kdc.pid

MemTotal: 1030888 kB
MemFree: 163052 kB
Buffers: 47068 kB
Cached: 65892 kB
SwapCached: 0 kB
Active: 62716 kB
Inactive: 76436 kB
Active(anon): 1880 kB
Inactive(anon): 24512 kB
Active(file): 60836 kB
Inactive(file): 51924 kB
Unevictable: 0 kB
Mlocked: 0 kB
HighTotal: 141256 kB
HighFree: 66160 kB
LowTotal: 889632 kB
LowFree: 96892 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 4 kB
Writeback: 0 kB
AnonPages: 26216 kB
Mapped: 9380 kB
Shmem: 192 kB
Slab: 40820 kB
SReclaimable: 12388 kB
SUnreclaim: 28432 kB
KernelStack: 864 kB
PageTables: 1764 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 515444 kB
Committed_AS: 89792 kB
VmallocTotal: 122880 kB
VmallocUsed: 3816 kB
VmallocChunk: 100124 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 10232 kB
DirectMap2M: 897024 kB
[/code]

Post by pschaff » 2012/01/11 11:08:06

[quote]
ChairmanKaga wrote:
[quote]
Yes, add some swap.
[/quote]

We're trying to deliberately avoid swap. Our final physical box buildout will have 8GB minimum, but more importantly, we're working on PCI DSS compliance and trying to mitigate the risk of having cardholder data written to disk because of unexpected VMM behavior.[/quote]
I don't see what the inability to build due to insufficient memory has to do with the behavior on the final system with more memory. If adding swap gets the build done, you can later remove it.

[quote]
I suppose I could bump this test VM to 2GB, but that feels like it's just masking the problem. We've got old CentOS 4.x boxes in here (yes, we're upgrading them) that are perfectly content with 256-512MB. Hell, I don't recall having a problem like this in testing on 6.0/6.1, so I'm inclined to think that this is a new kernel bug.
[/quote]
Don't think you mentioned that it was a VM before, so adding memory seems like an easy test. As far as CentOS-4 - it is well known that CentOS-6 requires more memory than 4. See the upstream [url=http://www.redhat.com/rhel/compare/]comparison page[/url]. It could also be a kernel bug - another easy test to run - but I suspect that the primary issue is insufficient memory.

Post by ChairmanKaga » 2012/01/12 02:04:56

[quote]I don't see what the inability to build due to insufficient memory has to do with the behavior on the final system with more memory.[/quote]

I guess I'm making the somewhat parallel argument that I don't see what lacking swap has to do with the memory apparently being lost to a black hole.

For comparison, here's another box (64-bit, 2.6.32-220.2.1.el6.x86_64, non-VM) with 4GB RAM (which has been up for days now without exhibiting the issue):

[code]
             total       used       free     shared    buffers     cached
Mem:       3596408     [b]804960[/b]    2791448          0      81780     530848
-/+ buffers/cache:     192332    3404076
Swap:            0          0          0

MemTotal: 3596408 kB
MemFree: 2791448 kB
[b]Buffers: 81780 kB[/b]
[b]Cached: 530848 kB[/b]
SwapCached: 0 kB
Active: 252272 kB
Inactive: 373088 kB
Active(anon): 12872 kB
Inactive(anon): 3264 kB
Active(file): 239400 kB
Inactive(file): 369824 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 12748 kB
[b]Mapped: 6204 kB[/b]
[b]Shmem: 3400 kB[/b]
[b]Slab: 124492 kB[/b]
SReclaimable: 64708 kB
SUnreclaim: 59784 kB
[b]KernelStack: 1288 kB[/b]
[b]PageTables: 2076 kB[/b]
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1798204 kB
Committed_AS: 71728 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 351060 kB
VmallocChunk: 34359370500 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 11904 kB
DirectMap2M: 3649536 kB
[/code]

If I add up the stuff that I know isn't overlapping (the bolded items) I at least get within some reasonable epsilon of the used memory that free reports.

Add up the same numbers from the 32-bit VM in the earlier post...it's not even [i]remotely[/i] close. Something on the order of 700MB entirely unaccounted for. And I can't even start debugging the issue, because apparently, [i]nothing[/i] is using that memory. If I kill every process on the server save init, I still don't get that memory back unless I completely reboot.
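
The tally described above can be scripted. This is only a minimal sketch, assuming /proc/meminfo-style text as input; the function names are made up for illustration, and the field list simply follows the bolded items above rather than being a complete accounting of kernel memory.

```python
# Fields treated as non-overlapping, following the bolded items above.
# This selection is an assumption, not an exhaustive accounting.
ACCOUNTED = ("Buffers", "Cached", "Mapped", "Shmem", "Slab",
             "KernelStack", "PageTables")


def parse_meminfo(text):
    """Parse 'Name: value kB' lines into a dict of integers."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def unaccounted_kb(info):
    """Used memory (MemTotal - MemFree) minus the accounted fields."""
    used = info["MemTotal"] - info["MemFree"]
    return used - sum(info.get(field, 0) for field in ACCOUNTED)


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        gap = unaccounted_kb(parse_meminfo(fh.read()))
    print(gap, "kB unaccounted for")
```

Fed the 32-bit VM's /proc/meminfo figures from the earlier post, this comes to 701856 kB unaccounted for; the 64-bit box's figures give 54872 kB.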

Post by pschaff » 2012/01/12 12:15:21

[quote]
ChairmanKaga wrote:
...
I guess I'm making the somewhat parallel argument that I don't see what lacking swap has to do with the memory apparently being lost to a black hole.[/quote]
Memory usage may depend on the workload, specific things running (perhaps with memory leaks) and the kernel version. If you add memory to the VM and it still gets "eaten" or falls into a black hole, then that might be evidence of a memory leak, or a kernel issue.

[quote]
For comparison, here's another box (64-bit, 2.6.32-220.2.1.el6.x86_64, non-VM) with 4GB RAM (which has been up for days now without exhibiting the issue):
....
If I add up the stuff that I know isn't overlapping (the bolded items) I at least get within some reasonable epsilon of the used memory that free reports.

Add up the same numbers from the 32-bit VM in the earlier post...it's not even [i]remotely[/i] close. Something on the order of 700MB entirely unaccounted for. And I can't even start debugging the issue, because apparently, [i]nothing[/i] is using that memory. If I kill every process on the server save init, I still don't get that memory back unless I completely reboot.[/quote]
Has the problematic system been updated with all current errata packages, including the kernel?

sbergman
Posts: 20
Joined: 2011/09/30 03:45:48

Post by sbergman » 2012/01/12 18:07:47

I'll keep this short, because I'm more or less a newbie regarding KVM. But could this be related to the balloon driver? I've never done a config that supported it, so I'm not sure how it looks from the guest OS's standpoint, or if it is enabled by default.

TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Post by TrevorH » 2012/01/12 18:27:01

Another data point: here's a 32-bit CentOS 6 system that's been up about 22.5 days:

[code]
             total       used       free     shared    buffers     cached
Mem:       1022248     564924     457324          0      46676     408892
-/+ buffers/cache:     109356     912892
Swap:            0          0          0
[/code]

That doesn't seem to have the same problem to me?

Post by ChairmanKaga » 2012/01/12 20:34:22

[quote]I'll keep this short, because I'm more or less a newbie regarding KVM. But could this be related to the balloon driver? I've never done a config that supported it, so I'm not sure how it looks from the guest OS's standpoint, or if it is enabled by default.[/quote]

Actually, this is on VMware ESXi, but...I think you just nailed it.

Had our VMware admin look into this...guess how much memory the host showed the balloon driver using? Sure enough, about 700MB. (Now [i]why[/i] it decided to balloon is anyone's guess...the host is nowhere near capacity.)

Now I'm rather annoyed, because vmmemctl doesn't indicate this usage [i]anywhere[/i] inside the guest. If it did, I wouldn't have wasted everyone's time. From the guest's point of view, the memory is just [i]not there[/i]...used, but not allocated by anything visible. I hope KVM's balloon driver is smarter than that.
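
For anyone else chasing the same symptom, a quick first check is whether a balloon driver thread is present in the guest at all. The sketch below scans ps output for it. "vmmemctl" is the thread visible in the listings earlier in this thread; "vballoon" for virtio/KVM guests is my assumption, and the function name is made up for illustration. Note that this only shows the driver is loaded, not how much memory it currently holds; I believe `vmware-toolbox-cmd stat balloon` reports that figure when VMware Tools is installed, but treat that as an assumption too.

```python
import re
import subprocess

# Kernel thread names taken as hints of a balloon driver.
# "vmmemctl" appears in this thread; "vballoon" is an assumption.
BALLOON_THREADS = ("vmmemctl", "vballoon")


def balloon_threads(ps_output):
    """Return balloon-related kernel thread names found in ps output."""
    found = []
    # Kernel threads show up in ps as bracketed names, e.g. [kswapd0].
    for bracketed in re.findall(r"\[([^\]]+)\]", ps_output):
        name = bracketed.split("/")[0]  # drop a per-CPU suffix such as "/0"
        if name in BALLOON_THREADS and name not in found:
            found.append(name)
    return found


if __name__ == "__main__":
    result = subprocess.run(["ps", "ax"], capture_output=True,
                            text=True, check=True)
    print(balloon_threads(result.stdout) or "no balloon thread seen")
```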

[quote]
Another data point: here's a 32-bit CentOS 6 system that's been up about 22.5 days:

[code]
             total       used       free     shared    buffers     cached
Mem:       1022248     564924     457324          0      46676     408892
-/+ buffers/cache:     109356     912892
Swap:            0          0          0
[/code]

That doesn't seem to have the same problem to me?[/quote]

No, this looks normal.

Most of your used RAM is being used as disk cache by the kernel (see the buffers/cached numbers off to the right). This memory will be reclaimed as needed for applications.

The second line of free shows your [i]real[/i] used/free memory for processes.
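
To make the relationship between the two lines concrete, here is a small sketch of how the second line follows from the first. The function name is made up for illustration; the arithmetic is simply used minus buffers and cached, and free plus buffers and cached.

```python
def buffers_cache_line(used_kb, free_kb, buffers_kb, cached_kb):
    """Return (used, free) as shown on free's '-/+ buffers/cache' line."""
    reclaimable = buffers_kb + cached_kb
    return used_kb - reclaimable, free_kb + reclaimable


if __name__ == "__main__":
    # Figures from the "Mem:" line of the quoted 32-bit system.
    print(buffers_cache_line(564924, 457324, 46676, 408892))
```

For the quoted system this gives (109356, 912892), matching its second line. The same arithmetic on the first post's numbers gives (737840, 293048), which is why that box looked wrong: almost none of its used memory was cache.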
