Trouble tracking down a memory problem

Support for webhosts that use CentOS
natediggs
Posts: 27
Joined: 2008/01/26 18:03:32

Trouble tracking down a memory problem

Post by natediggs » 2010/07/20 14:52:05

Hi Everyone,

I have a CentOS 5.5 server hosting a Django web application on Apache (worker MPM), PostgreSQL, and Python. We're having a pretty big memory problem with this system. It is a dual-core Intel machine with 4 GB of RAM, and it gets about 300 hits per day. Here is the current memory usage:

[code]
             total       used       free     shared    buffers     cached
Mem:          3875       3796         78          0        173       3108
-/+ buffers/cache:        515       3360
Swap:         5887          0       5887
[/code]
Restarting the server frees up a lot of memory, but as the system runs, memory usage creeps up until the server has to be restarted again.

Based on the output of ps aux, it doesn't appear that any process, especially the web server processes, is using a large amount of memory.

[code]
root 1 0.0 0.0 10348 696 ? Ss Jul06 0:01 init [5]
root 2 0.0 0.0 0 0 ? S< Jul06 0:00 [migration/0]
root 3 0.0 0.0 0 0 ? SN Jul06 0:00 [ksoftirqd/0]
root 4 0.0 0.0 0 0 ? S< Jul06 0:00 [watchdog/0]
root 5 0.0 0.0 0 0 ? S< Jul06 0:00 [migration/1]
root 6 0.0 0.0 0 0 ? SN Jul06 0:00 [ksoftirqd/1]
root 7 0.0 0.0 0 0 ? S< Jul06 0:00 [watchdog/1]
root 8 0.0 0.0 0 0 ? S< Jul06 0:00 [events/0]
root 9 0.0 0.0 0 0 ? S< Jul06 0:00 [events/1]
root 10 0.0 0.0 0 0 ? S< Jul06 0:00 [khelper]
root 51 0.0 0.0 0 0 ? S< Jul06 0:00 [kthread]
root 56 0.0 0.0 0 0 ? S< Jul06 0:00 [kblockd/0]
root 57 0.0 0.0 0 0 ? S< Jul06 0:00 [kblockd/1]
root 58 0.0 0.0 0 0 ? S< Jul06 0:00 [kacpid]
root 163 0.0 0.0 0 0 ? S< Jul06 0:00 [cqueue/0]
root 164 0.0 0.0 0 0 ? S< Jul06 0:00 [cqueue/1]
root 167 0.0 0.0 0 0 ? S< Jul06 0:00 [khubd]
root 169 0.0 0.0 0 0 ? S< Jul06 0:00 [kseriod]
root 243 0.0 0.0 0 0 ? S Jul06 0:00 [khungtaskd]
root 245 0.0 0.0 0 0 ? S Jul06 0:00 [pdflush]
root 246 0.0 0.0 0 0 ? S< Jul06 0:03 [kswapd0]
root 247 0.0 0.0 0 0 ? S< Jul06 0:00 [aio/0]
root 248 0.0 0.0 0 0 ? S< Jul06 0:00 [aio/1]
root 390 0.0 0.0 0 0 ? S< Jul06 0:00 [kpsmoused]
root 431 0.0 0.0 0 0 ? S< Jul06 0:00 [mpt_poll_0]
root 432 0.0 0.0 0 0 ? S< Jul06 0:00 [mpt/0]
root 433 0.0 0.0 0 0 ? S< Jul06 0:00 [scsi_eh_0]
root 440 0.0 0.0 0 0 ? S< Jul06 0:00 [ata/0]
root 441 0.0 0.0 0 0 ? S< Jul06 0:00 [ata/1]
root 442 0.0 0.0 0 0 ? S< Jul06 0:00 [ata_aux]
root 446 0.0 0.0 0 0 ? S< Jul06 0:00 [scsi_eh_1]
root 447 0.0 0.0 0 0 ? S< Jul06 0:00 [scsi_eh_2]
root 448 0.0 0.0 0 0 ? S< Jul06 0:00 [scsi_eh_3]
root 449 0.0 0.0 0 0 ? S< Jul06 0:00 [scsi_eh_4]
root 450 0.0 0.0 0 0 ? S< Jul06 0:00 [scsi_eh_5]
root 451 0.0 0.0 0 0 ? S< Jul06 0:00 [scsi_eh_6]
root 455 0.0 0.0 0 0 ? S< Jul06 0:00 [kstriped]
root 468 0.0 0.0 0 0 ? S< Jul06 0:00 [ksnapd]
root 483 0.0 0.0 0 0 ? S< Jul06 0:18 [kjournald]
root 509 0.0 0.0 0 0 ? S< Jul06 0:00 [kauditd]
root 542 0.0 0.0 13032 1208 ? S<s Jul06 0:00 /sbin/udevd -d
root 1482 0.0 0.0 0 0 ? S< Jul06 0:00 [kmpathd/0]
root 1483 0.0 0.0 0 0 ? S< Jul06 0:00 [kmpathd/1]
root 1484 0.0 0.0 0 0 ? S< Jul06 0:00 [kmpath_handlerd]
root 1511 0.0 0.0 0 0 ? S< Jul06 0:00 [kjournald]
root 2137 0.0 0.0 92860 860 ? S<sl Jul06 0:02 auditd
root 2139 0.0 0.0 81800 892 ? S<sl Jul06 0:00 /sbin/audispd
root 2161 0.0 0.0 5908 604 ? Ss Jul06 0:04 syslogd -m 0
root 2164 0.0 0.0 3804 448 ? Ss Jul06 0:00 klogd -x
root 2204 0.0 0.0 0 0 ? S< Jul06 0:00 [kondemand/0]
root 2205 0.0 0.0 0 0 ? S< Jul06 0:00 [kondemand/1]
root 2217 0.0 0.0 10760 380 ? Ss Jul06 0:03 irqbalance
rpc 2316 0.0 0.0 8052 580 ? Ss Jul06 0:00 portmap
root 2342 0.0 0.0 0 0 ? S< Jul06 0:00 [rpciod/0]
root 2343 0.0 0.0 0 0 ? S< Jul06 0:00 [rpciod/1]
root 2351 0.0 0.0 14360 888 ? Ss Jul06 0:00 rpc.statd
root 2376 0.0 0.0 55180 768 ? Ss Jul06 0:00 rpc.idmapd
dbus 2391 0.0 0.0 21356 1040 ? Ss Jul06 0:01 dbus-daemon --system
root 2410 0.0 0.0 10432 788 ? Ss Jul06 0:00 /usr/sbin/hcid
root 2416 0.0 0.0 5936 548 ? Ss Jul06 0:00 /usr/sbin/sdpd
root 2441 0.0 0.0 0 0 ? S< Jul06 0:00 [krfcommd]
root 2477 0.0 0.0 31416 1376 ? Ssl Jul06 0:00 pcscd
root 2497 0.0 0.0 3800 580 ? Ss Jul06 0:00 /usr/sbin/acpid
68 2522 0.0 0.0 30848 3908 ? Ss Jul06 0:01 hald
root 2523 0.0 0.0 21692 1052 ? S Jul06 0:00 hald-runner
68 2530 0.0 0.0 12324 844 ? S Jul06 0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
root 2556 0.0 0.0 8516 488 ? Ss Jul06 0:00 /usr/bin/hidd --server
root 2581 0.0 0.0 121988 1536 ? Ssl Jul06 0:00 automount
root 2598 0.0 0.0 26324 532 ? Ss Jul06 0:00 ./hpiod
root 2603 0.0 0.1 155132 6720 ? S Jul06 0:00 python ./hpssd.py
root 2616 0.0 0.0 62624 1220 ? Ss Jul06 0:02 /usr/sbin/sshd
root 2625 0.0 0.0 135484 2676 ? Ss Jul06 0:00 cupsd
root 2639 0.0 0.0 21644 892 ? Ss Jul06 0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
ntp 2651 0.0 0.1 23388 5028 ? SLs Jul06 0:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
root 2714 0.0 0.0 73172 2372 ? Ss Jul06 0:01 sendmail: accepting connections
smmsp 2722 0.0 0.0 59756 1788 ? Ss Jul06 0:00 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
root 2732 0.0 0.0 6452 376 ? Ss Jul06 0:00 gpm -m /dev/input/mice -t exps2
root 2741 0.0 0.0 74812 1156 ? Ss Jul06 0:00 crond
xfs 2770 0.0 0.0 20964 1764 ? Ss Jul06 0:00 xfs -droppriv -daemon
root 2890 0.0 0.0 19624 464 ? Ss Jul06 0:00 /usr/sbin/atd
avahi 2917 0.0 0.0 23272 1312 ? Ss Jul06 0:00 avahi-daemon: running [qgserver2.local]
avahi 2918 0.0 0.0 23148 340 ? Ss Jul06 0:00 avahi-daemon: chroot helper
root 2949 0.0 0.1 150348 5316 ? Ssl Jul06 0:00 /usr/sbin/cimserver
root 3023 0.0 0.0 119520 2008 ? S<sl Jul06 5:50 modclusterd
root 3036 0.0 0.1 23532 5128 ? S<Ls Jul06 0:00 clurgmgrd
root 3037 0.0 0.0 23532 460 ? S< Jul06 0:00 clurgmgrd
root 3047 0.0 0.0 18416 476 ? S Jul06 0:00 /usr/sbin/smartd -q never
root 3052 0.0 0.0 3792 484 tty1 Ss+ Jul06 0:00 /sbin/mingetty tty1
root 3053 0.0 0.0 3792 484 tty2 Ss+ Jul06 0:00 /sbin/mingetty tty2
root 3054 0.0 0.0 3792 484 tty3 Ss+ Jul06 0:00 /sbin/mingetty tty3
root 3055 0.0 0.0 3792 480 tty4 Ss+ Jul06 0:00 /sbin/mingetty tty4
root 3060 0.0 0.0 3792 480 tty5 Ss+ Jul06 0:00 /sbin/mingetty tty5
root 3066 0.0 0.0 3792 480 tty6 Ss+ Jul06 0:00 /sbin/mingetty tty6
root 3068 0.0 0.0 167608 2580 ? Ss Jul06 0:00 /usr/sbin/gdm-binary -nodaemon
root 3155 0.0 0.0 194720 2348 ? S Jul06 0:00 /usr/sbin/gdm-binary -nodaemon
root 3157 0.0 0.1 189844 4100 ? Sl Jul06 0:00 /usr/libexec/gdm-rh-security-token-helper
root 3158 0.0 0.1 92700 6392 tty7 Ss+ Jul06 0:10 /usr/bin/Xorg :0 -br -audit 0 -auth /var/gdm/:0.Xauth -nolisten tcp vt7
root 3173 0.0 0.0 13040 1168 ? SN Jul06 0:00 /usr/libexec/gam_server
gdm 3185 0.0 0.4 221788 16740 ? Ss Jul06 0:00 /usr/libexec/gdmgreeter
root 14264 0.0 0.0 0 0 ? S< Jul06 0:00 [iscsi_eh]
root 14289 0.0 0.0 0 0 ? S< Jul06 0:00 [scsi_eh_46]
root 14290 0.0 0.0 0 0 ? S< Jul06 0:00 [scsi_wq_46]
root 14304 0.0 0.0 0 0 ? S< Jul06 0:00 [scsi_eh_47]
root 14306 0.0 0.0 0 0 ? S< Jul06 0:00 [scsi_wq_47]
root 14323 0.0 0.0 0 0 ? S< Jul06 0:00 [ib_addr]
root 14333 0.0 0.0 0 0 ? S< Jul06 0:00 [ib_mcast]
root 14334 0.0 0.0 0 0 ? S< Jul06 0:00 [ib_inform]
root 14335 0.0 0.0 0 0 ? S< Jul06 0:00 [local_sa]
root 14339 0.0 0.0 0 0 ? S< Jul06 0:00 [iw_cm_wq]
root 14343 0.0 0.0 0 0 ? S< Jul06 0:00 [ib_cm/0]
root 14344 0.0 0.0 0 0 ? S< Jul06 0:00 [ib_cm/1]
root 14348 0.0 0.0 0 0 ? S< Jul06 0:00 [rdma_cm]
root 14366 0.0 0.0 59316 764 ? Ssl Jul06 0:00 brcm_iscsiuio
root 14372 0.0 0.0 3984 496 ? Ss Jul06 0:00 iscsid
root 14373 0.0 0.1 8396 4324 ? S<Ls Jul06 0:00 iscsid
root 14377 0.0 0.0 0 0 ? S< Jul06 0:00 [scsi_eh_48]
root 14378 0.0 0.0 0 0 ? S< Jul06 0:00 [iscsi_q_48]
root 14379 0.0 0.0 0 0 ? S< Jul06 0:00 [scsi_wq_48]
root 14381 0.0 0.0 0 0 ? S< Jul06 0:00 [scsi_eh_49]
root 14382 0.0 0.0 0 0 ? S< Jul06 0:00 [iscsi_q_49]
root 14383 0.0 0.0 0 0 ? S< Jul06 0:00 [scsi_wq_49]
root 15316 0.0 0.0 0 0 ? S Jul07 0:01 [pdflush]
root 18496 0.0 0.3 258944 15756 ? SN 10:18 0:00 /usr/bin/python -tt /usr/sbin/yum-updatesd
root 18705 0.0 0.0 90124 3252 ? Ss 10:27 0:00 sshd: nate [priv]
nate 18710 0.0 0.0 90124 1756 ? S 10:27 0:00 sshd: nate@pts/1
nate 18711 0.0 0.0 66064 1524 pts/1 Ss 10:27 0:00 -bash
root 18737 0.0 0.0 101056 1392 pts/1 S 10:27 0:00 su -
root 18740 0.0 0.0 66064 1592 pts/1 S 10:27 0:00 -bash
root 19532 0.0 0.0 65592 960 pts/1 R+ 10:49 0:00 ps aux
postgres 27432 0.0 0.0 120688 3332 ? S Jul15 0:09 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql
postgres 27434 0.0 0.0 109864 760 ? S Jul15 0:00 postgres: logger process
postgres 27436 0.0 0.2 120800 9424 ? S Jul15 0:00 postgres: writer process
postgres 27437 0.0 0.0 110868 1704 ? S Jul15 0:01 postgres: stats buffer process
postgres 27438 0.0 0.0 110056 880 ? S Jul15 0:00 postgres: stats collector process
root 29414 0.0 0.1 220548 7772 ? Ss Jul19 0:00 /usr/sbin/httpd.worker
apache 29416 0.5 3.2 731040 127320 ? Sl Jul19 5:47 /usr/sbin/httpd.worker
apache 29417 0.4 2.8 734092 111852 ? Sl Jul19 5:16 /usr/sbin/httpd.worker
[/code]
Any help getting to the root of this issue would be greatly appreciated.

[Moderator edited to insert [i]code[/i] tags to preserve the formatting and aid readability.]

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Re: Trouble tracking down a memory problem

Post by pschaff » 2010/07/20 15:23:22

Have you tried running "top" and monitoring "%MEM" to see what is eating memory?

natediggs
Posts: 27
Joined: 2008/01/26 18:03:32

Re: Trouble tracking down a memory problem

Post by natediggs » 2010/07/20 20:39:10

Hi Phil,

Thanks for the reply, though I feel it was a bit snarky.

Yes, top and ps are the first places I would check to see what memory this system is using. top is not giving me any more useful information than ps is.

I posted here because I'm having a hard time tracking down what is using all of the memory, and I was hoping someone could suggest places to look beyond ps or top. I am familiar with CentOS and how to administer it. I manage 10 CentOS servers running a variety of applications. This particular server is running slowly and reporting a very high percentage of memory usage, even though nothing appears to be using memory at the rate the server claims.

Nate

natediggs
Posts: 27
Joined: 2008/01/26 18:03:32

Re: Trouble tracking down a memory problem

Post by natediggs » 2010/07/20 22:11:21

To post a little more info: if I run top and sort by memory, I get this:
[code]
6610 apache 25 0 700m 84m 5784 S 12.0 2.2 0:10.71 httpd.worker
6611 apache 25 0 621m 69m 5840 S 0.0 1.8 0:06.37 httpd.worker
3185 gdm 15 0 216m 16m 7164 S 0.0 0.4 0:00.35 gdmgreeter
18496 root 34 19 253m 15m 2188 S 0.0 0.4 0:00.14 yum-updatesd
27436 postgres 15 0 117m 9428 8908 S 0.0 0.2 0:00.40 postmaster
6608 root 18 0 214m 7776 3632 S 0.0 0.2 0:00.00 httpd.worker
2603 root 18 0 151m 6720 960 S 0.0 0.2 0:00.06 python
3158 root 15 0 106m 6392 4344 S 0.0 0.2 0:10.97 Xorg
31501 root 15 0 146m 5316 3964 S 0.0 0.1 0:00.00 cimserver
3036 root 24 -1 23532 5128 4432 S 0.0 0.1 0:00.00 clurgmgrd
2651 ntp 15 0 23388 5028 3904 S 0.0 0.1 0:00.22 ntpd
3157 root 18 0 185m 4100 3364 S 0.0 0.1 0:00.28 gdm-rh-security
2522 haldaemo 15 0 30848 3908 1564 S 0.0 0.1 0:01.86 hald
27432 postgres 15 0 117m 3332 2848 S 0.0 0.1 0:11.51 postmaster
31126 root 16 0 90124 3284 2544 S 0.0 0.1 0:00.02 sshd
5245 root 16 0 90124 3272 2552 S 0.0 0.1 0:00.01 sshd
2625 root 18 0 132m 2676 1764 S 0.0 0.1 0:00.00 cupsd
3068 root 15 0 163m 2580 1964 S 0.0 0.1 0:00.01 gdm-binary
2714 root 15 0 73172 2372 808 S 0.0 0.1 0:01.07 sendmail
3155 root 17 0 190m 2348 1636 S 0.0 0.1 0:00.00 gdm-binary
31129 nate 15 0 90124 1928 1172 S 0.0 0.0 0:01.20 sshd
5247 ben 15 0 90124 1812 1076 S 0.0 0.0 0:00.06 sshd
2722 smmsp 18 0 59756 1788 636 S 0.0 0.0 0:00.00 sendmail
2770 xfs 15 0 20964 1764 760 S 0.0 0.0 0:00.00 xfs
27437 postgres 15 0 108m 1704 216 S 0.0 0.0 0:01.82 postmaster
31130 nate 15 0 66196 1680 1192 S 0.0 0.0 0:00.05 bash
31327 root 15 0 66196 1632 1188 S 0.0 0.0 0:00.10 bash
5248 ben 15 0 66064 1556 1164 S 0.0 0.0 0:00.01 bash
2581 root 21 0 119m 1536 1124 S 0.0 0.0 0:00.06 automount
31325 root 16 0 98.7m 1380 1080 S 0.0 0.0 0:00.00 su
2477 root 18 0 31416 1376 564 S 0.0 0.0 0:00.97 pcscd
2917 avahi 15 0 23272 1312 1060 S 0.0 0.0 0:00.15 avahi-daemon
2391 dbus 15 0 21504 1248 804 S 0.0 0.0 0:01.93 dbus-daemon
2616 root 15 0 62624 1220 656 S 0.0 0.0 0:02.49 sshd
542 root 21 -4 13032 1208 388 S 0.0 0.0 0:00.84 udevd
3173 root 34 19 13040 1172 944 S 0.0 0.0 0:00.18 gam_server
2741 root 15 0 74812 1156 580 S 0.0 0.0 0:00.04 crond
6867 root 15 0 12740 1092 804 R 0.0 0.0 0:00.01 top
2523 root 15 0 21692 1052 868 S 0.0 0.0 0:00.00 hald-runner
2139 root 7 -8 81800 892 616 S 0.0 0.0 0:00.54 audispd
2639 root 18 0 21644 892 680 S 0.0 0.0 0:00.00 xinetd
[/code]

Which makes it pretty clear that Apache and Postgres aren't the culprits.

I shut down a bunch of unnecessary processes (FreeNX, and iscsid, which was connected to an iSCSI target no longer in use) and free memory rose to about 250 MB. Then, over the last couple of hours, it has steadily dropped back to around 177 MB free.
[Moderator edit: Added [i]code[/i] tags to preserve formatting.]

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Re: Trouble tracking down a memory problem

Post by pschaff » 2010/07/20 23:14:34

Are you sure there is a real problem and not just the perception of a problem? Unused memory is wasted memory and the kernel memory manager will generally make the best use of available memory. Unless processes are being killed due to lack of memory, or swap is being heavily used, you are probably OK.
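
As a minimal sketch of one way to check the swap question: the Linux kernel exposes cumulative swap-in and swap-out page counts as pswpin and pswpout in /proc/vmstat. Taking two snapshots a few seconds apart and subtracting them shows whether the machine is actively swapping. The function names and the sample snapshot text below are made up for illustration, and the code assumes a Python 3 interpreter is available.

```python
def swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout page counts from /proc/vmstat text."""
    counters = {"pswpin": 0, "pswpout": 0}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in counters:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before_text, after_text):
    """Pages swapped in and out between two snapshots of /proc/vmstat."""
    before = swap_counters(before_text)
    after = swap_counters(after_text)
    return {name: after[name] - before[name] for name in before}


# Illustrative snapshots (made-up numbers), as if taken a few seconds apart.
snapshot_1 = "pgpgin 120034\npgpgout 993211\npswpin 0\npswpout 0\n"
snapshot_2 = "pgpgin 120090\npgpgout 993540\npswpin 0\npswpout 0\n"

activity = swap_activity(snapshot_1, snapshot_2)
print(activity)
```

On a live system the two snapshots would come from reading /proc/vmstat twice with a short pause in between. Deltas that stay at zero suggest memory pressure is not pushing pages to swap; steadily growing deltas suggest it is.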

natediggs
Posts: 27
Joined: 2008/01/26 18:03:32

Re: Trouble tracking down a memory problem

Post by natediggs » 2010/07/21 01:10:31

When server memory gets very low, the application hosted there becomes very slow and unstable, and at times it has become completely unusable.

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Re: Trouble tracking down a memory problem

Post by pschaff » 2010/07/21 14:09:39

Definitely a problem - so, is swap being used?

gerald_clark
Posts: 10642
Joined: 2005/08/05 15:19:54
Location: Northern Illinois, USA

Trouble tracking down a memory problem

Post by gerald_clark » 2010/07/21 14:18:31

The memory usage shown in the first post is only 515M.
The rest is cache.

The usage when slow would be more informative.
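
The arithmetic behind the 515M figure can be sketched as follows, using the "Mem:" line from the first post. The function name and dictionary keys are hypothetical; the column order assumed here is the one shown in that output (total, used, free, shared, buffers, cached).

```python
def summarize_free(mem_line):
    """Split a 'Mem:' line from `free -m` into application usage and cache."""
    fields = mem_line.split()
    # Columns after the 'Mem:' label: total used free shared buffers cached
    total, used, free, shared, buffers, cached = (int(x) for x in fields[1:7])
    return {
        # Memory held by processes, excluding buffers and page cache.
        "app_used": used - buffers - cached,
        # Buffers and page cache, which the kernel can reclaim on demand.
        "reclaimable": buffers + cached,
        # Memory that could be handed to applications if they asked for it.
        "available": free + buffers + cached,
    }


summary = summarize_free("Mem: 3875 3796 78 0 173 3108")
print(summary)
```

With those numbers, app_used comes out to 3796 - 173 - 3108 = 515 MB, matching the "-/+ buffers/cache" line. The available figure comes out to 3359 MB rather than the 3360 that free prints, presumably because free rounds each column from kilobytes separately.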
