CentOS 5/6 - random temporary freezes

zinok · Post by **zinok** » 2014/04/15 22:50:15

I am working on high load daemon development, which listens on UDP and processes packets. Last few months I noticed some strange issue when it takes 500-700 ms to answer packet, while usually it takes 20 ms. I've run strace on all daemon processes and found this thing:

13:35:36.979887 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
13:35:36.979944 write(3, "[2014-04-15 13:35:36,979] WARNING WORKER 26 - [pkt#105132/AUTH] loadPresets - memory used: 0 kb\n", 99) = 99
13:35:37.599793 sendto(10, "Q\0\0\0\rSELECT 1\0", 14, MSG_NOSIGNAL, NULL, 0) = 14
13:35:37.599865 poll([{fd=10, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=10, revents=POLLIN}])

You can see that between write and sendto passed around 600 ms. At this time server was not overloaded (LA = 0.4, 16 cores). There were free memory. There were no load on disks.
So I took straces of other daemon processes plus database processes. And then did:

grep '13:35:37.4' *
grep '13:35:37.3' *
grep '13:35:37.2' *

None of commans show any lines. So I guess the system was just doing nothing during this 600 ms.

Is there any way to diagnose this issue? What it might be?

Kernel 2.6.18-194.11.1.el5

drk · Post by **drk** » 2014/04/16 00:59:11

zinok wrote:Kernel 2.6.18-194.11.1.el5

Start by updating your system

zinok · Post by **zinok** » 2014/04/16 06:27:44

Other server with the same problem has 2.6.32-431.el6.x86_64

drk · Post by **drk** » 2014/04/16 06:44:09

cpuspeed activated? NTP adjusting the time and there isn't an actual freeze?

avij · Post by **avij** » 2014/04/16 08:28:51

I think it is your 'write' that is taking longer than expected. Where is it writing to? To a local log file or to a network socket? Running strace with the -T option would show how long it takes to execute each system call.

zinok · Post by **zinok** » 2014/04/16 14:01:31

Write is writing to just a local file. Unfortunately the problem is not only on writes. Here is two more examples today:

13:08:12.663623 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
13:08:12.926411 write(3, "[2014-04-15 13:08:12,663] WARNING WORKER 80 - [pkt#3395853/ACCT-START] Time between ...

300 ms on just stat /etc/localtime

and
13:08:13.347081 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
13:08:13.530962 poll([{fd=18, events=POLLOUT}], 1, 3000) = 1 ([{fd=18, revents=POLLOUT}])

200 ms more...

It is not NTP time adjustment as seen on two separate servers and measured by tcpdump at both of them.

avij · Post by **avij** » 2014/04/16 14:12:44

Or alternatively, your program does something else between those two system calls.. Use the -T option for strace to see how much each system call takes. Is your application multithreaded? If yes, how do you differentiate the system calls running in separate threads in your strace output?

zinok · Post by **zinok** » 2014/04/16 14:19:16

Will add -T. As for multithreaded - no, it is multiprocess daemon. Separate straces recorded for each process. There are two cases - most of the time other processes continue to work during this "freeze" at one of processes. Sometimes (seen 2-3 times) - none of processes work during freeze.

drk · Post by **drk** » 2014/04/16 18:35:15

zinok wrote:It is not NTP time adjustment as seen on two separate servers and measured by tcpdump at both of them.

Both servers are the same make/model hardware?

zinok · Post by **zinok** » 2014/04/16 20:32:40

drk wrote:
zinok wrote:It is not NTP time adjustment as seen on two separate servers and measured by tcpdump at both of them.
Both servers are the same make/model hardware?

Nope, totally different.

CentOS

CentOS 5/6 - random temporary freezes

CentOS 5/6 - random temporary freezes

Re: CentOS 5/6 - random temporary freezes

Re: CentOS 5/6 - random temporary freezes

Re: CentOS 5/6 - random temporary freezes

Re: CentOS 5/6 - random temporary freezes

Re: CentOS 5/6 - random temporary freezes

Re: CentOS 5/6 - random temporary freezes

Re: CentOS 5/6 - random temporary freezes

Re: CentOS 5/6 - random temporary freezes

Re: CentOS 5/6 - random temporary freezes