system clock looping loosely over 4-5 seconds

If it doesn't fit in another category, ask it here.
jptxs
Posts: 6
Joined: 2006/11/04 15:39:14

system clock looping loosely over 4-5 seconds

Post by jptxs » 2006/11/04 16:11:03

I have a really odd problem. My clock on a box named bach loops over and over, advancing only 1-2 seconds for every 10-20. My prompt, which displays the time, shows the issue nicely:

08:02:18 jptxs@bach:~$
08:02:18 jptxs@bach:~$
08:02:19 jptxs@bach:~$
08:02:20 jptxs@bach:~$
08:02:20 jptxs@bach:~$
08:02:21 jptxs@bach:~$
08:02:21 jptxs@bach:~$
08:02:21 jptxs@bach:~$
08:02:22 jptxs@bach:~$
08:02:18 jptxs@bach:~$
08:02:18 jptxs@bach:~$
08:02:18 jptxs@bach:~$
08:02:19 jptxs@bach:~$
08:02:19 jptxs@bach:~$
08:02:20 jptxs@bach:~$
08:02:20 jptxs@bach:~$
08:02:20 jptxs@bach:~$
08:02:21 jptxs@bach:~$
08:02:21 jptxs@bach:~$
08:02:22 jptxs@bach:~$
08:02:22 jptxs@bach:~$
08:02:18 jptxs@bach:~$

Now, eventually, it will get to a point where it will move on a second. It would advance to 08:02:23 or maybe 24. But then it will skip back to 19 or 20 and start to loop for a bit again. I wrote a script to track it:

#!/bin/bash

while true
do
echo "bach clock:" >> clockChecks.log
date >> clockChecks.log
echo >> clockChecks.log
echo "wagner clock:" >> clockChecks.log
ssh wagner date >> clockChecks.log
echo >> clockChecks.log
echo ++++++---------------------++++++++ >> clockChecks.log
echo >> clockChecks.log
sleep 10
done

The output compares it to a working box named wagner. I have attached a file [clockChecks.log] that shows the output. Now, what you can see is that they are in sync for quite a while. Then the clock on bach goes out of sync right around when wagner is shutdown for the night (I forgot I was running this test):

bach clock:
Fri Nov 3 23:12:05 EST 2006
wagner clock:
Fri Nov 3 23:12:05 EST 2006
++++++---------------------++++++++

bach clock:
Fri Nov 3 23:12:16 EST 2006
wagner clock:
++++++---------------------++++++++

bach clock:
Fri Nov 3 23:12:27 EST 2006
wagner clock:
++++++---------------------++++++++

bach clock:
Fri Nov 3 23:13:25 EST 2006
wagner clock:
++++++---------------------++++++++

Of course, you could say it may have been due to the wait time for the ssh timeout, but then you see the difference big time once wagner is back up later:

bach clock:
Sat Nov 4 02:27:27 EST 2006
wagner clock:
Sat Nov 4 07:36:26 EST 2006
++++++---------------------++++++++

bach clock:
Sat Nov 4 02:27:42 EST 2006
wagner clock:
Sat Nov 4 07:39:27 EST 2006
+++++---------------------++++++++

bach clock:
Sat Nov 4 02:27:53 EST 2006
wagner clock:
Sat Nov 4 07:42:41 EST 2006

I ran ntpdate and the clock was reset, but immediately slowed down and looped again.

This gets stranger. This all started when I upgraded from a an old Debian install running kernel 2.20 with custom config to Ubuntu 6.06. For various reasons I needed a 2.6 kernel on the box. The box crawled. At that point, I didn't notice the clock issues. I installed over and over. I tried to go back to Debian Sarge. I tried a RedHat AS 3 set of disks I had and that wouldn't boot correctly (seemed like the init scripts were timing out). I finally settled on CentOS 4.4 (Server Disk for the install). That has gotten me to the point where I am now. It seemed to be running fine for 7-8 hours. I reinstalled the services I had had on the box (pdnsd: http://www.phys.uu.nl/~rombouts/pdnsd.html, ejabberd: http://ejabberd.jabber.ru/ and VMWare Server: http://www.vmware.com/products/server/ running an OpenBSD guest OS). Everything seemed fine, but then the clock started going funny again after a few hours more operations. So I searched and searched. I found a ton of stuff about XBox clock skips and loops, but nothing I could find seemed relevant to me. I upgraded my BIOS (http://www-307.ibm.com/pc/support/site.wss/document.do?sitestyle=lenovo&lndocid=MIGR-42952). After that reboot things seemed to be fine again for a bit. That's when I wrote the script to track it. I thought I'd licked it. Bu then the above unfolded and now I'm here begging for help.

Could this be hardware? Seem too inconsistent for that, in my mind.

How, if at all, could this be somehow be a relationship to the wagner box? It seemed to go out of whack when that went offline. But it was a 100% fresh install in a few cases and still had this issue. I didn't see any errors in /var/log/messages or other logs at that time, either.

Could this be some odd thing I'm just not seeing?

Any suggestions for search terms? I've tried "clock skip" "clock loop" and all sorts of system indicators (kernel: Linux bach 2.6.9-42.EL #1 Sat Aug 12 09:17:58 CDT 2006 i686 i686 i386 GNU/Linux, model: info in dmidecode output attached in zip file).

Anything I'm just plain missing?

Any help appreciated...


jptxs

Lenard
Posts: 2283
Joined: 2005/11/29 02:35:25
Location: Indiana

Re: system clock looping loosely over 4-5 seconds

Post by Lenard » 2006/11/04 18:23:09

Maybe the CMOS battery is weak/bad/old, try replacing it if possible.

jptxs
Posts: 6
Joined: 2006/11/04 15:39:14

Re: system clock looping loosely over 4-5 seconds

Post by jptxs » 2006/11/05 02:23:08

just replaced it. Again, things seemed fine for a bit after reboot and then:

bach clock:
Sat Nov 4 21:06:37 EST 2006
wagner clock:
Sat Nov 4 21:06:37 EST 2006
++++++---------------------++++++++

bach clock:
Sat Nov 4 21:06:51 EST 2006
wagner clock:
Sat Nov 4 21:07:52 EST 2006
++++++---------------------++++++++

bach clock:
Sat Nov 4 21:07:02 EST 2006
wagner clock:
Sat Nov 4 21:11:06 EST 2006
++++++---------------------++++++++

bach clock:
Sat Nov 4 21:07:13 EST 2006
wagner clock:
Sat Nov 4 21:14:20 EST 2006
++++++---------------------++++++++

bach clock:
Sat Nov 4 21:07:24 EST 2006
wagner clock:
Sat Nov 4 21:17:34 EST 2006


Are there logs outside /var/log I should be checking?

Anything running look odd:

[root@bach log]# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 19:08 ? 00:00:00 init [3]
root 2 1 0 19:08 ? 00:00:00 [ksoftirqd/0]
root 3 1 0 19:08 ? 00:00:00 [events/0]
root 4 3 0 19:08 ? 00:00:00 [khelper]
root 5 3 0 19:08 ? 00:00:00 [kacpid]
root 21 3 0 19:08 ? 00:00:00 [kblockd/0]
root 39 3 0 19:08 ? 00:00:00 [pdflush]
root 40 3 0 19:08 ? 00:00:00 [pdflush]
root 42 3 0 19:08 ? 00:00:00 [aio/0]
root 22 1 0 19:08 ? 00:00:00 [khubd]
root 41 1 0 19:08 ? 00:00:00 [kswapd0]
root 188 1 0 19:08 ? 00:00:00 [kseriod]
root 297 1 0 19:08 ? 00:00:00 [kjournald]
root 1320 1 0 19:09 ? 00:00:00 udevd
root 1712 3 0 19:09 ? 00:00:00 [kauditd]
root 1752 3 0 19:09 ? 00:00:00 [kmirrord]
root 1987 1 0 19:10 ? 00:00:00 [kjournald]
root 1988 1 0 19:10 ? 00:00:00 [kjournald]
root 2534 1 0 19:10 ? 00:00:00 syslogd -m 0
root 2538 1 0 19:10 ? 00:00:00 klogd -x
rpc 2556 1 0 19:10 ? 00:00:00 portmap
root 2575 1 0 19:10 ? 00:00:00 rpc.statd
root 2601 1 0 19:10 ? 00:00:00 rpc.idmapd
root 2662 1 0 19:10 ? 00:00:00 /usr/sbin/smartd
root 2672 1 0 19:10 ? 00:00:00 /usr/sbin/acpid
root 2710 1 0 19:10 ? 00:00:00 /usr/sbin/sshd
root 2723 1 0 19:10 ? 00:00:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
root 2732 1 0 19:10 ? 00:00:00 crond
root 2749 1 0 19:10 ? 00:00:00 /usr/sbin/atd
dbus 2758 1 0 19:10 ? 00:00:00 dbus-daemon-1 --system
root 2767 1 0 19:10 ? 00:00:00 hald
root 2900 1 0 19:10 ? 00:00:00 /usr/bin/vmnet-bridge -d /var/run/vmnet-bridge-0.pid /dev/vmnet0 eth0
root 2914 1 0 19:10 ? 00:00:00 /usr/bin/vmnet-natd -d /var/run/vmnet-natd-8.pid -m /var/run/vmnet-natd-8.mac -c /etc/vmware/vmnet8/nat/nat.conf
root 2920 1 0 19:10 ? 00:00:09 /usr/sbin/vmware-serverd -s -d
root 2924 1 0 19:10 tty1 00:00:00 /sbin/mingetty tty1
root 2925 1 0 19:10 tty2 00:00:00 /sbin/mingetty tty2
root 2926 1 0 19:10 tty3 00:00:00 /sbin/mingetty tty3
root 2927 1 0 19:10 tty4 00:00:00 /sbin/mingetty tty4
root 2928 1 0 19:10 tty5 00:00:00 /sbin/mingetty tty5
root 2929 1 0 19:10 tty6 00:00:00 /sbin/mingetty tty6
root 3466 1 0 19:10 ? 00:01:01 /usr/lib/vmware/bin/vmware-vmx -C /home/vmware/VMs/OpenBSD38/OpenBSD38.vmx -@ ""
root 3495 2710 0 19:10 ? 00:00:00 sshd: jptxs [priv]
root 3501 1 0 19:10 ? 00:00:00 /usr/bin/vmnet-netifup -d /var/run/vmnet-netifup-vmnet8.pid /dev/vmnet8 vmnet8
root 3512 1 0 19:10 ? 00:00:00 /usr/bin/vmnet-netifup -d /var/run/vmnet-netifup-vmnet1.pid /dev/vmnet1 vmnet1
root 3522 1 0 19:10 ? 00:00:00 /usr/bin/vmnet-dhcpd -cf /etc/vmware/vmnet8/dhcpd/dhcpd.conf -lf /etc/vmware/vmnet8/dhcpd/dhcpd.leases -pf /var/run/vmnet-dhcpd
root 3524 1 0 19:10 ? 00:00:00 /usr/bin/vmnet-dhcpd -cf /etc/vmware/vmnet1/dhcpd/dhcpd.conf -lf /etc/vmware/vmnet1/dhcpd/dhcpd.leases -pf /var/run/vmnet-dhcpd
jptxs 3525 3495 0 19:10 ? 00:00:00 sshd: jptxs@pts/0
jptxs 3526 3525 0 19:10 pts/0 00:00:00 -bash
jptxs 3626 3526 0 19:11 pts/0 00:00:01 /bin/bash ./clocks
nobody 3682 1 0 19:13 ? 00:00:00 /usr/local/sbin/pdnsd -d -s -p /var/run/pdnsd.pid
jptxs 3708 1 0 19:13 ? 00:00:00 /home/jptxs/ejabberd-1.0.0/bin/epmd -daemon
jptxs 3724 1 0 19:13 ? 00:00:00 /home/jptxs/ejabberd-1.0.0/bin/beam -- -root /home/jptxs/ejabberd-1.0.0 -progname /home/jptxs/ejabberd-1.0.0/bin/erl -- -home /
jptxs 3732 3724 0 19:13 ? 00:00:00 /home/jptxs/ejabberd-1.0.0/lib/ssl-3.0.10/priv/bin/linux-x86/ssl_esock
jptxs 3880 3526 0 19:21 pts/0 00:00:00 tail -f clockChecks.log
root 3899 2710 0 19:21 ? 00:00:00 sshd: jptxs [priv]
jptxs 3901 3899 0 19:21 ? 00:00:00 sshd: jptxs@pts/1
jptxs 3902 3901 0 19:21 pts/1 00:00:00 -bash
root 30231 3902 0 21:07 pts/1 00:00:00 su -
root 30232 30231 0 21:07 pts/1 00:00:00 -bash
jptxs 30280 3626 0 21:07 pts/0 00:00:00 sleep 10
root 30283 30232 0 21:07 pts/1 00:00:00 ps -ef

Lenard
Posts: 2283
Joined: 2005/11/29 02:35:25
Location: Indiana

Re: system clock looping loosely over 4-5 seconds

Post by Lenard » 2006/11/07 14:37:25

Sorry for the delay in getting back, I have not forgotten this issue. But I am at a loss as to why this continues to happen.

jptxs
Posts: 6
Joined: 2006/11/04 15:39:14

Re: system clock looping loosely over 4-5 seconds

Post by jptxs » 2006/11/07 15:10:32

Some more info. I've eliminated vmmon as a cause as I've had VMWare shut down for two days and it continues to happen. Looking at the RTC time shows interesting stuff as well:

++++++---------------------++++++++

harware clock says :: Tue 07 Nov 2006 09:17:28 AM EST -0.884165 seconds

bach clock:
Tue Nov 7 00:21:21 EST 2006

wagner clock:
Tue Nov 7 09:17:31 EST 2006

++++++---------------------++++++++

You can see that the results of hwclock are actually in sync with the good machine (wagner), while the system time being reported through date and the shell is still bad.

I have also found there seems to be no real tie in to time of day (though it has always happened after 9pm regardless of uptime) and that a reboot always seems to fix it for a while, though the amount of uptime that passes has varied from 4-7 hours.

Lenard
Posts: 2283
Joined: 2005/11/29 02:35:25
Location: Indiana

Re: system clock looping loosely over 4-5 seconds

Post by Lenard » 2006/11/07 21:48:02

Something is not right on the system, what I cannot say. But as an example (running my own little script);

The script named net-time (so you know the output sequence);

$ cat bin/net-time
#!/bin/bash
# Checking the time
hwclock
sudo /usr/sbin/ntpdate -u us.pool.ntp.org
clock
exit 0

The output;

$ net-time
Tue 07 Nov 2006 04:39:30 PM EST -0.183522 seconds
7 Nov 16:39:33 ntpdate[1969]: adjust time server 69.31.4.141 offset -0.075026 sec
Tue 07 Nov 2006 04:39:34 PM EST -0.790913 seconds

All my ntpd and time related files are the defaults as setup by Red Hat, for example my /etc/sysconfig/clock file;

# The ZONE parameter is only evaluated by system-config-date.
# The timezone of the system is defined by the contents of /etc/localtime.
ZONE="America/New_York"
UTC=false
ARC=false

Again the output from the script file after waiting a few minutes;

$ net-time
Tue 07 Nov 2006 04:49:44 PM EST -0.575893 seconds
7 Nov 16:49:46 ntpdate[2140]: adjust time server 207.188.193.83 offset -0.022214 sec
Tue 07 Nov 2006 04:49:47 PM EST -0.309282 seconds

jptxs
Posts: 6
Joined: 2006/11/04 15:39:14

Re: RESOLVED(?) system clock looping loosely over 4-5 seconds

Post by jptxs » 2006/11/13 20:35:21

So it looks like this is resolved. First of all, you can see some other threads on this at:

http://www.linuxquestions.org/questions/showthread.php?t=499578

and

http://www.linuxhelp.net/forums/index.php?showtopic=8438

I will post a link back to this post there. The clocks now seem to be in sync:

++++++---------------------++++++++
harware clock says :: Mon 13 Nov 2006 03:30:30 PM EST -0.835485 seconds

uptime is :: 15:30:30 up 2 days, 2:48, 4 users, load average: 0.18, 0.10, 0.09

bach clock:
Mon Nov 13 15:30:30 EST 2006

wagner clock:
Mon Nov 13 15:30:30 EST 2006
++++++---------------------++++++++

However, I'm not happy with the fix. I had to recompile my kernel. I don't mind doing it, but the whole reason I migrated to CentOS was to get away from constant tweaking I had to do in Debian that everyone assured me was due to the fact that it wasn't linked to a stable project like RedHat. So much for that. You can find both the old and new .config files for my kernel for comparison at pastebin:

old: http://pastebin.com/823567

new: http://pastebin.com/823568

It seems the default kernel had the processor type wrong and was optimized for a desktop install from an interrupt perspective. I installed what is called the "server" disk from the CentOS distribution, so that was very disappointing.

I'll post back if the problem reappears.

Thanks to those who helped.

Lenard
Posts: 2283
Joined: 2005/11/29 02:35:25
Location: Indiana

Re: RESOLVED(?) system clock looping loosely over 4-5 seconds

Post by Lenard » 2006/11/14 18:17:13

Looks like you got the better kernel installed, I have been wondering of late what is going on in the Linux world myself.

First one hears of the SCO attack on Linux along with other attacks on Linux and the GPL in general, the SuSE split (openSuSE and Enterprise versions), the Novell-Microsoft deal happens, the kernels for the new FC6 (and RHEL4 it seems) and badly built (an i586 as an i686 kernel and other such strangeness). Plus other unusual events with many of the other Distros and the Oracle Linux announcement. All "smells" like Linux is now truly and finally becoming a 'big business' environment with all the problems of big business unfortunately.

Anyhow, glad you got the problem resolved(hopefully).

jptxs
Posts: 6
Joined: 2006/11/04 15:39:14

Re: RESOLVED(?) system clock looping loosely over 4-5 seconds

Post by jptxs » 2007/01/01 13:52:02

Seems the links went dead. Here are new ones:

Old: http://pastebin.com/848942

New: http://pastebin.com/848945

HTH

divan
Posts: 1
Joined: 2007/02/06 10:08:49
Location: South Africa

Re: RESOLVED(?) system clock looping loosely over 4-5 seconds

Post by divan » 2007/02/06 10:17:53

Hi JPTXS,

I have the same problem on one of my Centos boxes, I tried to go to the link to check what you did with your config, but I can't see any info. Please let me know where else I can find it.

Thnx
Divan

Post Reply