Six months later and I had to reboot the machine. It had been accumulating yum updates the whole time. It looks like it still has a similar problem. This time NFS and YP services don't start. It takes about 30s to login as "root" via ssh or on the console, and it emits this line before the prompt
/var/log/messages accumulates these messages, which are similar too, but not quite the same as before:
Code: Select all
Dec 12 16:27:09 server systemd-logind: Failed to add match for NameOwnerChanged: Connection timed out
Dec 12 16:27:09 server systemd-logind: Failed to fully start up daemon: Connection timed out
Dec 12 16:27:09 server systemd: systemd-logind.service: main process exited, code=exited, status=1/FAILURE
Dec 12 16:27:09 server systemd: Failed to start Login Service.
Dec 12 16:27:09 server systemd: Unit systemd-logind.service entered failed state.
Dec 12 16:27:09 server systemd: systemd-logind.service failed.
Dec 12 16:27:09 server systemd: systemd-logind.service has no holdoff time, scheduling restart.
Dec 12 16:27:09 server systemd: Stopped Login Service.
Dec 12 16:27:09 server systemd: Starting Login Service...
polkitd is running
Code: Select all
ps -ef | grep polkit
polkitd 4353 1 0 16:19 ? 00:00:00 /usr/lib/polkit-1/polkitd --no-debug
polkit was rinstalled with yum, and the two scripts here
https://access.redhat.com/solutions/1543343 run successfully. On reboot it came up exactly the same.
The crux of the problem seems to be that rpcinfo -p hangs, and if I waited long enough it would probably time out. That would certainly explain why why NFS and YP do not work. The system was completely unusable until the /etc/nsswitch.conf hosts line was changed from "files nis dns" to "files dns nis". The thing is, rpcbind did start:
Code: Select all
nmap localhost
Starting Nmap 6.40 ( http://nmap.org ) at 2018-12-12 16:47 PST
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000019s latency).
Other addresses for localhost (not scanned): 127.0.0.1
Not shown: 990 closed ports
PORT STATE SERVICE
22/tcp open ssh
25/tcp open smtp
111/tcp open rpcbind
465/tcp open smtps
514/tcp open shell
587/tcp open submission
631/tcp open ipp
705/tcp open agentx
4003/tcp open pxc-splr-ft
19283/tcp open keysrvr
So rpcbind is running, but it won't talk to rpcinfo for some reason. Turning off the firewall did not help. Obviously it used to work, and I have not touched any of the configuration files, although yum has been very busy in the preceding 6 months. I keep a copy of /etc in a subdirectory of /root, and there are now 291 changes between the old copy and what /etc now is. The two network interfaces have fixed addresses in their network-scripts files. Yet there is some problem, who knows what, when the network starts:
Code: Select all
Dec 12 16:19:43 server systemd: NetworkManager-wait-online.service: main process exited, code=exited, status=1/FAILURE
Dec 12 16:19:43 server systemd: Failed to start Network Manager Wait Online.
Dec 12 16:19:43 server systemd: Unit NetworkManager-wait-online.service entered failed state.
Dec 12 16:19:43 server systemd: NetworkManager-wait-online.service failed.
Dec 12 16:19:43 server systemd: Starting LSB: Bring up/down networking...
Dec 12 16:19:44 server network: Bringing up loopback interface: [ OK ]
Dec 12 16:19:44 server network: Bringing up interface Profile_1: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device does not seem to be present, delaying initialization.
Dec 12 16:19:44 server /etc/sysconfig/network-scripts/ifup-eth: Device does not seem to be present, delaying initialization.
Dec 12 16:19:44 server network: [FAILED]
Dec 12 16:19:44 server kernel: bnx2 0000:02:00.0 em1: using MSIX
Dec 12 16:19:44 server kernel: IPv6: ADDRCONF(NETDEV_UP): em1: link is not ready
Dec 12 16:19:47 server kernel: bnx2 0000:02:00.0 em1: NIC Copper Link is Up, 1000 Mbps full duplex
Dec 12 16:19:47 server kernel:
Dec 12 16:19:47 server kernel: IPv6: ADDRCONF(NETDEV_CHANGE): em1: link becomes ready
Dec 12 16:19:48 server avahi-daemon[4364]: Joining mDNS multicast group on interface em1.IPv4 with address 131.215.12.42.
Dec 12 16:19:48 server avahi-daemon[4364]: New relevant interface em1.IPv4 for mDNS.
Dec 12 16:19:48 server avahi-daemon[4364]: Registering new address record for 131.215.12.42 on em1.IPv4.
Dec 12 16:19:48 server network: Bringing up interface em1: [ OK ]
Dec 12 16:19:48 server kernel: bnx2 0000:02:00.1 em2: using MSIX
Dec 12 16:19:48 server kernel: IPv6: ADDRCONF(NETDEV_UP): em2: link is not ready
Dec 12 16:19:49 server avahi-daemon[4364]: Registering new address record for fe80::862b:2bff:fe08:49da on em1.*.
Dec 12 16:19:51 server kernel: bnx2 0000:02:00.1 em2: NIC Copper Link is Up, 1000 Mbps full duplex
Dec 12 16:19:51 server kernel: , receive & transmit flow control ON
Dec 12 16:19:51 server kernel: IPv6: ADDRCONF(NETDEV_CHANGE): em2: link becomes ready
Dec 12 16:19:52 server avahi-daemon[4364]: Joining mDNS multicast group on interface em2.IPv4 with address 192.168.1.220.
Dec 12 16:19:52 server avahi-daemon[4364]: New relevant interface em2.IPv4 for mDNS.
Dec 12 16:19:52 server avahi-daemon[4364]: Registering new address record for 192.168.1.220 on em2.IPv4.
Dec 12 16:19:52 server network: Bringing up interface em2: [ OK ]
Dec 12 16:19:52 server systemd: network.service: control process exited, code=exited status=1
Dec 12 16:19:52 server systemd: Failed to start LSB: Bring up/down networking.
Dec 12 16:19:52 server systemd: Unit network.service entered failed state.
Dec 12 16:19:52 server systemd: network.service failed.
Dec 12 16:19:52 server systemd: Reached target Network.
The last two lines of the preceding seem to say exactly opposite things! The network seems to work, at least at the transport levels, but rpc is not working right. Versions of various possibly relevant packages:
Code: Select all
systemd-219-62.el7.x86_64
systemd-libs-219-62.el7.x86_64
systemd-sysv-219-62.el7.x86_64
systemd-libs-219-62.el7.i686
systemd-python-219-62.el7.x86_64
polkit-0.112-18.el7.x86_64
polkit-gnome-0.106-0.1.20170423gita0763a2.el7.x86_64
polkit-pkla-compat-0.1-4.el7.x86_64
rpcbind-0.2.0-38.el7.x86_64
kernel-3.10.0-957.1.3.el7.x86_64
cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)
The last time it booted, 6 months ago, it was using kernel 3.10.0-862.6.3.el7.x86_64.
Remember, I didn't change anything myself, and rpc, NFS, and YP were all working before the reboot.
Any ideas???
Thanks.