Page 1 of 1

polkit conniptions

Posted: 2018/07/13 19:03:41
by mathog
Rebooted a Centos 7 system for the first time in 450ish days. It came up, but it is in an odd state.

Release: CentOS Linux release 7.5.1804 (Core)

Code: Select all

systemctl restart dhcpd.service
Error getting authority: Error initializing authority: Error calling StartServiceByName for org.freedesktop.PolicyKit1: GDBus.Error:org.freedesktop.DBus.Error.TimedOut: Failed to activate service 'org.freedesktop.PolicyKit1': timed out (g-dbus-error-quark, 20)
Failed to restart dhcpd.service: Connection timed out
See system logs and 'systemctl status dhcpd.service' for details.
Googled and the consensus was that this is a polkit issue. It was not running so

Code: Select all

yum reinstall polkit
systemctl start polkit.service
same error as before. Great - polkit cannot start because polkit is not started!

Finally I started it manually (as root)

Code: Select all

/usr/lib/polkit-1/polkitd --no-debug &
That let systemctl restart dhcpd.service, however it seemed that polkit is still confused (or I am):

Code: Select all

 ps -ef | grep polkit
polkitd   4529  4175  0 11:24 pts/2    00:00:00 /usr/lib/polkit-1/polkitd --no-debug
systemctl restart polkit.service
systemctl status polkit.service
● polkit.service - Authorization Manager
   Loaded: loaded (/usr/lib/systemd/system/polkit.service; static; vendor preset: enabled)
   Active: inactive (dead) since Fri 2018-07-13 11:56:30 PDT; 53s ago
     Docs: man:polkit(8)
  Process: 5275 ExecStart=/usr/lib/polkit-1/polkitd --no-debug (code=exited, status=0/SUCCESS)
 Main PID: 5275 (code=exited, status=0/SUCCESS)

Jul 13 11:56:30 machinename systemd[1]: Starting Authorization Manager...
Jul 13 11:56:30 machinename  polkitd[5275]: Started polkitd version 0.112
Jul 13 11:56:30 machinename  polkitd[5275]: Loading rules from directory /etc/polkit-1/rules.d
Jul 13 11:56:30 machinename  polkitd[5275]: Loading rules from directory /usr/share/polkit-1/rules.d
Jul 13 11:56:30 machinename  polkitd[5275]: Finished loading, compiling and executing 8 rules
Jul 13 11:56:30 machinename  polkitd[5275]: Lost the name org.freedesktop.PolicyKit1 - exiting
Jul 13 11:56:30 machinename  systemd[1]: Started Authorization Manager.
 ps -ef | grep polkit
polkitd   4529  4175  0 11:24 pts/2    00:00:00 /usr/lib/polkit-1/polkitd --no-debug
It _says_ that it restarted it, but it just left the manually started one running.

Code: Select all

rpm -qa | grep -i polkit
polkit-gnome-0.106-0.1.20170423gita0763a2.el7.x86_64
polkit-pkla-compat-0.1-4.el7.x86_64
polkit-0.112-14.el7.x86_64
Any pointers on how to get out of this situation???

Thanks

Note added - there are polkitd entries in /etc/passwd and /etc/group respectively

Code: Select all

polkitd:x:997:996:User for polkitd:/:/sbin/nologin
polkitd:x:996:
This

https://access.redhat.com/solutions/1543343

says also to reset the ownerships and file permissions, which is what I think the yum reinstall did.

Re: polkit conniptions

Posted: 2018/07/13 22:52:57
by mathog
Also, once every minute these messages show up in /var/log/messages:

Code: Select all

Jul 13 15:51:03 machinename dbus[759]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
Jul 13 15:51:03 machinename systemd-logind: Failed to enable subscription: Failed to activate service 'org.freedesktop.systemd1': timed out
Jul 13 15:51:03 machinename systemd-logind: Failed to fully start up daemon: Connection timed out
Jul 13 15:51:03 machinename systemd: systemd-logind.service: main process exited, code=exited, status=1/FAILURE
Jul 13 15:51:03 machinename systemd: Failed to start Login Service.
Jul 13 15:51:03 machinename systemd: Unit systemd-logind.service entered failed state.
Jul 13 15:51:03 machinename systemd: systemd-logind.service failed.
Jul 13 15:51:03 machinename systemd: systemd-logind.service has no holdoff time, scheduling restart.
Jul 13 15:51:03 machinename systemd: Starting Login Service...

Re: polkit conniptions

Posted: 2018/07/13 23:34:26
by mathog
This method seems to have fixed it. With the manually started polkit running:

Code: Select all

systemctl restart dbus.service
PolicyKit daemon disconnected from the bus.
We are no longer a registered authentication agent.
at this point there is no polkit process and there is (still) a dbus-daemon. However this one behaves differently, because it allows polkit to start normally.

Code: Select all

systemctl start polkit.service
No hiccups.

The clusters of messages in /var/log/messages noted in the previous post stopped.

That leaves one question - why did it fail on boot? Make that two questions, will it now fail on all boots?

Re: polkit conniptions

Posted: 2018/07/13 23:53:03
by TrevorH
That sounds to me more like you have a dbus problem...

Re: polkit conniptions

Posted: 2018/12/13 01:08:10
by mathog
Six months later and I had to reboot the machine. It had been accumulating yum updates the whole time. It looks like it still has a similar problem. This time NFS and YP services don't start. It takes about 30s to login as "root" via ssh or on the console, and it emits this line before the prompt

Code: Select all

'abrt-cli status' timed out
/var/log/messages accumulates these messages, which are similar too, but not quite the same as before:

Code: Select all

Dec 12 16:27:09 server systemd-logind: Failed to add match for NameOwnerChanged: Connection timed out
Dec 12 16:27:09 server systemd-logind: Failed to fully start up daemon: Connection timed out
Dec 12 16:27:09 server systemd: systemd-logind.service: main process exited, code=exited, status=1/FAILURE
Dec 12 16:27:09 server systemd: Failed to start Login Service.
Dec 12 16:27:09 server systemd: Unit systemd-logind.service entered failed state.
Dec 12 16:27:09 server systemd: systemd-logind.service failed.
Dec 12 16:27:09 server systemd: systemd-logind.service has no holdoff time, scheduling restart.
Dec 12 16:27:09 server systemd: Stopped Login Service.
Dec 12 16:27:09 server systemd: Starting Login Service...
polkitd is running

Code: Select all

ps -ef | grep polkit
polkitd   4353     1  0 16:19 ?        00:00:00 /usr/lib/polkit-1/polkitd --no-debug
polkit was rinstalled with yum, and the two scripts here https://access.redhat.com/solutions/1543343 run successfully. On reboot it came up exactly the same.

The crux of the problem seems to be that rpcinfo -p hangs, and if I waited long enough it would probably time out. That would certainly explain why why NFS and YP do not work. The system was completely unusable until the /etc/nsswitch.conf hosts line was changed from "files nis dns" to "files dns nis". The thing is, rpcbind did start:

Code: Select all

nmap localhost

Starting Nmap 6.40 ( http://nmap.org ) at 2018-12-12 16:47 PST
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000019s latency).
Other addresses for localhost (not scanned): 127.0.0.1
Not shown: 990 closed ports
PORT      STATE SERVICE
22/tcp    open  ssh
25/tcp    open  smtp
111/tcp   open  rpcbind
465/tcp   open  smtps
514/tcp   open  shell
587/tcp   open  submission
631/tcp   open  ipp
705/tcp   open  agentx
4003/tcp  open  pxc-splr-ft
19283/tcp open  keysrvr

So rpcbind is running, but it won't talk to rpcinfo for some reason. Turning off the firewall did not help. Obviously it used to work, and I have not touched any of the configuration files, although yum has been very busy in the preceding 6 months. I keep a copy of /etc in a subdirectory of /root, and there are now 291 changes between the old copy and what /etc now is. The two network interfaces have fixed addresses in their network-scripts files. Yet there is some problem, who knows what, when the network starts:

Code: Select all

Dec 12 16:19:43 server systemd: NetworkManager-wait-online.service: main process exited, code=exited, status=1/FAILURE
Dec 12 16:19:43 server systemd: Failed to start Network Manager Wait Online.
Dec 12 16:19:43 server systemd: Unit NetworkManager-wait-online.service entered failed state.
Dec 12 16:19:43 server systemd: NetworkManager-wait-online.service failed.
Dec 12 16:19:43 server systemd: Starting LSB: Bring up/down networking...
Dec 12 16:19:44 server network: Bringing up loopback interface:  [  OK  ]
Dec 12 16:19:44 server network: Bringing up interface Profile_1:  ERROR     : [/etc/sysconfig/network-scripts/ifup-eth] Device  does not seem to be present, delaying initialization.
Dec 12 16:19:44 server /etc/sysconfig/network-scripts/ifup-eth: Device  does not seem to be present, delaying initialization.
Dec 12 16:19:44 server network: [FAILED]
Dec 12 16:19:44 server kernel: bnx2 0000:02:00.0 em1: using MSIX
Dec 12 16:19:44 server kernel: IPv6: ADDRCONF(NETDEV_UP): em1: link is not ready
Dec 12 16:19:47 server kernel: bnx2 0000:02:00.0 em1: NIC Copper Link is Up, 1000 Mbps full duplex
Dec 12 16:19:47 server kernel: 
Dec 12 16:19:47 server kernel: IPv6: ADDRCONF(NETDEV_CHANGE): em1: link becomes ready
Dec 12 16:19:48 server avahi-daemon[4364]: Joining mDNS multicast group on interface em1.IPv4 with address 131.215.12.42.
Dec 12 16:19:48 server avahi-daemon[4364]: New relevant interface em1.IPv4 for mDNS.
Dec 12 16:19:48 server avahi-daemon[4364]: Registering new address record for 131.215.12.42 on em1.IPv4.
Dec 12 16:19:48 server network: Bringing up interface em1:  [  OK  ]
Dec 12 16:19:48 server kernel: bnx2 0000:02:00.1 em2: using MSIX
Dec 12 16:19:48 server kernel: IPv6: ADDRCONF(NETDEV_UP): em2: link is not ready
Dec 12 16:19:49 server avahi-daemon[4364]: Registering new address record for fe80::862b:2bff:fe08:49da on em1.*.
Dec 12 16:19:51 server kernel: bnx2 0000:02:00.1 em2: NIC Copper Link is Up, 1000 Mbps full duplex
Dec 12 16:19:51 server kernel: , receive & transmit flow control ON
Dec 12 16:19:51 server kernel: IPv6: ADDRCONF(NETDEV_CHANGE): em2: link becomes ready
Dec 12 16:19:52 server avahi-daemon[4364]: Joining mDNS multicast group on interface em2.IPv4 with address 192.168.1.220.
Dec 12 16:19:52 server avahi-daemon[4364]: New relevant interface em2.IPv4 for mDNS.
Dec 12 16:19:52 server avahi-daemon[4364]: Registering new address record for 192.168.1.220 on em2.IPv4.
Dec 12 16:19:52 server network: Bringing up interface em2:  [  OK  ]
Dec 12 16:19:52 server systemd: network.service: control process exited, code=exited status=1
Dec 12 16:19:52 server systemd: Failed to start LSB: Bring up/down networking.
Dec 12 16:19:52 server systemd: Unit network.service entered failed state.
Dec 12 16:19:52 server systemd: network.service failed.
Dec 12 16:19:52 server systemd: Reached target Network.
The last two lines of the preceding seem to say exactly opposite things! The network seems to work, at least at the transport levels, but rpc is not working right. Versions of various possibly relevant packages:

Code: Select all

systemd-219-62.el7.x86_64
systemd-libs-219-62.el7.x86_64
systemd-sysv-219-62.el7.x86_64
systemd-libs-219-62.el7.i686
systemd-python-219-62.el7.x86_64
polkit-0.112-18.el7.x86_64
polkit-gnome-0.106-0.1.20170423gita0763a2.el7.x86_64
polkit-pkla-compat-0.1-4.el7.x86_64
rpcbind-0.2.0-38.el7.x86_64
kernel-3.10.0-957.1.3.el7.x86_64

cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)


The last time it booted, 6 months ago, it was using kernel 3.10.0-862.6.3.el7.x86_64.

Remember, I didn't change anything myself, and rpc, NFS, and YP were all working before the reboot.

Any ideas???

Thanks.

Re: polkit conniptions

Posted: 2018/12/13 01:28:15
by mathog

Code: Select all

systemctl enable rpcbind
Failed to execute operation: Connection timed out
This logged nothing in /var/log/messages.

Now this is strange, with the firewall disabled (probably irrelevant) most of the time this happens:

Code: Select all

time rpcinfo -p localhost
rpcinfo: can't contact portmapper: RPC: Timed out

real    1m0.092s
but once and a while...

Code: Select all

rpcinfo -p localhost
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100011    1   udp   4003  rquotad
    100011    2   udp   4003  rquotad
    100011    1   tcp   4003  rquotad
    100011    2   tcp   4003  rquotad

Re: polkit conniptions

Posted: 2018/12/13 02:11:33
by mathog
Rebooted using the kernel used 6 months ago, 3.10.0-862.6.3.el7.x86_64, and everything worked. Multiple reboots using 3.10.0-862.6.3.el7.x86_64 had all failed. Simplest model is that that the new kernel is broken and the old one works. Looks like tomorrow I get to reboot this machine through one kernel after another to find the latest kernel which does not have this issue.

This system will run for a very long time stably, but it has now failed to reboot properly after each of the last two long sessions. This really does not inspire a lot of confidence in the stability of Centos/RH systems when yum is allowed to update during the run. Reboot should really "just work", right?