pacemaker

General support questions
Jonas
Posts: 67
Joined: 2012/02/01 12:45:23
Location: France, Rouen

pacemaker

Post by Jonas » 2012/07/30 11:08:20

Hi everyone,

I now want to install Pacemaker on my CentOS 6.2 system.
Everything works well, but I just have a small question.


When I run /etc/init.d/corosync start, my log file shows:

[code]
Jul 28 06:11:40 corosync [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:1864.
Jul 28 06:11:41 corosync [MAIN ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Jul 28 06:11:41 corosync [MAIN ] Corosync built-in features: nss dbus rdma snmp
Jul 28 06:11:41 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Jul 28 06:11:41 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Jul 28 06:11:41 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jul 28 06:11:41 corosync [TOTEM ] The network interface [XXX.XXX.XXX.XXX] is now up.
Jul 28 06:11:41 corosync [SERV ] Service engine loaded: corosync extended virtual synchrony service
Jul 28 06:11:41 corosync [SERV ] Service engine loaded: corosync configuration service
Jul 28 06:11:41 corosync [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Jul 28 06:11:41 corosync [SERV ] Service engine loaded: corosync cluster config database access v1.01
Jul 28 06:11:41 corosync [SERV ] Service engine loaded: corosync profile loading service
Jul 28 06:11:41 corosync [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Jul 28 06:11:41 corosync [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Jul 28 06:11:41 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 28 06:11:41 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.0.191) ; members(old:0 left:0)
Jul 28 06:11:41 corosync [MAIN ] Completed service synchronization, ready to provide service.
[/code]

So I think it's fine, but when I run crm_mon --one-shot to show the status, the output is:
[code]Connection to cluster failed: connection failed [/code]

What am I missing?

PS: I have disabled iptables.

Kind regards,

Jonas.

TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

pacemaker

Post by TrevorH » 2012/07/30 15:56:16

If you have a stanza in corosync.conf that invokes Pacemaker and it says "ver: 1", then Pacemaker has to be set up to start on its own via chkconfig. If it says "ver: 0", corosync starts it; however, ver: 1 is recommended.
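A minimal Python 3 sketch of the rule above, assuming the corosync.conf contents have been read into a string. The helper name `pacemaker_start_mode` is hypothetical, and the regex only handles a flat `service { ... }` stanza like the ones posted in this thread.

```python
import re


def pacemaker_start_mode(conf_text):
    """Describe how Pacemaker gets started, from the 'ver' of its service stanza.

    Hypothetical helper: only handles flat 'service { ... }' blocks without nested braces.
    """
    for body in re.findall(r"service\s*\{([^}]*)\}", conf_text):
        # Skip service stanzas that do not load the pacemaker plugin
        if not re.search(r"^\s*name:\s*pacemaker\s*$", body, re.MULTILINE):
            continue
        ver = re.search(r"^\s*ver:\s*(\d+)\s*$", body, re.MULTILINE)
        if ver is None:
            return "ver not set"
        if ver.group(1) == "0":
            # ver: 0 -> corosync spawns the Pacemaker daemons itself
            return "started by corosync"
        if ver.group(1) == "1":
            # ver: 1 -> Pacemaker needs its own init script (e.g. chkconfig pacemaker on)
            return "start pacemaker separately"
        return "unrecognised ver " + ver.group(1)
    return "no pacemaker stanza"
```

For example, `pacemaker_start_mode(open("/etc/corosync/corosync.conf").read())` would report "started by corosync" for a stanza containing `ver: 0`.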

Jonas
Posts: 67
Joined: 2012/02/01 12:45:23
Location: France, Rouen

Re: pacemaker

Post by Jonas » 2012/07/30 17:45:51

In fact, when I generated my authkey file, I used my FTP server to copy it to my other node. Actually, I could have used scp /etc/corosync.... ^^

But now I want to configure my virtual IP, and I'm having trouble creating it.
I use: [code]
crm configure property stonith-enabled="false"
crm configure property no-quorum-policy=ignore
crm configure primitive failover-ip ocf:heartbeat:IPaddr params ip="XXX.XXX.XXX.XXX" op monitor interval="2s"
[/code]

I still get the same error message, and I don't know how to fix it.


[code]crm(live)configure# cd
There are changes pending. Do you want to commit them? y
Call cib_replace failed (-41): Remote node did not respond
<null>
ERROR: could not replace cib
INFO: offending xml: <configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
</cluster_property_set>
</crm_config>
<nodes/>
<resources>
<primitive class="ocf" id="failover-ip" provider="heartbeat" type="IPaddr">
<instance_attributes id="failover-ip-instance_attributes">
<nvpair id="failover-ip-instance_attributes-ip" name="ip" value="192.168.0.190"/>
</instance_attributes>
<operations>
<op id="failover-ip-monitor-2s" interval="2s" name="monitor"/>
</operations>
</primitive>
</resources>
<constraints/>
</configuration>[/code]


In /var/log/messages I have:

[code]Jul 28 12:39:19 clusterACE1 crmd[2203]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Jul 28 12:39:19 clusterACE1 crmd[2203]: warning: do_lrm_control: Failed to sign on to the LRM 21 (30 max) times
Jul 28 12:39:21 clusterACE1 crmd[2203]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Jul 28 12:39:21 clusterACE1 crmd[2203]: warning: do_lrm_control: Failed to sign on to the LRM 22 (30 max) times
Jul 28 12:39:23 clusterACE1 crmd[2203]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Jul 28 12:39:23 clusterACE1 crmd[2203]: warning: do_lrm_control: Failed to sign on to the LRM 23 (30 max) times
Jul 28 12:39:23 clusterACE1 corosync[1963]: [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Jul 28 12:39:25 clusterACE1 crmd[2203]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Jul 28 12:39:25 clusterACE1 crmd[2203]: warning: do_lrm_control: Failed to sign on to the LRM 24 (30 max) times [/code]

And in corosync.log I have:
[code]
Jul 28 12:41:36 [2210] clusterACE1 crmd: debug: s_crmd_fsa: Exiting the FSA: queue=0, fsa_actions=0x100001200000002, stalled=true
Jul 28 12:41:38 [2210] clusterACE1 crmd: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Jul 28 12:41:38 [2210] clusterACE1 crmd: debug: do_fsa_action: actions:trace: // A_LRM_CONNECT
Jul 28 12:41:38 [2210] clusterACE1 crmd: debug: do_lrm_control: Connecting to the LRM
Jul 28 12:41:38 [2210] clusterACE1 crmd: warning: do_lrm_control: Failed to sign on to the LRM 9 (30 max) times
Jul 28 12:41:38 [2210] clusterACE1 crmd: debug: crm_timer_start: Started Wait Timer (I_NULL:2000ms), src=15
Jul 28 12:41:38 [2210] clusterACE1 crmd: debug: register_fsa_input_adv: Stalling the FSA pending further input: cause=C_FSA_INTERNAL
Jul 28 12:41:38 [2210] clusterACE1 crmd: debug: s_crmd_fsa: Exiting the FSA: queue=0, fsa_actions=0x100001200000002, stalled=true
[/code]
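Both logs above show crmd retrying its LRM connection on a 2000 ms wait timer, while corosync reports at the same time that Totem cannot form a cluster. A small Python 3 sketch that pulls those two signals out of a log; the helper name `summarize_crmd_log` is hypothetical, and it assumes the exact message formats shown above and a fixed 2 s retry interval.

```python
import re

# e.g. "do_lrm_control: Failed to sign on to the LRM 24 (30 max) times"
LRM_RETRY = re.compile(r"Failed to sign on to the LRM (\d+) \((\d+) max\) times")
# corosync's message when the totem ring cannot form
TOTEM_FAULT = "Totem is unable to form a cluster"


def summarize_crmd_log(log_lines, wait_ms=2000):
    """Summarise LRM sign-on retries and Totem faults found in an iterable of log lines."""
    summary = {"lrm_attempt": None, "lrm_max": None, "seconds_left": None, "totem_fault": False}
    for line in log_lines:
        m = LRM_RETRY.search(line)
        if m:
            # Keep the most recent retry counter seen
            summary["lrm_attempt"] = int(m.group(1))
            summary["lrm_max"] = int(m.group(2))
        if TOTEM_FAULT in line:
            summary["totem_fault"] = True
    if summary["lrm_attempt"] is not None:
        # Rough time before crmd runs out of retries, assuming one retry per wait_ms
        remaining = summary["lrm_max"] - summary["lrm_attempt"]
        summary["seconds_left"] = remaining * wait_ms / 1000.0
    return summary
```

Since the function only iterates over lines, an open file works too, e.g. `summarize_crmd_log(open("/var/log/messages"))`.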



My configuration file is:

[code]
# Please read the corosync.conf.5 manual page
compatibility: whitetank

totem {
version: 2

# DECLARING A TOKEN LOST (MS)
token: 3000
token_retransmits_before_loss_const: 10
join: 60
consensus: 3600
vsftype: none
max_messages: 20
clear_node_high_bit: yes
secauth: off
threads: 0
rrp_mode: none

interface {
ringnumber: 0
bindnetaddr: XXX.XXX.XXX.XXX
mcastaddr: 226.94.1.1
mcastport: 5405
ttl: 1
}
}

amf {
mode: disabled

}

service {
ver: 0
name: pacemaker
}


aisexec {
user: root
group: root
}


logging {
fileline: off
to_stderr: no
to_logfile: yes
to_syslog: yes
syslog_facility: daemon
logfile: /var/log/cluster/corosync.log
debug: on
timestamp: on
logger_subsys {
subsys: AMF
debug: on
tags: enter|leave|trace1|trace2|trace3|trace4|trace6
}
}

[/code]
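The corosync.conf(5) man page for corosync 1.x says consensus must be at least 1.2 * token, and lists `none` and `ykd` as the vsftype values. A rough Python 3 sanity check over the config text; the helper name `check_totem_settings` is hypothetical and it only understands simple `key: value` lines like the ones in this file.

```python
import re


def _int_setting(conf_text, key):
    """Return the integer value of a 'key: N' line, or None if absent."""
    m = re.search(r"^\s*%s:\s*(\d+)\s*$" % re.escape(key), conf_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def check_totem_settings(conf_text):
    """Return human-readable warnings about a few totem settings (empty list if none)."""
    warnings = []
    token = _int_setting(conf_text, "token")
    consensus = _int_setting(conf_text, "consensus")
    # consensus >= 1.2 * token, compared in integers (5*consensus vs 6*token) to avoid float rounding
    if token is not None and consensus is not None and 5 * consensus < 6 * token:
        warnings.append("consensus %d is below 1.2 * token (%d)" % (consensus, token))
    m = re.search(r"^\s*vsftype:\s*(\S+)\s*$", conf_text, re.MULTILINE)
    if m and m.group(1) not in ("none", "ykd"):
        warnings.append("unknown vsftype %r (expected none or ykd)" % m.group(1))
    return warnings
```

With token: 3000 and consensus: 3600 as above, the timing check passes exactly at the 1.2 ratio.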

Have I missed something? Probably :lol:

PS: [code] rpm -qa pacemaker
pacemaker-1.1.7-6.el6.i686
[/code]

Do I need anything else?

TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: pacemaker

Post by TrevorH » 2012/07/30 19:57:51

Did you open the firewall properly? When corosync is configured with port 5405, it actually uses that port and, I think, the one below it, so you need to open both 5405 and 5404. You'll also need multicast enabled on the switch if the nodes are connected through one.
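Per the corosync.conf(5) man page for corosync 1.x, a ring uses mcastport for multicast receives and mcastport - 1 for sends, which matches the 5404/5405 pair above. A small Python 3 sketch that derives the pair and builds example iptables commands; the helper names are hypothetical and the rules are a starting point to adapt, not a complete firewall policy.

```python
def corosync_udp_ports(mcastport=5405):
    """Return the UDP ports one corosync 1.x ring uses: (mcastport - 1 for sends, mcastport for receives)."""
    return (mcastport - 1, mcastport)


def example_iptables_rules(mcastport=5405):
    """Build example iptables commands that open both corosync ports.

    Sketch only: the chain (INPUT) and the lack of a source filter are assumptions to adapt.
    """
    send_port, recv_port = corosync_udp_ports(mcastport)
    return [
        # Open the port pair as a range, e.g. 5404:5405
        "iptables -I INPUT -p udp -m udp --dport %d:%d -j ACCEPT" % (send_port, recv_port),
        # Accept incoming multicast packets, which totem relies on in multicast mode
        "iptables -I INPUT -m pkttype --pkt-type multicast -j ACCEPT",
    ]
```

Usage: `for rule in example_iptables_rules(): print(rule)` prints the two commands for the default port 5405.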

Oh, and your bindnetaddr needs to be the network (subnet) address, not the actual IP address that corosync should bind to. Since you obscured it, I can't tell whether it's correct.
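To make the rule concrete, a Python 3 sketch using the standard `ipaddress` module (not available in CentOS 6's stock Python 2.6, so run it elsewhere or read it as pseudocode there). The helper name `bindnetaddr` is hypothetical, and the /24 netmask in the usage line is an assumption for the 172.16.5.x addresses that come up later in this thread.

```python
import ipaddress


def bindnetaddr(host_ip, netmask):
    """Return the network address to use as bindnetaddr for a host IP and its netmask.

    netmask may be dotted ("255.255.255.0") or a prefix length (24).
    """
    iface = ipaddress.ip_interface("%s/%s" % (host_ip, netmask))
    # Masking the host address with its netmask gives the subnet address corosync expects
    return str(iface.network.network_address)
```

For example, `bindnetaddr("172.16.5.1", "255.255.255.0")` returns "172.16.5.0".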

Jonas
Posts: 67
Joined: 2012/02/01 12:45:23
Location: France, Rouen

Re: pacemaker

Post by Jonas » 2012/07/30 20:26:31

Yeah, but I have disabled iptables, hehe.

Could you post an example configuration file here so I can understand it properly?

Say I want to use 172.16.5.1 and 172.16.5.2 for my nodes and 172.16.5.3 for my virtual IP.

Kind regards.

TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: pacemaker

Post by TrevorH » 2012/07/30 22:42:01

I'd look at the switch that connects your systems and check whether it has multicast enabled before anything else. From the limited information you've given, I would guess that your bindnetaddr should be 172.16.5.0.

Jonas
Posts: 67
Joined: 2012/02/01 12:45:23
Location: France, Rouen

Re: pacemaker

Post by Jonas » 2012/07/31 09:32:36

In fact, they are virtual machines running locally.

I changed my main configuration file to:

[code]totem {
version: 2
secauth: off
interface {
member {
memberaddr: 172.16.XXX.XXX
}
member {
memberaddr: 172.16.XXX.XXX
}
ringnumber: 0
bindnetaddr: 172.16.XXX.XXX
mcastport: 5405
}
transport: udpu
}

logging {
fileline: off
to_logfile: yes
to_syslog: yes
logfile: /var/log/cluster/corosync.log
debug: off
timestamp: on
logger_subsys {
subsys: AMF
debug: off
}
}[/code]
And it doesn't work any better.

I have a question: why is port 5405 listening on 127.0.0.1 instead of 172.16.XXX.XXX?
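A Python 3 sketch of the matching that bindnetaddr implies: each local address is masked with its own netmask and compared with bindnetaddr. The input format (a list of "address/prefix" strings, roughly what `ip -4 addr` shows) and the helper name `match_bind_interface` are assumptions. My understanding, not verified here, is that when nothing matches, corosync 1.x treats the interface as down and binds to 127.0.0.1, which would be consistent with port 5405 listening on the loopback address.

```python
import ipaddress


def match_bind_interface(bindnetaddr, local_addresses):
    """Return the first local IP whose subnet contains bindnetaddr, or None if none does.

    local_addresses: list of "address/prefix" strings, e.g. ["127.0.0.1/8", "172.16.5.1/24"]
    (an assumed input format for this sketch).
    """
    target = ipaddress.ip_address(bindnetaddr)
    for cidr in local_addresses:
        iface = ipaddress.ip_interface(cidr)
        # Same as comparing (bindnetaddr & netmask) with (interface address & netmask)
        if target in iface.network:
            return str(iface.ip)
    return None
```

A `None` result for the configured bindnetaddr is a hint that it does not correspond to any local subnet.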

Jonas
Posts: 67
Joined: 2012/02/01 12:45:23
Location: France, Rouen

Re: pacemaker

Post by Jonas » 2012/08/01 15:01:00

Good, I have "2 Nodes".

Now, if I want httpd to be available all the time, how do I set that up?

I did:

[code]
crm(live)configure# primitive httpd ocf:heartbeat:apache \
params configfile="/etc/httpd/conf/sites-enabled/*.conf" \
port="80" \
op start interval="0s" timeout="60s" \
op monitor interval="5s" timeout="20s" \
op stop interval="0s" timeout="60s"
[/code]
But it's strange: don't I need to specify my cluster IP? And do I need to copy this to the second node?

Jonas
Posts: 67
Joined: 2012/02/01 12:45:23
Location: France, Rouen

Re: pacemaker

Post by Jonas » 2012/08/05 18:34:03

Do you know if there is a package that can notify me by email when a node or a service goes down?

I'm trying to figure out how to configure it in crm, but... ^^

Kind regards

Jonas
Posts: 67
Joined: 2012/02/01 12:45:23
Location: France, Rouen

Re: pacemaker

Post by Jonas » 2012/08/06 09:56:07

Can we configure another DRBD resource in Pacemaker?

I configured one resource (drbd0) and all is well, but when I add my drbd1 resource to Pacemaker I get this error.

crm_mon:
[code]
Current DC: cluster1 - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
2 Nodes configured, 2 expected votes
8 Resources configured.

Online: [ cluster2 cluster1 ]

Resource Group: ftpserver
vip (ocf::heartbeat:IPaddr2): Started clusterACE1
vsftpd (lsb:vsftpd): Started clusterACE1
Master/Slave Set: ms_drbd0 [drbd0]
Masters: [ cluster1 ]
Slaves: [ cluster2]
Master/Slave Set: ms_drbd1 [drbd1]
drbd1:0 (ocf::linbit:drbd): Slave cluster2 (unmanaged) FAILED
drbd1:1 (ocf::linbit:drbd): Slave cluster1 (unmanaged) FAILED

Failed actions:
drbd1:1_stop_0 (node=cluster1, call=49, rc=5, status=complete): not installed
drbd1:0_stop_0 (node=cluster2, call=12, rc=5, status=complete): not installed [/code]
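The rc=5 in these failed actions is the OCF exit code OCF_ERR_INSTALLED, meaning the resource agent reported that something it needs is missing on that node. A Python 3 sketch that decodes such crm_mon lines using the standard OCF exit codes; the helper name `parse_failed_action` is hypothetical and the regex assumes the line format shown above (operation names without underscores).

```python
import re

# Standard OCF resource agent exit codes
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Matches crm_mon "Failed actions" lines such as
#   drbd1:1_stop_0 (node=cluster1, call=49, rc=5, status=complete): not installed
FAILED_ACTION = re.compile(
    r"(?P<rsc>\S+)_(?P<op>[a-z]+)_(?P<interval>\d+) "
    r"\(node=(?P<node>[^,]+), call=(?P<call>\d+), rc=(?P<rc>\d+), status=(?P<status>[^)]+)\)"
)


def parse_failed_action(line):
    """Return a dict describing one failed action, or None if the line is not one."""
    m = FAILED_ACTION.search(line)
    if m is None:
        return None
    action = m.groupdict()
    action["rc"] = int(action["rc"])
    action["rc_name"] = OCF_RC.get(action["rc"], "unknown")
    return action
```

For both failed stop operations above, this yields rc_name "OCF_ERR_INSTALLED", on cluster1 and cluster2 respectively.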

My configuration is the same as for my drbd0 device.

Kind regards.
