pacemaker

Post by Jonas » 2012/07/30 11:08:20

Hi everyone,

I want to install pacemaker on my CentOS 6.2 machine.
Everything works well, but I have just one small question.

When I run /etc/init.d/corosync start, my log file shows:

Code: Select all
 
Jul 28 06:11:40 corosync [MAIN  ] Corosync Cluster Engine exiting with status 0 at main.c:1864.
Jul 28 06:11:41 corosync [MAIN  ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Jul 28 06:11:41 corosync [MAIN  ] Corosync built-in features: nss dbus rdma snmp
Jul 28 06:11:41 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Jul 28 06:11:41 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Jul 28 06:11:41 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jul 28 06:11:41 corosync [TOTEM ] The network interface [XXX.XXX.XXX.XXX] is now up.
Jul 28 06:11:41 corosync [SERV  ] Service engine loaded: corosync extended virtual synchrony service
Jul 28 06:11:41 corosync [SERV  ] Service engine loaded: corosync configuration service
Jul 28 06:11:41 corosync [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
Jul 28 06:11:41 corosync [SERV  ] Service engine loaded: corosync cluster config database access v1.01
Jul 28 06:11:41 corosync [SERV  ] Service engine loaded: corosync profile loading service
Jul 28 06:11:41 corosync [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Jul 28 06:11:41 corosync [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
Jul 28 06:11:41 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 28 06:11:41 corosync [CPG   ] chosen downlist: sender r(0) ip(192.168.0.191) ; members(old:0 left:0)
Jul 28 06:11:41 corosync [MAIN  ] Completed service synchronization, ready to provide service.


So I think that looks good, but when I run crm_mon --one-shot to show the status, the output is:
Code: Select all
Connection to cluster failed: connection failed


What is the subtlety I'm missing?

PS: I have disabled iptables.

Kind regards,

Jonas.
Jonas
 
Posts: 67
Joined: 2012/02/01 12:45:23
Location: France, Rouen

pacemaker

Post by TrevorH » 2012/07/30 15:56:16

If you have a stanza in corosync.conf to invoke pacemaker and it says "ver: 1" then pacemaker has to be set up to start on its own via chkconfig. If it says "ver: 0" then corosync starts it; however, "ver: 1" is recommended.
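
As a rough sketch of the "ver: 1" case on CentOS 6, the commands below enable both services at boot and start them in order. The function name enable_cluster_stack is hypothetical, and the sketch assumes the stock corosync and pacemaker init scripts are installed. It has to be called as root.

```shell
# Sketch for the "ver: 1" case: corosync does not launch pacemaker,
# so pacemaker needs its own init script enabled and started.
# enable_cluster_stack is a hypothetical helper name; call it as root.
enable_cluster_stack() {
    chkconfig corosync on      # start corosync at boot
    chkconfig pacemaker on     # start pacemaker at boot as well
    service corosync start     # bring up cluster membership first
    service pacemaker start    # then start pacemaker on top of it
}
```

With "ver: 0" none of the pacemaker lines would be needed, since corosync would start pacemaker itself.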
TrevorH
Forum Moderator
 
Posts: 9113
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: pacemaker

Post by Jonas » 2012/07/30 17:45:51

In fact, when I generated my authkey file, I used my FTP server to copy it to my other nodes. Actually, we could have just used scp /etc/corosync.... ^^

Now I want to configure my virtual IP, but I am having trouble creating it.
I use:
Code: Select all
crm configure property stonith-enabled="false"
crm configure property no-quorum-policy=ignore
crm configure primitive failover-ip ocf:heartbeat:IPaddr params ip="XXX.XXX.XXX.XXX" op monitor interval="2s"


I always get the same error message, and I don't know how to fix it.


Code: Select all
crm(live)configure# cd
There are changes pending. Do you want to commit them? y
Call cib_replace failed (-41): Remote node did not respond
<null>
ERROR: could not replace cib
INFO: offending xml: <configuration>
   <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
         <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
         <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
      </cluster_property_set>
   </crm_config>
   <nodes/>
   <resources>
      <primitive class="ocf" id="failover-ip" provider="heartbeat" type="IPaddr">
         <instance_attributes id="failover-ip-instance_attributes">
            <nvpair id="failover-ip-instance_attributes-ip" name="ip" value="192.168.0.190"/>
         </instance_attributes>
         <operations>
            <op id="failover-ip-monitor-2s" interval="2s" name="monitor"/>
         </operations>
      </primitive>
   </resources>
   <constraints/>
</configuration>



In /var/log/messages I have:

Code: Select all
Jul 28 12:39:19 clusterACE1 crmd[2203]:     info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Jul 28 12:39:19 clusterACE1 crmd[2203]:  warning: do_lrm_control: Failed to sign on to the LRM 21 (30 max) times
Jul 28 12:39:21 clusterACE1 crmd[2203]:     info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Jul 28 12:39:21 clusterACE1 crmd[2203]:  warning: do_lrm_control: Failed to sign on to the LRM 22 (30 max) times
Jul 28 12:39:23 clusterACE1 crmd[2203]:     info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Jul 28 12:39:23 clusterACE1 crmd[2203]:  warning: do_lrm_control: Failed to sign on to the LRM 23 (30 max) times
Jul 28 12:39:23 clusterACE1 corosync[1963]:   [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Jul 28 12:39:25 clusterACE1 crmd[2203]:     info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Jul 28 12:39:25 clusterACE1 crmd[2203]:  warning: do_lrm_control: Failed to sign on to the LRM 24 (30 max) times


And in corosync.log I have:
Code: Select all
Jul 28 12:41:36 [2210] clusterACE1       crmd:    debug: s_crmd_fsa:    Exiting the FSA: queue=0, fsa_actions=0x100001200000002, stalled=true
Jul 28 12:41:38 [2210] clusterACE1       crmd:     info: crm_timer_popped:    Wait Timer (I_NULL) just popped (2000ms)
Jul 28 12:41:38 [2210] clusterACE1       crmd:    debug: do_fsa_action:    actions:trace:    // A_LRM_CONNECT
Jul 28 12:41:38 [2210] clusterACE1       crmd:    debug: do_lrm_control:    Connecting to the LRM
Jul 28 12:41:38 [2210] clusterACE1       crmd:  warning: do_lrm_control:    Failed to sign on to the LRM 9 (30 max) times
Jul 28 12:41:38 [2210] clusterACE1       crmd:    debug: crm_timer_start:    Started Wait Timer (I_NULL:2000ms), src=15
Jul 28 12:41:38 [2210] clusterACE1       crmd:    debug: register_fsa_input_adv:    Stalling the FSA pending further input: cause=C_FSA_INTERNAL
Jul 28 12:41:38 [2210] clusterACE1       crmd:    debug: s_crmd_fsa:    Exiting the FSA: queue=0, fsa_actions=0x100001200000002, stalled=true




My configuration file is:

Code: Select all
# Please read the corosync.conf.5 manual page
compatibility: whitetank

totem {
   version: 2
   
   # DECLARING A TOKEN LOST (MS)
   token: 3000
   token_retransmits_before_loss_const: 10
   join: 60
   consensus: 3600
   vsftype: nome
   max_messages: 20
   clear_node_high_bit: yes
   secauth: off
   threads: 0
   rrp_mode: none

interface {
      ringnumber: 0
      bindnetaddr: XXX.XXX.XXX.XXX
      mcastaddr: 226.94.1.1
      mcastport: 5405
      ttl: 1
   }
}

amf {
           mode: disabled

}

service {
      ver:   0
      name: pacemaker
}


aisexec {
      user: root
      group: root
}


logging {
   fileline: off
   to_stderr: no
   to_logfile: yes
   to_syslog: yes
   syslog_facility: daemon
   logfile: /var/log/cluster/corosync.log
   debug: on
   timestamp: on
   logger_subsys {
      subsys: AMF
      debug: on
      tags: enter|leave|trace1|trace2|trace3|trace4|trace6
   }
}



Have I got something wrong? Probably :lol:

PS:
Code: Select all
 rpm -qa pacemaker
pacemaker-1.1.7-6.el6.i686


Do I need something else?

Re: pacemaker

Post by TrevorH » 2012/07/30 19:57:51

Did you open the firewall properly? When it specifies port 5405 it actually uses that and one less I think - so you need to open both 5405 and 5404. You'll also need to enable mcast if you are connected via a switch.

Oh, and your bindnetaddr needs to be the subnet not the actual IP address that it should bind to. Since you obscured that, I can't tell if it's correct or not.
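
If iptables were left enabled, rules along the following lines would open the two UDP ports mentioned above. This is only a sketch: the function name open_corosync_ports is hypothetical, it assumes the default INPUT chain is in use, and it has to be called as root.

```shell
# open_corosync_ports is a hypothetical helper name; call it as root.
open_corosync_ports() {
    # mcastport is 5405 in the posted config; the port one below it is used too
    iptables -I INPUT -p udp -m multiport --dports 5404,5405 -j ACCEPT
    # persist the rule across reboots on CentOS 6
    service iptables save
}
```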

Re: pacemaker

Post by Jonas » 2012/07/30 20:26:31

Yeah, but I have disabled iptables, hehe.

Could you post an example configuration file here so I can understand it properly?

Say I want to use 172.16.5.1 and 172.16.5.2 for my nodes and 172.16.5.3 for my virtual IP.

Kind regards.

Re: pacemaker

Post by TrevorH » 2012/07/30 22:42:01

I'd look at the switch that connects your systems and see if it has multicast enabled before anything else. From the limited information you've given I would guess that your bindnetaddr should be 172.16.5.0.
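
That guess follows from masking a node address with its netmask. The thread never states the netmask, so the sketch below assumes 255.255.255.0; network_address is a hypothetical helper name, not part of corosync.

```shell
# network_address ADDRESS NETMASK
# Prints the network address by ANDing each octet of the IPv4 address
# with the matching octet of the netmask.
network_address() (
    # the body runs in a subshell so the IFS change does not leak to the caller
    IFS=.
    set -- $1 $2    # eight fields: four address octets, then four mask octets
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)
```

For example, network_address 172.16.5.1 255.255.255.0 prints 172.16.5.0, which is the value that would go into bindnetaddr under that assumed netmask.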

Re: pacemaker

Post by Jonas » 2012/07/31 09:32:36

In fact, they are virtual machines and they are on a local network.

I changed my main configuration file to:

Code: Select all
totem {
   version: 2
   secauth: off
   interface {
      member {
         memberaddr: 172.16.XXX.XXX
      }
      member {
         memberaddr: 172.16.XXX.XXX
      }
      ringnumber: 0
      bindnetaddr: 172.16.XXX.XXX
      mcastport: 5405
   }
   transport: udpu
}

logging {
   fileline: off
   to_logfile: yes
   to_syslog: yes
   debug: on
   logfile: /var/log/cluster/corosync.log
   debug: off
   timestamp: on
   logger_subsys {
      subsys: AMF
      debug: off
   }
}

And it doesn't work any better.

I have a question: why is port 5405 listening on 127.0.0.1 when it should be listening on 172.16.XXX.XXX?

Re: pacemaker

Post by Jonas » 2012/08/01 15:01:00

Good, I now have "2 Nodes".

Now, if I want httpd to be available all the time, how do I create the resource?

I did:

Code: Select all
crm(live)configure# primitive httpd ocf:heartbeat:apache \
params configfile="/etc/httpd/conf/sites-enabled/*.conf" \
port="80" \
op start interval="0s" timeout="60s" \
op monitor interval="5s" timeout="20s" \
op stop interval="0s" timeout="60s"

But it's strange: don't I need to specify my cluster IP? And do I need to copy it to the second node?
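
On the cluster IP question, one common approach is to place the virtual IP and the web server in a group so that they run on the same node and fail over together, much like the ftpserver group that crm_mon shows later in this thread. A hedged sketch, assuming the failover-ip primitive defined earlier still exists; the group name webserver and the function name group_web_with_vip are hypothetical.

```shell
# group_web_with_vip is a hypothetical helper name; call it as root on one node.
group_web_with_vip() {
    # members of a group run on the same node and start in the listed order,
    # so the IP address comes up before httpd
    crm configure group webserver failover-ip httpd
}
```

The cluster configuration lives in the CIB, which pacemaker replicates to every node, so the crm definition does not need to be copied by hand. The Apache configuration files themselves are ordinary files and do need to exist on both nodes.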

Re: pacemaker

Post by Jonas » 2012/08/05 18:34:03

Do you know if a package is available that can notify me by email if a node or a service goes down?

I have been searching for how to configure this in crm, but... ^^

Kind regards

Re: pacemaker

Post by Jonas » 2012/08/06 09:56:07

Can we configure a second DRBD resource in pacemaker?

I configured one resource (drbd0) and all is well, but when I try to add my drbd1 resource to pacemaker I always get this error.

crm_mon:
Code: Select all
 
Current DC: cluster1   - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
2 Nodes configured, 2 expected votes
8 Resources configured.

Online: [ cluster2 cluster1 ]

 Resource Group: ftpserver
     vip        (ocf::heartbeat:IPaddr2):   Started clusterACE1
     vsftpd     (lsb:vsftpd):   Started clusterACE1
 Master/Slave Set: ms_drbd0 [drbd0]
     Masters: [ cluster1 ]
     Slaves: [ cluster2]
 Master/Slave Set: ms_drbd1 [drbd1]
     drbd1:0    (ocf::linbit:drbd):     Slave cluster2 (unmanaged) FAILED
     drbd1:1    (ocf::linbit:drbd):     Slave cluster1 (unmanaged) FAILED

Failed actions:
    drbd1:1_stop_0 (node=cluster1, call=49, rc=5, status=complete): not installed
    drbd1:0_stop_0 (node=cluster2, call=12, rc=5, status=complete): not installed


Its configuration is the same as for my drbd0 device.

Kind regards.
