Pacemaker


Pacemaker uses [[Corosync|Corosync]] or [[Heartbeat|heartbeat]] as its cluster messaging layer; Corosync (it seems) is the one to go for.

= Links =


*[http://clusterlabs.org/ Cluster Labs]
*[https://github.com/ClusterLabs/resource-agents/ Pacemaker resource agents on GitHub]
*[http://www.linux-ha.org/doc/man-pages/ap-ra-man-pages.html Linux-HA manpages]
*[https://clusterlabs.org/quickstart.html Pacemaker quickstart]
*[https://github.com/ClusterLabs/pacemaker/blob/master/doc/pcs-crmsh-quick-ref.md Quick comparison of pcs and crmsh]
*[https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_pacemaker_architecture.html Pacemaker Architecture]
*[http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/ Pacemaker Explained]
*[http://fibrevillage.com/sysadmin/304-pcs-command-reference pcs command reference]
*[http://fibrevillage.com/sysadmin/321-pacemaker-and-pcs-on-linux-example-managing-cluster-resource Pacemaker and pcs on Linux example, managing cluster resources]
*[http://jensd.be/156/linux/building-a-high-available-failover-cluster-with-pacemaker-corosync-pcs Building a high-available failover cluster with Pacemaker, Corosync & PCS]
*[https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html-single/High_Availability_Add-On_Administration/index.html High Availability Add-On Administration (RHEL 7)]
*[https://www.digitalocean.com/community/tutorials/how-to-create-a-high-availability-setup-with-corosync-pacemaker-and-floating-ips-on-ubuntu-14-04 How To Create a High Availability Setup with Corosync, Pacemaker, and Floating IPs on Ubuntu 14.04]
*[http://wiki.lustre.org/Creating_Pacemaker_Resources_for_Lustre_Storage_Services Creating Pacemaker Resources for Lustre Storage Services]
*[http://djlab.com/2013/04/pacemaker-corosync-drbd-cheatsheet/ Pacemaker/Corosync/DRBD cheatsheet]
*[https://redhatlinux.guru/2018/05/22/pacemaker-cheat-sheet/ Pacemaker cheat sheet]
*[https://www.freesoftwareservers.com/wiki/pcs-tips-n-tricks-constraints-delete-resources-3965539.html PCS tips & tricks]
*[http://www.nashville-linux-guy.com/index.php/blog/45-centos-7-active-active-iscsi-cluster Pacemaker + DRBD + iSCSI (CentOS 7 active/active iSCSI cluster), also useful pcs tips]
*[https://www.hastexo.com/resources/hints-and-kinks/mandatory-and-advisory-ordering-pacemaker/ Mandatory and advisory ordering in Pacemaker]
*[http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_specifying_a_preferred_location.html Specifying a preferred location]
*[https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-sets-ordering.html Resource sets]
*[https://www.alteeve.com/w/History_of_HA_Clustering History of HA clustering]
*[http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html The OCF Resource Agent Developer’s Guide]
*[https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Administration/_visualizing_the_action_sequence.html Visualizing the action sequence]
*[https://documentation.suse.com/sle-ha/15-SP1/html/SLE-HA-all/cha-ha-maintenance.html#sec-ha-maint-shutdown-node Implications of Taking Down a Cluster Node]
*[https://www.programmerall.com/article/11571745093/ Corosync + Pacemaker + crmsh: building a highly available web cluster]


= Notes =

By specifying -INFINITY, the constraint is binding.
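For example (hypothetical resource and node names), a -INFINITY location score is a hard ban rather than a preference:
  pcs constraint location FOO avoids node-2
pcs records this as a -INFINITY constraint, so FOO can never run on node-2; a finite score would only be advisory.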
= Quickstart =
 
Keep in mind you might want to use dedicated IPs for cluster sync, so define those in /etc/hosts on both nodes.

On both nodes, set a password for the hacluster user and start pcsd:
  passwd hacluster
  systemctl start pcsd.service
  systemctl enable pcsd.service
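From here the usual next steps are to authenticate the nodes to each other and to create and start the cluster. A sketch with hypothetical node and cluster names, using RHEL/CentOS 7-era pcs (0.9.x) syntax:
  pcs cluster auth node-1 node-2 -u hacluster
  pcs cluster setup --name mycluster node-1 node-2
  pcs cluster start --all
  pcs cluster enable --all
On pcs 0.10 and later the first two steps become pcs host auth node-1 node-2 and pcs cluster setup mycluster node-1 node-2.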
 
 
 
= Commands/tools =
 
*crm  
*crmadmin  
*cibadmin
*pcs  
*[[Corosync|corosync]]  
 
= Useful commands =
 
== save entire config ==
 
  pcs config backup configfile
 
== Dump entire crm ==


  cibadmin -Q


= HOWTO =
 
 
 
== Groups ==
 
=== Add existing resource to group ===
 
  pcs resource group add GROUPID RESOURCEID
 
=== Stop resource group ===
  pcs resource disable MYGROUP
 
=== See if entire group is disabled ===
 
  pcs resource show MYGROUP
shows
  Meta Attrs: target-role=Stopped
 
= FAQ =
 
== Update resource ==
 
  pcs resource update resourcename variablename=newvalue
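For example, to change parameters of a hypothetical ocf:heartbeat:IPaddr2 resource:
  pcs resource update ClusterIP ip=192.168.0.99 cidr_netmask=24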
 
 
 
== Current DC ==
 
In the output of
  pcs status
"DC" is the Designated Controller: the node currently elected to run the scheduler and coordinate the cluster.
 
 
 
== Remove resource group + members ==
  pcs resource delete MYGROUP
Deleting a group this way also deletes the resources in it.
 
 
== Move resource to node ==
 
  pcs resource move RES NODE


== Show default resource stickiness ==
 
  pcs resource defaults
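To change the default (the value is just an example; newer pcs versions prefer pcs resource defaults update):
  pcs resource defaults resource-stickiness=100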
 
== Set resource stickiness ==
 
  pcs resource meta <resource_id> resource-stickiness=100
and to check:
  pcs resource show <resource_id>
Or better yet:
  crm_simulate -Ls
 
== Undo resource move ==
 
  pcs constraint --full
shows something like
<pre>Location Constraints:
  Resource: FOO
    Enabled on: santest-a (score:INFINITY) (role: Started) (id:cli-prefer-FOO)
</pre>
then remove the cli-prefer constraint it created:
  pcs constraint remove cli-prefer-FOO
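Newer pcs versions can also clear the constraints left behind by move/ban in one step, if yours supports it:
  pcs resource clear RES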
 
== pcs status: Error: cluster is not currently running on this node ==


Don't panic until after
  sudo pcs status


== show detailed resources ==

  pcs resource --full


== stop node (standby) ==

The following command puts the specified node into standby mode: the node can no longer host resources, and any resources currently active on it are moved to another node. With --all, every node is put into standby mode.

  pcs cluster standby node-1
or, on the node itself:
  pcs node standby


Undo this with
  pcs cluster unstandby node-1
or
  pcs node unstandby
 
== set maintenance mode ==
 
This puts the cluster in maintenance mode, so it stops managing resources:

  pcs property set maintenance-mode=true
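Leave maintenance mode again with
  pcs property set maintenance-mode=false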


== Error: cluster is not currently running on this node ==


  pcs cluster start [<node name>]


== Remove a constraint ==


  pcs constraint list --full
to identify the constraints, and then
  pcs constraint remove <whatever-constraint-id>


== Clear error messages ==

  pcs resource cleanup
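or, for a single resource:
  pcs resource cleanup <resource>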


== Call cib_replace failed (-205): Update was older than existing configuration ==

can be run only once


== Error signing on to the CIB service: Transport endpoint is not connected ==

Probably SELinux.


== Show allocation scores ==

  crm_simulate -sL


== Show resource failcount ==

  pcs resource failcount show <resource>
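and reset it (if your pcs version supports it) with
  pcs resource failcount reset <resource>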


== export current configuration as commands ==

  pcs config export pcs-commands
== debug resource ==


  pcs resource debug-start resource
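For more verbose output from the resource agent, most pcs versions accept --full here:
  pcs resource debug-start resource --full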


== *** Resource management is DISABLED *** The cluster will not attempt to start, stop or recover services ==
 
Cluster is in maintenance mode
 
== Found meta data is "unclean", please apply-al first ==
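This is a DRBD message, not a Pacemaker one; presumably the fix is to apply the activity log first:
  drbdadm apply-al <resource>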
 
== Troubleshooting ==
 
*[http://blog.clusterlabs.org/blog/2013/debugging-pengine Debugging the policy engine]
 
== pcs status all resources stopped ==
 
probably a bad ordering constraint
 
== Fencing and resource management disabled due to lack of quorum ==
 
Probably means you forgot to run pcs cluster start on the other node.
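A quick way to check the quorum state:
  corosync-quorumtool -s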
 
== Resource cannot run anywhere ==
 
Check if some stickiness was set (crm_simulate -Ls shows the allocation scores).
 
== pcs resource update unable to find resource ==
 
Trying to unset stickiness with
  pcs resource update ISCSIgroupTEST1 meta resource-stickiness=
caused: Error: Unable to find resource: ISCSIgroupTEST1
What this means is: try it on the host where the stickiness was set :)
 
== Difference between maintenance-mode and standby ==
 
Still not clear, but roughly: standby is per node and moves resources off that node, while maintenance-mode is cluster-wide and leaves resources running where they are, unmanaged and unmonitored.
 
== drbdadm create-md test3 'test3' not defined in your config (for this host). ==
 
You're supposed to use the node's actual hostname (what `uname -n` reports) in the 'on ...' section of the resource definition.
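A minimal sketch of what the relevant part of a DRBD resource definition looks like (hostnames, devices and addresses are hypothetical):
<pre>
resource test3 {
  device    /dev/drbd0;
  disk      /dev/sdb1;
  meta-disk internal;
  on node-1.example.com {    # must match `uname -n` on that node
    address 192.168.10.1:7789;
  }
  on node-2.example.com {
    address 192.168.10.2:7789;
  }
}
</pre>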
 
== corosync: active/disabled ==
 
As far as I can tell this is the Daemon Status line in pcs status: the corosync service is running (active) but not enabled at boot (disabled); systemctl enable corosync changes it to active/enabled.
 
== ocf-exit-reason:Undefined iSCSI target implementation ==
 
Install scsi-target-utils
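The ocf:heartbeat:iSCSITarget agent normally auto-detects the implementation; it can also be pinned explicitly via its implementation parameter (resource name hypothetical):
  pcs resource update MyISCSITarget implementation=tgt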
 
== moving RES away after 1000000 failures ==


If failcount is 0, try pcs resource cleanup
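1000000 is Pacemaker's way of writing INFINITY. How many failures are tolerated before a resource is forced away is controlled by the migration-threshold meta attribute (resource name and value hypothetical):
  pcs resource meta RES migration-threshold=3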
