Pacemaker
Revision as of 13:43, 2 July 2021
Pacemaker uses Corosync or Heartbeat as its cluster messaging layer; Corosync is the one to go for.
Links
- Cluster Labs
- Linux-HA manpages
- pacemaker quickstart
- https://github.com/ClusterLabs/pacemaker/blob/master/doc/pcs-crmsh-quick-ref.md
- Pacemaker Architecture
- Pacemaker explained
- pcs command reference
- Pacemaker and pcs on Linux example, managing cluster resource
- Building a high-available failover cluster with Pacemaker, Corosync & PCS
- HIGH AVAILABILITY ADD-ON ADMINISTRATION
- How To Create a High Availability Setup with Corosync, Pacemaker, and Floating IPs on Ubuntu 14.04
- http://fibrevillage.com/sysadmin/304-pcs-command-reference
- http://wiki.lustre.org/Creating_Pacemaker_Resources_for_Lustre_Storage_Services
- Cheatsheet
- Pacemaker cheat sheet
- PCS tips&tricks
- Mandatory and advisory ordering in Pacemaker
- http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_specifying_a_preferred_location.html
- resource sets
- History of HA clustering
- The OCF Resource Agent Developer’s Guide
- https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Administration/_visualizing_the_action_sequence.html
- Implications of Taking Down a Cluster Node
Notes
By specifying a score of -INFINITY, the constraint is mandatory (binding): the resource may never run on the node in question. Finite scores are only advisory.
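A minimal sketch of the difference, with hypothetical resource and node names (WebSite, node1, node2):

```shell
# A finite score is only a preference (advisory): WebSite would
# rather run on node1, but may end up elsewhere.
pcs constraint location WebSite prefers node1=50

# "avoids" records a -INFINITY score by default, so this constraint
# is binding: WebSite can never run on node2.
pcs constraint location WebSite avoids node2
```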
Quickstart
Keep in mind that you might want to use dedicated IPs for the cluster sync traffic, so define those in /etc/hosts on both nodes.
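For example, with a hypothetical pair of nodes and a made-up dedicated sync network, the /etc/hosts entries on both nodes might look like:

```
# dedicated cluster sync addresses (hypothetical)
10.0.99.1   node1-sync
10.0.99.2   node2-sync
```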
- set a password for the hacluster user (on both nodes)
passwd hacluster
systemctl start pcsd.service
systemctl enable pcsd.service
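From here, a typical quickstart continues by authenticating the nodes and creating the cluster. A sketch with hypothetical node and cluster names; `pcs host auth` is the pcs 0.10+ spelling, older releases use `pcs cluster auth` instead:

```shell
# Authenticate pcsd on both nodes as the hacluster user
pcs host auth node1 node2 -u hacluster

# Create and start the cluster (pcs 0.10+ syntax)
pcs cluster setup mycluster node1 node2
pcs cluster start --all
pcs cluster enable --all
```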
Commands/tools
- crm
- crmadmin
- cibadmin
- pcs
- corosync
Useful commands
save entire config
pcs config backup configfile
Dump entire crm
cibadmin -Q
HOWTO
Groups
Add existing resource to group
pcs resource group add GROUPID RESOURCEID
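For instance, with hypothetical names, moving an existing IPaddr2 resource into a group (the group is created if it does not exist yet):

```shell
# Resource assumed to have been created earlier, e.g.:
# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24

# Add it to (and, if needed, create) the group "webgroup"
pcs resource group add webgroup ClusterIP
```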
FAQ
Update resource
pcs resource update resourcename variablename=newvalue
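A concrete sketch with hypothetical names (an IPaddr2 resource called ClusterIP):

```shell
# Change the address the resource manages
pcs resource update ClusterIP ip=192.0.2.20
```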
Current DC
In the output of
pcs status
the "Current DC" is the Designated Controller: the node currently running the scheduler and making resource placement decisions for the whole cluster.
Move resource to node
pcs resource move RES NODE
Show default resource stickiness
pcs resource default
Set resource stickiness
pcs resource meta <resource_id> resource-stickiness=100
and to check:
pcs resource show <resource_id>
Or better yet:
crm_simulate -Ls
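Putting the stickiness commands together, with a hypothetical resource name:

```shell
# Make the resource prefer to stay where it is
pcs resource meta ClusterIP resource-stickiness=100

# Verify: the meta attribute appears in the resource definition
pcs resource show ClusterIP

# Or inspect the live allocation scores
crm_simulate -Ls
```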
Undo resource move
pcs constraint --full
Location Constraints:
  Resource: FOO
    Enabled on: santest-a (score:INFINITY) (role: Started) (id:cli-prefer-FOO)
pcs constraint remove cli-prefer-FOO
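The whole round trip, with hypothetical names: a move creates a cli-prefer-* location constraint that pins the resource to the target node until you remove it:

```shell
pcs resource move FOO santest-a      # creates constraint id cli-prefer-FOO
pcs constraint --full                # find the generated constraint id
pcs constraint remove cli-prefer-FOO # resource is free to move again
```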
pcs status: Error: cluster is not currently running on this node
Don't panic until you have also tried it as root:
sudo pcs status
show detailed resources
pcs resource --full
stop node (standby)
The following command puts the specified node into standby mode. The node is then no longer able to host resources, and any resources currently active on it will be moved to another node. With the --all option, all nodes are put into standby mode.
pcs cluster standby node-1
or
pcs node standby
on the node itself
and undo this with
pcs cluster unstandby node-1
or
pcs node unstandby
set maintenance mode
This sets the cluster in maintenance mode, so it stops managing the resources
pcs property set maintenance-mode=true
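To leave maintenance mode again, unset the property:

```shell
# Resume normal resource management
pcs property set maintenance-mode=false

# Or drop the property entirely
pcs property unset maintenance-mode
```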
Error: cluster is not currently running on this node
pcs cluster start [<node name>]
Remove a constraint
pcs constraint list --full
to identify the constraints and then
pcs constraint remove <whatever-constraint-id>
Clear error messages
pcs resource cleanup
If you get: Call cib_replace failed (-205): Update was older than existing configuration
the cleanup has already been applied; it can be run only once.
Error signing on to the CIB service: Transport endpoint is not connected
Probably SELinux.
Show allocation scores
crm_simulate -sL
Show resource failcount
pcs resource failcount show <resource>
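Once the underlying cause is fixed, the counter can be reset (hypothetical resource name):

```shell
pcs resource failcount show ClusterIP
pcs resource failcount reset ClusterIP
```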
export current configuration as commands
pcs config export pcs-commands
debug resource
pcs resource debug-start resource
*** Resource management is DISABLED *** The cluster will not attempt to start, stop or recover services
This banner in pcs status means the cluster is in maintenance mode.
Found meta data is "unclean", please apply-al first
A DRBD message seen while testing resources; run drbdadm apply-al <resource> as it suggests.
Troubleshooting
pcs status all resources stopped
probably a bad ordering constraint
Fencing and resource management disabled due to lack of quorum
Probably means you forgot to run pcs cluster start on the other node.
Resource cannot run anywhere
Check if some stickiness was set
pcs resource update unable to find resource
Trying to unset stickiness:
pcs resource update ISCSIgroupTEST1 meta resource-stickiness=
caused: Error: Unable to find resource: ISCSIgroupTEST1
What this means is: run the command on the node where the stickiness was set :)
Difference between maintenance-mode and standby
Roughly: standby is per-node and moves resources off that node, while maintenance-mode is cluster-wide and leaves resources running but completely unmanaged.
drbdadm create-md test3 'test3' not defined in your config (for this host).
You're supposed to use the machine's actual `hostname` in the 'on ...' section of the DRBD config.
corosync: active/disabled
This is the systemd state of the corosync service: running (active) but not enabled at boot (disabled). Fix with systemctl enable corosync.
ocf-exit-reason:Undefined iSCSI target implementation
Install scsi-target-utils
moving RES away after 1000000 failures
If failcount is 0, try pcs resource cleanup