Pacemaker
Pacemaker uses Corosync or Heartbeat as its cluster messaging layer; Corosync seems to be the one to go for.
Links
- Cluster Labs
- https://github.com/ClusterLabs/pacemaker/blob/master/doc/pcs-crmsh-quick-ref.md
- Pacemaker Architecture
- Pacemaker explained
- pcs command reference
- Pacemaker and pcs on Linux example, managing cluster resource
- Building a high-available failover cluster with Pacemaker, Corosync & PCS
- HIGH AVAILABILITY ADD-ON ADMINISTRATION
- How To Create a High Availability Setup with Corosync, Pacemaker, and Floating IPs on Ubuntu 14.04
- http://fibrevillage.com/sysadmin/304-pcs-command-reference
- http://wiki.lustre.org/Creating_Pacemaker_Resources_for_Lustre_Storage_Services
- Cheatsheet
- Pacemaker cheat sheet
- PCS tips&tricks
- Mandatory and advisory ordering in Pacemaker
- http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_specifying_a_preferred_location.html
- resource sets
- History of HA clustering
- The OCF Resource Agent Developer’s Guide
- https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Administration/_visualizing_the_action_sequence.html
Notes
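A score of INFINITY or -INFINITY makes a constraint binding instead of a preference; a location constraint with -INFINITY means the resource must never run on that node.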
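To make the score note concrete, here is a minimal sketch that contrasts a finite preference with a binding -INFINITY constraint. The resource name FOO, the node names and the `run`, `prefer_node` and `forbid_node` helpers are made up for illustration; by default the script only prints the pcs commands instead of running them.

```shell
#!/bin/sh
# Sketch: print (or run) location constraints with a finite and an infinite score.
# FOO, node-1 and node-2 are placeholders, not taken from a real cluster.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Preference only: the cluster may still place the resource elsewhere.
prefer_node() { run pcs constraint location "$1" prefers "$2=$3"; }

# Binding: "avoids NODE=INFINITY" results in a score of -INFINITY,
# so the resource must never run on that node.
forbid_node() { run pcs constraint location "$1" avoids "$2=INFINITY"; }

prefer_node FOO node-1 50
forbid_node FOO node-2
```

Set DRY_RUN=0 to actually apply the constraints on a cluster node.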
Commands/tools
- crm
- crmadmin
- cibadmin
- pcs
- corosync
Useful commands
save entire config
pcs config backup configfile
Dump entire CIB
cibadmin -Q
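The two commands above could be combined into a timestamped backup, roughly like this. It is only a sketch: BACKUP_DIR, RUN_BACKUP, the file naming scheme and the function names are assumptions, and nothing is written unless RUN_BACKUP=1 is set.

```shell
#!/bin/sh
# Sketch: timestamped backup of the pcs configuration and the raw CIB.
# BACKUP_DIR and the naming scheme are assumptions for illustration.
BACKUP_DIR="${BACKUP_DIR:-/var/tmp}"

backup_base() {
    # Prints something like /var/tmp/cluster-20240131-120000
    echo "$BACKUP_DIR/cluster-$(date +%Y%m%d-%H%M%S)"
}

backup_cluster() {
    base="$(backup_base)"
    # pcs writes a tarball with the cluster configuration files
    pcs config backup "$base" || return 1
    # full CIB as XML
    cibadmin -Q > "$base.cib.xml"
}

if [ "${RUN_BACKUP:-0}" = "1" ]; then
    backup_cluster
else
    echo "dry run: backup would use base name $(backup_base)"
fi
```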
FAQ
Update resource
pcs resource update resourcename variablename=newvalue
Move resource to node
pcs resource move RES NODE
Show default resource stickiness
pcs resource defaults
Set resource stickiness
pcs resource meta <resource_id> resource-stickiness=100
and to check:
pcs resource show <resource_id>
Or better yet:
crm_simulate -Ls
Undo resource move
pcs constraint --full
Location Constraints: Resource: FOO Enabled on: santest-a (score:INFINITY) (role: Started) (id:cli-prefer-FOO)
pcs constraint remove cli-prefer-FOO
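Picking the leftover constraint id out of the listing can be scripted. A small sketch, assuming the ids created by a move or ban start with "cli-" as in the output above; the function name `cli_constraint_ids` is made up here.

```shell
#!/bin/sh
# Sketch: list the ids of constraints left behind by "pcs resource move".
# Assumes such ids start with "cli-", as in the sample output above.
cli_constraint_ids() {
    # Reads "pcs constraint --full" output on stdin, prints one id per line.
    grep -o '(id:cli-[^)]*)' | sed -e 's/^(id://' -e 's/)$//'
}

sample='Location Constraints: Resource: FOO Enabled on: santest-a (score:INFINITY) (role: Started) (id:cli-prefer-FOO)'
printf '%s\n' "$sample" | cli_constraint_ids   # prints: cli-prefer-FOO

# On a live cluster (not run here):
#   pcs constraint --full | cli_constraint_ids | xargs -n1 pcs constraint remove
# pcs also has "pcs resource clear FOO", which removes the constraints
# created by move/ban for that resource.
```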
pcs status: Error: cluster is not currently running on this node
Don't panic until you have retried as root:
sudo pcs status
show detailed resources
pcs resource --full
stop node (standby)
The following command puts the specified node into standby mode. The specified node is no longer able to host resources. Any resources currently active on the node will be moved to another node. If you specify --all, this command puts all nodes into standby mode.
pcs cluster standby node-1
or
pcs node standby
on the node itself
and undo this with
pcs cluster unstandby node-1
or
pcs node unstandby
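A possible way to use standby for node maintenance is to wrap the task between standby and unstandby. This is a sketch only: `with_standby` and `run` are hypothetical helpers, node-1 is a placeholder, and it uses the `pcs node` form (older pcs versions may only know `pcs cluster standby`). With the default DRY_RUN=1 the pcs commands are printed, not executed.

```shell
#!/bin/sh
# Sketch: wrap a maintenance task between standby and unstandby.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the pcs commands

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

with_standby() {
    node="$1"; shift
    run pcs node standby "$node" || return 1
    # Run the maintenance task itself and remember its exit status.
    "$@"; rc=$?
    # Always bring the node back, even if the task failed.
    run pcs node unstandby "$node"
    return "$rc"
}

with_standby node-1 echo "doing maintenance on node-1"
```

Note that the maintenance task itself is executed even in dry-run mode; only the pcs calls are replaced by echo.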
set maintenance mode
This sets the cluster in maintenance mode, so it stops managing the resources
pcs property set maintenance-mode=true
Error: cluster is not currently running on this node
pcs cluster start
Remove a constraint
pcs constraint list --full
to identify the constraints and then
pcs constraint remove <whatever-constraint-id>
Clear error messages
pcs resource cleanup
Call cib_replace failed (-205): Update was older than existing configuration
can be run only once
[Error signing on to the CIB service: Transport endpoint is not connected ]
Probably SELinux
Show allocation scores
crm_simulate -sL
Show resource failcount
pcs resource failcount show <resource>
export current configuration as commands
pcs config export pcs-commands
debug resource
pcs resource debug-start resource
*** Resource management is DISABLED *** The cluster will not attempt to start, stop or recover services
Cluster is in maintenance mode
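The banner can be checked for in a script, for example before relying on automatic failover. A minimal sketch; the function name `in_maintenance` is made up, and the check simply greps for the banner text quoted above.

```shell
#!/bin/sh
# Sketch: detect the maintenance-mode banner in "pcs status" output.
in_maintenance() {
    # Reads status output on stdin; succeeds if the banner is present.
    grep -q 'Resource management is DISABLED'
}

banner='*** Resource management is DISABLED *** The cluster will not attempt to start, stop or recover services'
if printf '%s\n' "$banner" | in_maintenance; then
    echo "maintenance mode is on; to leave it: pcs property set maintenance-mode=false"
fi

# On a live cluster (not run here):
#   pcs status | in_maintenance && echo "cluster is in maintenance mode"
```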
Found meta data is "unclean", please apply-al first
Troubleshooting
pcs status all resources stopped
probably a bad ordering constraint
Fencing and resource management disabled due to lack of quorum
Probably means you forgot to run pcs cluster start on the other node
Resource cannot run anywhere
Check if some stickiness was set
pcs resource update unable to find resource
Trying to unset stickiness:
pcs resource update ISCSIgroupTEST1 meta resource-stickiness=
caused: Error: Unable to find resource: ISCSIgroupTEST1
What this means is: try it on the host where the stickiness was set :)