Pacemaker
Revision as of 14:06, 7 August 2020
Pacemaker uses Corosync or Heartbeat for cluster communication; Corosync seems to be the one to go for.
Links
- Cluster Labs
- pacemaker quickstart
- https://github.com/ClusterLabs/pacemaker/blob/master/doc/pcs-crmsh-quick-ref.md
- Pacemaker Architecture
- Pacemaker explained
- pcs command reference
- Pacemaker and pcs on Linux example, managing cluster resource
- Building a high-available failover cluster with Pacemaker, Corosync & PCS
- HIGH AVAILABILITY ADD-ON ADMINISTRATION
- How To Create a High Availability Setup with Corosync, Pacemaker, and Floating IPs on Ubuntu 14.04
- http://fibrevillage.com/sysadmin/304-pcs-command-reference
- http://wiki.lustre.org/Creating_Pacemaker_Resources_for_Lustre_Storage_Services
- Cheatsheet
- Pacemaker cheat sheet
- PCS tips&tricks
- Mandatory and advisory ordering in Pacemaker
- http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_specifying_a_preferred_location.html
- resource sets
- History of HA clustering
- The OCF Resource Agent Developer’s Guide
- https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Administration/_visualizing_the_action_sequence.html
- Implications of Taking Down a Cluster Node
Notes
By specifying a score of -INFINITY, the constraint is binding (mandatory) rather than a preference.
Quickstart
Keep in mind you might want to use dedicated IPs for sync, so define those in /etc/hosts.
On both nodes:
- set password
passwd hacluster
systemctl start pcsd.service
systemctl enable pcsd.service
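As an illustration of the dedicated sync addresses mentioned above, the entries in /etc/hosts might look like the following. This is only a sketch: the host names, the addresses and the helper name `sync_hosts_entries` are made up for the example and need to be replaced with your own.

```shell
# Hypothetical sync-network names and addresses; adjust to your own setup.
sync_hosts_entries() {
    cat <<'EOF'
192.168.100.1  node-1-sync
192.168.100.2  node-2-sync
EOF
}

# Print the entries for review. To append them (as root, on both nodes):
#   sync_hosts_entries >> /etc/hosts
sync_hosts_entries
```

Printing first and appending by hand keeps the example from touching /etc/hosts by accident.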
Commands/tools
- crm
- crmadmin
- cibadmin
- pcs
- corosync
Useful commands
save entire config
pcs config backup configfile
Dump entire CIB (cluster configuration and status, as XML)
cibadmin -Q
FAQ
Update resource
pcs resource update resourcename variablename=newvalue
Current DC
In the output of
pcs status
this is the Designated Controller, the node currently elected to make the cluster's decisions.
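To pick just the DC name out of the status output, something like the filter below may help. It is a sketch that assumes the line contains "Current DC: <node> ..." as pcs status prints it; the function name `current_dc` is made up for this example.

```shell
# Read "pcs status" output on stdin and print only the name of the
# Designated Controller (the first word after "Current DC:").
current_dc() {
    sed -n 's/.*Current DC: *\([^ ]*\).*/\1/p'
}

# Usage on a cluster node:
#   pcs status | current_dc
# crmadmin -D should report the same node.
```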
Move resource to node
pcs resource move RES NODE
Show default resource stickiness
pcs resource defaults
Set resource stickiness
pcs resource meta <resource_id> resource-stickiness=100
and to check:
pcs resource show <resource_id>
Or better yet:
crm_simulate -Ls
Undo resource move
pcs constraint --full
Location Constraints:
  Resource: FOO
    Enabled on: santest-a (score:INFINITY) (role: Started) (id:cli-prefer-FOO)
pcs constraint remove cli-prefer-FOO
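With several moved resources it can be handy to list all the leftover move constraints at once. The filter below is a sketch, assuming the ids appear as "(id:cli-prefer-...)" as in the output above; the function name `list_move_constraints` is made up for this example.

```shell
# Read "pcs constraint --full" output on stdin and print the ids of the
# location constraints left behind by "pcs resource move" (cli-prefer-*).
list_move_constraints() {
    grep -o 'id:cli-prefer-[^)]*' | sed 's/^id://'
}

# Usage on a cluster node (review the list before removing anything):
#   pcs constraint --full | list_move_constraints
#   pcs constraint remove <id-from-the-list>
```

For a single resource, pcs resource clear <resource_id> should achieve the same, since it removes the constraints created by move.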
pcs status: Error: cluster is not currently running on this node
Don't panic until after you have retried as root:
sudo pcs status
show detailed resources
pcs resource --full
stop node (standby)
The following command puts the specified node into standby mode. The specified node is no longer able to host resources. Any resources currently active on the node will be moved to another node. If you specify --all, this command puts all nodes into standby mode.
pcs cluster standby node-1
or
pcs node standby
on the node itself
and undo this with
pcs cluster unstandby node-1
or
pcs node unstandby
set maintenance mode
This sets the cluster in maintenance mode, so it stops managing the resources
pcs property set maintenance-mode=true
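Maintenance mode has to be switched off again by hand once the work is done. A minimal sketch of wrapping that in two helpers follows; the names `run`, `maintenance_on`, `maintenance_off` and the `DRY_RUN` variable are made up for this example and are not part of pcs.

```shell
# Run a command, or only print it when DRY_RUN is set (for rehearsing).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helpers to enter and leave maintenance mode.
maintenance_on()  { run pcs property set maintenance-mode=true; }
maintenance_off() { run pcs property set maintenance-mode=false; }

# Usage:
#   DRY_RUN=1 maintenance_on    # only prints the command
#   maintenance_on              # cluster stops managing resources
#   ... do the work ...
#   maintenance_off             # cluster resumes managing resources
```

While maintenance mode is on, pcs status shows the "Resource management is DISABLED" banner described further down.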
Error: cluster is not currently running on this node
pcs cluster start [<node name>]
Remove a constraint
pcs constraint list --full
to identify the constraints and then
pcs constraint remove <whatever-constraint-id>
Clear error messages
pcs resource cleanup
Call cib_replace failed (-205): Update was older than existing configuration
can be run only once
Error signing on to the CIB service: Transport endpoint is not connected
Probably SELinux
Show allocation scores
crm_simulate -sL
Show resource failcount
pcs resource failcount show <resource>
export current configuration as commands
pcs config export pcs-commands
debug resource
pcs resource debug-start resource
*** Resource management is DISABLED *** The cluster will not attempt to start, stop or recover services
Cluster is in maintenance mode
Found meta data is "unclean", please apply-al first
Troubleshooting
pcs status all resources stopped
probably a bad ordering constraint
Fencing and resource management disabled due to lack of quorum
Probably means you forgot to run pcs cluster start on the other node
Resource cannot run anywhere
Check if some stickiness was set
pcs resource update unable to find resource
Trying to unset stickiness:
pcs resource update ISCSIgroupTEST1 meta resource-stickiness=
caused: Error: Unable to find resource: ISCSIgroupTEST1
What this means is: try it on the host where the stickiness was set :)
Difference between maintenance-mode and standby
Still not clear
drbdadm create-md test3 'test3' not defined in your config (for this host).
The name in the 'on ...' section of the DRBD resource config must match the output of `hostname` on that host
corosync: active/disabled
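In the Daemon Status section of pcs status, this means the corosync service is currently running (active) but is not enabled to start at boot (disabled). It does not refer to resources. Use pcs cluster enable to have the cluster services start at boot.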
ocf-exit-reason:Undefined iSCSI target implementation
Install scsi-target-utils