DRBD: Difference between revisions
Latest revision as of 08:35, 18 September 2024
Distributed Replicated Block Device
Links
- Homepage
- DRBD-8.4 user's guide
- DRBD + pacemaker + NFS, pretty good doc
- http://www.securityandit.com/system/pacemaker-cluster-with-nfs-and-drbd/
- https://www.sebastien-han.fr/blog/2012/08/01/corosync-rrp-configuration/
- Building a redundant pair of Linux storage servers using DRBD and Heartbeat
- https://serverstack.wordpress.com/2017/05/31/install-and-configure-drbd-cluster-on-rhel7-centos7/
- http://www.asplund.nu/xencluster/page2.html
- Resizing resources
- DRBD cheat sheet
- http://www.linux-ha.org/wiki/Main_Page
- HA iSCSI with drbd and pacemaker
- Internal meta data
- High Availability NFS With DRBD + Heartbeat
- Highly Available NFS on Ubuntu
- Configure sync rate
- https://serverfault.com/questions/740311/drbd-terrible-sync-performance-on-10gige
- also tuning tips
- Debian DRBD: How to resize NFS on drbd volume on top of LVM
- HOWTO: Resolve DRBD split-brain recovery manually
See also: http://www.gluster.org/ and http://ceph.com/
Stacked resources
- Using stacked DRBD resources in Pacemaker clusters
- https://github.com/fghaas/drbd-documentation/blob/master/users-guide/pacemaker.txt
- Pacemaker and DRBD9 stacked resources
- DRBD 8.3 Third Node Replication (stacked)
Support
Tools
- LCMC, a GUI for managing LVM and DRBD in a pacemaker environment
- zabbix monitoring for drbd
drbdadm
drbd-overview
drbdsetup
Docs
- Resizing drbd xen lvm (don't worry about the meta-data if it's not internal) (but it seems wrong, using phy: instead of drbd:)
- http://www.asplund.nu/xencluster/page2.html
Recovery
- Recovering from a DRBD split-brain scenario in heartbeat
- http://www.asplund.nu/xencluster/page2.html
- Split brain notification and automatic recovery (https://docs.linbit.com/doc/users-guide-83/s-split-brain-notification-and-recovery/)
- Troubleshooting DRBD on MediaCentral
GFS on DRBD
- http://sourceware.org/cluster/wiki/DRBD_Cookbook
- http://www.piemontewireless.net/Storage_on_Cluster_DRBD_and_GFS2
Cheatsheet
HOWTO create a drbd resource
lvcreate -L2G -n test3 DRBD
Create resource file and verify it
drbdadm dump -c test3.res
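For reference, a minimal sketch of what test3.res could look like, written as a small shell helper so it can be generated in a scratch directory and checked with drbdadm dump -c before copying it anywhere. Everything except the resource name, the LV path /dev/DRBD/test3, minor 3 and the host santest-b is an assumption: the peer name santest-a, both IP addresses and port 7791 are hypothetical and must be replaced with your own values.

```shell
# Sketch only: writes a DRBD 8.4 style resource file for "test3".
# santest-a, 10.0.0.1/10.0.0.2 and port 7791 are hypothetical placeholders.
write_test3_res() {
    dir="${1:-.}"
    cat > "$dir/test3.res" <<'EOF'
resource test3 {
  device    /dev/drbd3;
  disk      /dev/DRBD/test3;
  meta-disk internal;
  on santest-a {
    address 10.0.0.1:7791;
  }
  on santest-b {
    address 10.0.0.2:7791;
  }
}
EOF
}
```

Usage: write_test3_res /tmp && drbdadm dump -c /tmp/test3.res. The names in the "on" sections must match the output of hostname on each node.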
Copy the resource file to /etc/drbd.d/ on both nodes. On both nodes run:
drbdadm create-md test3
cat /proc/drbd
Then:
drbdadm up test3
should give:
3: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
On one node run:
drbdadm -- --overwrite-data-of-peer primary test3
or just
drbdadm primary --force test3
and check:
cat /proc/drbd
which should give:
3: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:152168 nr:0 dw:0 dr:154288 al:8 bm:0 lo:0 pe:1 ua:0 ap:0 ep:1 wo:f oos:1945500
    [>...................] sync'ed: 7.5% (1945500/2097052)K
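If you want to check these states from a script rather than by eye, the cs/ro/ds fields can be pulled out of /proc/drbd with awk. This is a rough sketch that assumes the DRBD 8.x /proc/drbd layout shown above; the function name drbd_states is made up for this example.

```shell
# Sketch: print "minor cs ro ds" for every device line read from stdin.
# Assumes the DRBD 8.x /proc/drbd format ("3: cs:... ro:... ds:...").
drbd_states() {
    awk '$1 ~ /^[0-9]+:$/ && $2 ~ /^cs:/ {
        minor = $1
        sub(/:$/, "", minor)
        cs = ""; ro = ""; ds = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^cs:/)      cs = substr($i, 4)
            else if ($i ~ /^ro:/) ro = substr($i, 4)
            else if ($i ~ /^ds:/) ds = substr($i, 4)
        }
        print minor, cs, ro, ds
    }'
}
```

Usage: drbd_states < /proc/drbd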
Make device/node primary
This should already have been done in the previous step:
drbdadm primary yourdeviceID
or
drbdsetup /dev/drbdX primary -o
Create pcs resource
pcs resource create TEST3_DRBD ocf:linbit:drbd drbd_resource=test3 op demote interval=0s timeout=90 monitor interval=60s \
    notify interval=0s timeout=90 promote interval=0s timeout=90 reload interval=0s timeout=30 \
    start interval=0s timeout=240 stop interval=0s timeout=100 --disabled
This will result in the error:
* TEST3_DRBD_monitor_0 on santest-b 'not configured' (6): call=922, status=complete, exitreason='meta parameter misconfigured, expected clone-max -le 2, but found unset.',
    last-rc-change='Tue Jul 21 10:06:20 2020', queued=0ms, exec=680ms
Just run
pcs resource master TEST3_DRBD-Clone TEST3_DRBD master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1 --disabled
and then
pcs resource cleanup TEST3_DRBD
Grow resource
On both nodes:
lvextend -L+10G /dev/DRBD/myresource
On one node:
drbdadm resize myresource
Check resource file
Editing files in /etc/drbd.d/ directly is a bad plan; check the syntax of a copy first:
drbdadm dump -c /tmp/test.res
Mapping resource name and device
ls -al /dev/drbd/<LVM volume group name>/by-disk/
ls -al /dev/drbd/by-res/
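The same mapping can be printed as a compact "resource -> device" list. A minimal sketch, assuming the entries under /dev/drbd/by-res/ are plain symlinks (as for single-volume resources); entries that are directories are skipped. The function name drbd_map is made up for this example.

```shell
# Sketch: list "resource -> link target" for each symlink in the by-res directory.
# The directory can be overridden as the first argument (handy for testing).
drbd_map() {
    dir="${1:-/dev/drbd/by-res}"
    for link in "$dir"/*; do
        [ -L "$link" ] || continue    # skip non-symlinks and an unmatched glob
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink "$link")"
    done
}
```

Usage: drbd_map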
Renaming DRBD resource
Renaming is a simple matter of renaming the oldresource.res file and updating its contents. Remember to first run (on one node is enough):
drbdadm down myresource
and if you're like me you might need to rename your logical volume too
lvrename VG oldresource newresource
Remove drbd resource
drbdadm down myresource
Then remove the resource file(s).
Show statistics
drbdsetup status --verbose --statistics
FAQ
Get out of 'Standalone'
Disconnect/connect until it works :)
1: State change failed: (-2) Need access to UpToDate data
When you get this while trying to make a node/resource primary, try:
drbdadm primary drbdX --force
Failure: (102) Local address(port) already in use.
When drbdadm shows
disk:Inconsistent
and/or
replication:SyncTarget
the resource is already busy; check this instead:
cat /proc/drbd
calculate metadata size
https://serverfault.com/questions/433999/calculating-drbd-meta-size
Cs=`blockdev --getsz /dev/foo`
Bs=`blockdev --getpbsz /dev/foo`
TODO finish this
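One possible way to finish this, as a sketch: the DRBD 8.4 user's guide gives the internal metadata size in sectors as ceil(Cs / 2^18) * 8 + 72, where Cs is the device size in 512-byte sectors (the value blockdev --getsz returns). This may not be exactly the calculation the linked answer had in mind, and DRBD 9 needs more space per additional peer, so treat the result as an 8.4 estimate. The function name drbd_md_sectors is made up for this example.

```shell
# Sketch: estimate DRBD 8.4 internal metadata size in 512-byte sectors.
# $1 = backing device size in sectors, e.g. from: blockdev --getsz /dev/foo
drbd_md_sectors() {
    cs="$1"
    # Integer ceiling of cs / 2^18, times 8, plus 72 fixed sectors.
    echo $(( (cs + 262143) / 262144 * 8 + 72 ))
}

# Example: a 2 GiB volume has 4194304 sectors.
drbd_md_sectors 4194304    # prints 200
```

200 sectors is 100 KiB, which agrees with the sync output in the cheatsheet: a 2G logical volume (2097152K) shows 2097052K to sync.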
'mydrbd' not defined in your config (for this host).
If drbdadm create-md throws this, 'this host' is the clue: it must match `hostname`
https://newbiedba.wordpress.com/2015/09/21/drbd-not-defined-in-your-config-for-this-host/
show resource sizes
lsblk
commands to show info
drbdmon
drbdtop
resolving split brain issues
- https://docs.linbit.com/doc/users-guide-83/s-resolve-split-brain/
- https://www.sebastien-han.fr/blog/2012/04/25/DRBD-split-brain/
diskless
You might try
drbdadm attach drbd0
The disk contains an unclean file system (0, 0).
Metadata kept in Windows cache, refused to mount. Falling back to read-only mount because the NTFS partition is in an unsafe state. Please resume and shutdown Windows fully (no hibernation or fast restarting.)
When trying to mount a snapshot (kpartx -av backup-snap1 etc)
???
sync is slow
On secondary:
drbdadm disk-options --c-plan-ahead=0 --resync-rate=50M drbd0
and to reset after sync:
drbdadm adjust drbd0
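To get a feel for what a fixed resync rate means, you can divide the out-of-sync amount (the oos: value in /proc/drbd, in KiB) by the rate. A small sketch, assuming the rate is given in MiB/s as in --resync-rate=50M; the result is a best case, since the network and disks may well be slower. The function name resync_eta_seconds is made up for this example.

```shell
# Sketch: best-case resync time in seconds.
# $1 = out-of-sync data in KiB (oos: field), $2 = resync rate in MiB/s
resync_eta_seconds() {
    oos_kib="$1"
    rate_kib=$(( $2 * 1024 ))
    # Round up so a partial second still counts.
    echo $(( (oos_kib + rate_kib - 1) / rate_kib ))
}

# Example with the oos value from the cheatsheet output and 50M:
resync_eta_seconds 1945500 50    # prints 38
```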
show configuration
drbdsetup show
show more info
drbdsetup show-gi <minor-number>
Minor number is shown in drbdsetup show
update network settings
drbdsetup net-options 10.0.0.1 10.0.0.2 --sndbuf-size=2M
mount: unknown filesystem type 'drbd'
Usually means your node is not primary. If you're sure you know what you're doing you can use
mount -t ext4 /dev/drbd1 /drbdmount
or when you want to mount the partition while drbd is down:
kpartx -av /dev/mapper/DRBD-test1
#add map DRBD-test1p1 (253:5): 0 10482928 linear /dev/mapper/DRBD-test1 2048
#I suggest mounting ro
mount -o ro /dev/mapper/DRBD-test1p1 /mnt/test1
reload configuration
drbdadm --dry-run adjust <resourcename|all>
and then
drbdadm adjust <resourcename|all>
show configuration of resource
drbdsetup /dev/drbd0 show
resource unknown
First try
drbdadm up resourcename
Command 'drbdmeta 1 v08 /dev/drbd0 internal apply-al' terminated with exit code 20
Most likely split brain issue, check dmesg etc
wfconnection standalone
Could be split brain situation
drbdadm -- --discard-my-data connect resource
Both nodes in WFConnection
Most likely a network connection problem
cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown
So you're on the primary node. If the secondary shows nothing, first try:
drbdadm up drbdX
Or, if the secondary shows "cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown", it is waiting for a connection (no?):
drbdadm disconnect drbdres
drbdadm connect --discard-my-data drbdres
Or, if it shows "cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown", this might work on the primary node:
drbdadm disconnect drbdX
drbdadm connect drbdX
Split brain issue
To force updating a resource:
#You might need this
drbdadm down <resource>
This works on a device that's not shown in /proc/drbd
drbdadm invalidate <resource>
cs:WFReportParams ro:Secondary/Unknown ds:UpToDate/DUnknown
connection is made, waiting for more
cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown
Assuming you really are looking at the secondary:
drbdadm -- --discard-my-data connect test3
and on primary you might need to
drbdadm connect test3
or
drbdadm primary --force test3
Unexpected data packet AuthChallenge (0x0010)
maybe the shared key
State change failed: Device is held open by someone
could be stacked resource, timeout?
error receiving ReportState, e: -5 l: 0!
??
resource XYX cannot run anywhere
If you see that in the log, it probably means it's a disabled resource in a disabled group.
CHECK THIS
drbd: error sending genl reply
CentOS feature, https://wiki.centos.org/Manuals/ReleaseNotes/CentOS7.2003. "Try another kernel/module version"
State change failed: (-14) Need a verify algorithm to start online verify
Means no verify-alg was defined, so no online checking
drbdadm dump-md foo: Found meta data is "unclean", please apply-al first
Try
drbdadm apply-al foo
( AL means "activity log", btw )
log messages
sock_sendmsg time expired, ko=6
latency problem?