DRBD: Difference between revisions
Revision as of 14:25, 21 December 2020
Distributed Replicated Block Device
Links
- Homepage
- DRBD-8.4 user's guide
- DRBD + pacemaker + NFS, pretty good doc
- http://www.securityandit.com/system/pacemaker-cluster-with-nfs-and-drbd/
- https://www.sebastien-han.fr/blog/2012/08/01/corosync-rrp-configuration/
- Building a redundant pair of Linux storage servers using DRBD and Heartbeat
- https://serverstack.wordpress.com/2017/05/31/install-and-configure-drbd-cluster-on-rhel7-centos7/
- http://www.asplund.nu/xencluster/page2.html
- Resizing resources
- DRBD cheat sheet
- http://www.linux-ha.org/wiki/Main_Page
- HA iSCSI with drbd and pacemaker
- Internal meta data
- Configure sync rate
- https://serverfault.com/questions/740311/drbd-terrible-sync-performance-on-10gige
- also tuning tips
- Debian DRBD: How to resize NFS on drbd volume on top of LVM
- HOWTO: Resolve DRBD split-brain recovery manually
See also: http://www.gluster.org/ and http://ceph.com/
Stacked resources
- Using stacked DRBD resources in Pacemaker clusters
- https://github.com/fghaas/drbd-documentation/blob/master/users-guide/pacemaker.txt
- Pacemaker and DRBD9 stacked resources
- DRBD 8.3 Third Node Replication (stacked)
Support
Tools
- LCMC, a GUI for managing LVM and DRBD in a pacemaker environment
- zabbix monitoring for drbd
drbdadm
drbd-overview
drbdsetup
Docs
- Resizing drbd xen lvm (don't worry about the meta-data if it's not internal) (but it seems wrong, using phy: instead of drbd:)
- http://www.asplund.nu/xencluster/page2.html
Recovery
- Recovering from a DRBD split-brain scenario in heartbeat
- http://www.asplund.nu/xencluster/page2.html
- Split brain notification and automatic recovery (https://docs.linbit.com/doc/users-guide-83/s-split-brain-notification-and-recovery/)
- Troubleshooting DRBD on MediaCentral
GFS on DRBD
- http://sourceware.org/cluster/wiki/DRBD_Cookbook
- http://www.piemontewireless.net/Storage_on_Cluster_DRBD_and_GFS2
Cheatsheet
HOWTO create a drbd resource
lvcreate -L2G -n test3 DRBD
Create resource file and verify it
drbdadm dump -c test3.res
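For reference, a minimal sketch of what test3.res could look like, written from a shell heredoc so it can be pasted as-is. This assumes DRBD 8.4 syntax; the host name santest-a, the port 7791 and the 10.0.0.x addresses are placeholders (only santest-b and the minor number 3 appear elsewhere on this page), so adjust them to your cluster.

```shell
# Write a minimal DRBD 8.4-style resource file to a scratch directory.
# ASSUMPTIONS: host santest-a, port 7791 and the IP addresses are made up.
workdir=$(mktemp -d)
res_file="$workdir/test3.res"

cat > "$res_file" <<'EOF'
resource test3 {
  device    /dev/drbd3;        # minor 3, matches the "3:" in /proc/drbd
  disk      /dev/DRBD/test3;   # the LV created with lvcreate above
  meta-disk internal;
  on santest-a {
    address 10.0.0.1:7791;
  }
  on santest-b {
    address 10.0.0.2:7791;
  }
}
EOF

echo "wrote $res_file"
# Next step, as described above (needs drbd-utils installed):
#   drbdadm dump -c "$res_file"
```

The names after "on" must match the output of uname -n on each node.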
Copy the resource file to /etc/drbd.d/ on both nodes. On both nodes run
drbdadm create-md test3
drbdadm up test3
cat /proc/drbd
should give:
3: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
On one node run:
drbdadm -- --overwrite-data-of-peer primary test3
or just
drbdadm primary --force test3
and check:
cat /proc/drbd
which should give:
3: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:152168 nr:0 dw:0 dr:154288 al:8 bm:0 lo:0 pe:1 ua:0 ap:0 ep:1 wo:f oos:1945500
    [>...................] sync'ed: 7.5% (1945500/2097052)K
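If you want to check these fields from a script instead of eyeballing them, something like the sketch below might do. The helper name drbd_field is made up for this page, and it only understands the 8.4-style key:value layout shown above.

```shell
# Print the value of one key (cs, ro, ds, oos, ...) from /proc/drbd-style text.
# drbd_field is a hypothetical helper, not part of drbd-utils.
drbd_field() {
    # $1 = status text, $2 = key; only the first match is printed
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n "s/^$2://p" | head -n 1
}

# Sample text copied from the output above.
status='3: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:152168 nr:0 dw:0 dr:154288 al:8 bm:0 lo:0 pe:1 ua:0 ap:0 ep:1 wo:f oos:1945500'

drbd_field "$status" cs    # SyncSource
drbd_field "$status" ds    # UpToDate/Inconsistent
drbd_field "$status" oos   # 1945500
```

On a live node you would pass "$(cat /proc/drbd)" instead of the sample text; with several resources configured this only returns the first one.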
Make device/node primary
Should already be done by previous step
drbdadm primary yourdeviceID
or
drbdsetup /dev/drbdX primary -o
Create pcs resource
pcs resource create TEST3_DRBD ocf:linbit:drbd drbd_resource=test3 op demote interval=0s timeout=90 monitor interval=60s \
  notify interval=0s timeout=90 promote interval=0s timeout=90 reload interval=0s timeout=30 \
  start interval=0s timeout=240 stop interval=0s timeout=100 --disabled
This will result in the error:
* TEST3_DRBD_monitor_0 on santest-b 'not configured' (6): call=922, status=complete, exitreason='meta parameter misconfigured, expected clone-max -le 2, but found unset.',
    last-rc-change='Tue Jul 21 10:06:20 2020', queued=0ms, exec=680ms
Just run
pcs resource master TEST3_DRBD-Clone TEST3_DRBD master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1 --disabled
and then
pcs resource cleanup TEST3_DRBD
Grow resource
On both nodes:
lvextend -L+10G /dev/DRBD/myresource
On one node:
drbdadm resize myresource
Check resource file
Editing files directly in /etc/drbd.d/ is a bad plan; check the syntax of a copy first:
drbdadm dump -c /tmp/test.res
Mapping resource name and device
ls -al /dev/drbd/<LVM volume group name>/by-disk/
Renaming DRBD resource
This is a simple matter of renaming the myresource.res file and updating its contents. Remember to first run
drbdadm down myresource
FAQ
Get out of 'Standalone'
disconnect/connect until works :)
1: State change failed: (-2) Need access to UpToDate data
When you get that while trying to make a node/resource primary, try
drbdadm primary drbdX --force
calculate metadata size
https://serverfault.com/questions/433999/calculating-drbd-meta-size
Cs=`blockdev --getsz /dev/foo`
Bs=`blockdev --getpbsz /dev/foo`
TODO finish this
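Until this is finished, here is a rough sketch based on the formula in the DRBD 8.4 user's guide for internal metadata: Ms = ceil(Cs / 2^18) * 8 + 72, with both sizes in 512-byte sectors. It does not use Bs, and it assumes 8.4 with a single peer (DRBD 9 keeps a bitmap per peer, so the result would be too small there). The function name drbd_md_sectors is made up.

```shell
# Estimate DRBD 8.4 internal metadata size in 512-byte sectors.
# ASSUMPTION: user's guide formula Ms = ceil(Cs / 2^18) * 8 + 72, one peer.
drbd_md_sectors() {
    # $1 = Cs, the backing device size in sectors (blockdev --getsz)
    echo $(( ($1 + 262143) / 262144 * 8 + 72 ))
}

# A 2 GiB volume is 4194304 sectors.
drbd_md_sectors 4194304   # 200 sectors, i.e. 100 KiB
```

That result is consistent with the sync output in the cheatsheet: the 2G volume is 2097152K, and DRBD reports 2097052K of usable space, a difference of 100K.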
'mydrbd' not defined in your config (for this host).
If drbdadm create-md throws this, 'this host' is the clue: it must match `hostname`
https://newbiedba.wordpress.com/2015/09/21/drbd-not-defined-in-your-config-for-this-host/
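A quick way to check this before running create-md, as a sketch: grep the resource file for an "on <host>" section matching the local node name. The helper name res_defines_host and the host names in the demo are made up, and the host name is used as a regular expression, so dots in it match any character.

```shell
# Succeed if the resource file has an "on <host> {" section for the given host.
# res_defines_host is a hypothetical helper, not part of drbd-utils.
res_defines_host() {
    # $1 = resource file, $2 = host name (as printed by uname -n)
    grep -Eq "^[[:space:]]*on[[:space:]]+$2[[:space:]]*\{" "$1"
}

# Demo on a scratch file with made-up host names.
demo_res=$(mktemp)
cat > "$demo_res" <<'EOF'
resource mydrbd {
  on nodea { address 10.0.0.1:7789; }
  on nodeb { address 10.0.0.2:7789; }
}
EOF

if res_defines_host "$demo_res" nodea; then echo "nodea: defined"; fi
if ! res_defines_host "$demo_res" nodec; then echo "nodec: not defined for this host"; fi
```

On a real node the call would look like: res_defines_host /etc/drbd.d/mydrbd.res "$(uname -n)"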
show resource sizes
lsblk
commands to show info
drbdmon
drbdtop
resolving split brain issues
- https://docs.linbit.com/doc/users-guide-83/s-resolve-split-brain/
- https://www.sebastien-han.fr/blog/2012/04/25/DRBD-split-brain/
diskless
You might try
drbdadm attach drbd0
The disk contains an unclean file system (0, 0).
Metadata kept in Windows cache, refused to mount. Falling back to read-only mount because the NTFS partition is in an unsafe state. Please resume and shutdown Windows fully (no hibernation or fast restarting.)
When trying to mount a snapshot (kpartx -av backup-snap1 etc)
???
sync is slow
On secondary:
drbdadm disk-options --c-plan-ahead=0 --resync-rate=50M drbd0
and to reset after sync:
drbdadm adjust drbd0
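To get a feel for what a fixed rate means, a back-of-the-envelope sketch: divide the oos value from /proc/drbd (KiB) by the resync rate (MiB/s). This assumes the link and disks actually sustain the configured rate, which they often do not; the function name is made up.

```shell
# Rough resync time estimate in seconds, rounded up.
# ASSUMPTION: the configured rate is actually reached for the whole sync.
resync_eta_seconds() {
    # $1 = out-of-sync data in KiB (oos), $2 = resync rate in MiB/s
    echo $(( ($1 + $2 * 1024 - 1) / ($2 * 1024) ))
}

# oos:1945500 from the cheatsheet output, at --resync-rate=50M
resync_eta_seconds 1945500 50   # 38
```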
show configuration
drbdsetup show
show more info
drbdsetup show-gi <minor-number>
Minor number is shown in drbdsetup show
update network settings
drbdsetup net-options 10.0.0.1 10.0.0.2 --sndbuf-size=2M
mount: unknown filesystem type 'drbd'
Usually means your node is not primary. If you're sure you know what you're doing you can use
mount -t ext4 /dev/drbd1 /drbdmount
or when you want to mount the partition while drbd is down:
kpartx -av /dev/mapper/DRBD-test1
#add map DRBD-test1p1 (253:5): 0 10482928 linear /dev/mapper/DRBD-test1 2048
#I suggest mounting ro
mount -o ro /dev/mapper/DRBD-test1p1 /mnt/test1
reload configuration
drbdadm --dry-run adjust <resourcename|all>
and then
drbdadm adjust <resourcename|all>
show configuration of resource
drbdsetup /dev/drbd0 show
resource unknown
First try
drbdadm up resourcename
Command 'drbdmeta 1 v08 /dev/drbd0 internal apply-al' terminated with exit code 20
Most likely split brain issue, check dmesg etc
wfconnection
Could be split brain situation
drbdadm -- --discard-my-data connect resource
cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown
So you're on the primary node and the secondary might be showing nothing; then first
drbdadm up drbdX
or if the secondary shows "cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown", it is waiting for a connection (no?)
drbdadm disconnect drbdres
drbdadm connect --discard-my-data drbdres
or it shows "cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown", in which case this might work on the primary node:
drbdadm disconnect drbdX
drbdadm connect drbdX
Split brain issue
To force updating resource
- You might need this
drbdadm down <resource>
This works on a device that's not shown in /proc/drbd
drbdadm invalidate <resource>
cs:WFReportParams ro:Secondary/Unknown ds:UpToDate/DUnknown
connection is made, waiting for more
cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown
Assuming you really are looking at the secondary:
drbdadm -- --discard-my-data connect test3
and on primary you might need to
drbdadm connect test3
or
drbdadm primary --force test3
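The StandAlone cases above can be summarised in a small lookup, sketched here as a function that only prints the command this page suggests and never runs it. The function name is made up, and the mapping is just the FAQ entries above, not an official recovery procedure; --discard-my-data throws away the local changes on that node, so check which node you are on first.

```shell
# Print the command this page suggests for a given state; nothing is executed.
# suggest_reconnect is a hypothetical helper summarising the FAQ entries above.
suggest_reconnect() {
    # $1 = connection state (cs), $2 = local role, $3 = resource name
    case "$1/$2" in
        StandAlone/Secondary) echo "drbdadm -- --discard-my-data connect $3" ;;
        StandAlone/Primary)   echo "drbdadm connect $3" ;;
        WFConnection/*)       echo "# $3 is waiting for the peer, check the other node" ;;
        *)                    echo "# no suggestion for cs:$1 role:$2" ;;
    esac
}

suggest_reconnect StandAlone Secondary test3
suggest_reconnect StandAlone Primary test3
```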
Unexpected data packet AuthChallenge (0x0010)
maybe the shared key
State change failed: Device is held open by someone
could be stacked resource, timeout?
error receiving ReportState, e: -5 l: 0!
??
drbd: error sending genl reply
CentOS feature, https://wiki.centos.org/Manuals/ReleaseNotes/CentOS7.2003. "Try another kernel/module version"
State change failed: (-14) Need a verify algorithm to start online verify
Means no verify-alg was defined, so no online checking
drbdadm dump-md foo: Found meta data is "unclean", please apply-al first
Try
drbdadm apply-al foo
( AL means "activity log", btw )