DRBD
Distributed Replicated Block Device
Links
- Homepage
- DRBD + pacemaker + NFS, pretty good doc
- http://www.securityandit.com/system/pacemaker-cluster-with-nfs-and-drbd/
- https://www.sebastien-han.fr/blog/2012/08/01/corosync-rrp-configuration/
- Building a redundant pair of Linux storage servers using DRBD and Heartbeat
- https://serverstack.wordpress.com/2017/05/31/install-and-configure-drbd-cluster-on-rhel7-centos7/
- http://www.asplund.nu/xencluster/page2.html
- Resizing resources
- DRBD cheat sheet
- http://www.linux-ha.org/wiki/Main_Page
- HA iSCSI with drbd and pacemaker
- Internal meta data
- Configure sync rate
- https://serverfault.com/questions/740311/drbd-terrible-sync-performance-on-10gige
- also tuning tips
- Debian DRBD: How to resize NFS on drbd volume on top of LVM
See also: http://www.gluster.org/ and http://ceph.com/
Stacked resources
- Using stacked DRBD resources in Pacemaker clusters
- https://github.com/fghaas/drbd-documentation/blob/master/users-guide/pacemaker.txt
- Pacemaker and DRBD9 stacked resources
- DRBD 8.3 Third Node Replication (stacked)
Support
Tools
- LCMC, a GUI for managing LVM and DRBD in a pacemaker environment
- zabbix monitoring for drbd
drbdadm
drbd-overview
drbdsetup
Docs
- Resizing drbd xen lvm (don't worry about the meta-data if it's not internal) (but it seems wrong, using phy: instead of drbd: )
- http://www.asplund.nu/xencluster/page2.html
Recovery
- Recovering from a DRBD split-brain scenario in heartbeat
- http://www.asplund.nu/xencluster/page2.html
- https://docs.linbit.com/doc/users-guide-83/s-split-brain-notification-and-recovery/ Split brain notification and automatic recovery
- Troubleshooting DRBD on MediaCentral
Adding a drbd resource in pacemaker managed cluster
https://www.linbit.com/drbd-user-guide/users-guide-drbd-8-4/ : "if your LVM volume group is managed by Pacemaker as explained in Highly available LVM with Pacemaker, it is imperative to place the cluster in maintenance mode prior to making changes to the DRBD configuration."
GFS on DRBD
- http://sourceware.org/cluster/wiki/DRBD_Cookbook
- http://www.piemontewireless.net/Storage_on_Cluster_DRBD_and_GFS2
Cheatsheet
Make device primary
drbdadm primary yourdeviceID
or
drbdsetup /dev/drbdX primary -o
Grow resource
On both nodes:
lvextend -L+10G /dev/DRBD/myresource
On one node:
drbdadm resize myresource
Check resource file
Editing files in /etc/drbd.d/ directly is a bad plan. Edit a copy and check its syntax first:
drbdadm dump -c /tmp/test.res
FAQ
Get out of 'Standalone'
disconnect/connect until it works :)
1: State change failed: (-2) Need access to UpToDate data
When you get this while trying to make a node/resource primary, try
drbdadm primary drbdX --force
calculate metadata size
https://serverfault.com/questions/433999/calculating-drbd-meta-size
Cs=`blockdev --getsz /dev/foo`
Bs=`blockdev --getpbsz /dev/foo`
TODO finish this
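A minimal sketch of the size estimate, assuming the formula from the DRBD 8.x user's guide for internal metadata with one peer: Ms = ceil(Cs / 2^18) * 8 + 72, where Cs is the device size in 512-byte sectors as printed by blockdev --getsz. The function name drbd_md_sectors is made up for this note. DRBD 9 with several peers needs more space, so treat the result as a rough lower bound there.

```shell
# Estimate DRBD internal metadata size, in 512-byte sectors.
# Assumed formula (DRBD 8.x user's guide, single peer):
#   Ms = ceil(Cs / 2^18) * 8 + 72
# $1 = device size in sectors, e.g. the output of `blockdev --getsz /dev/foo`
drbd_md_sectors() {
    # (x + 262143) / 262144 is an integer ceil(x / 2^18)
    echo $(( ($1 + 262143) / 262144 * 8 + 72 ))
}
```

Usage would be something like drbd_md_sectors "$Cs", with Cs set as above. The sketch does not use Bs.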
'mydrbd' not defined in your config (for this host).
If drbdadm create-md throws this, 'this host' is the clue: the host name in the resource's "on <host>" section must match the output of `hostname`
https://newbiedba.wordpress.com/2015/09/21/drbd-not-defined-in-your-config-for-this-host/
show resource sizes
lsblk
commands to show info
drbdmon
drbdtop
resolving split brain issues
- https://docs.linbit.com/doc/users-guide-83/s-resolve-split-brain/
- https://www.sebastien-han.fr/blog/2012/04/25/DRBD-split-brain/
diskless
You might try
drbdadm attach drbd0
The disk contains an unclean file system (0, 0).
Metadata kept in Windows cache, refused to mount. Falling back to read-only mount because the NTFS partition is in an unsafe state. Please resume and shutdown Windows fully (no hibernation or fast restarting.)
When trying to mount a snapshot (kpartx -av backup-snap1 etc)
???
sync is slow
On secondary:
drbdadm disk-options --c-plan-ahead=0 --resync-rate=50M drbd0
and to reset after sync:
drbdadm adjust drbd0
show configuration
drbdsetup show
show more info
drbdsetup show-gi <minor-number>
Minor number is shown in drbdsetup show
update network settings
drbdsetup net-options 10.0.0.1 10.0.0.2 --sndbuf-size=2M
mount: unknown filesystem type 'drbd'
Usually means your node is not primary. If you're sure you know what you're doing you can use
mount -t ext4 /dev/drbd1 /drbdmount
or when you want to mount the partition while drbd is down:
kpartx -av /dev/mapper/DRBD-test1
#add map DRBD-test1p1 (253:5): 0 10482928 linear /dev/mapper/DRBD-test1 2048
#I suggest mounting ro
mount -o ro /dev/mapper/DRBD-test1p1 /mnt/test1
reload configuration
drbdadm --dry-run adjust <resourcename|all>
and then
drbdadm adjust <resourcename|all>
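The two steps can be chained so that the real adjust only runs when the dry run exits cleanly. This is only a sketch: drbd_safe_adjust and the DRBDADM override variable are hypothetical names for this note, and it assumes drbdadm returns a non-zero exit status when the configuration does not parse.

```shell
# Run `drbdadm --dry-run adjust` first and only apply the change if it succeeds.
# $1 = resource name (defaults to "all")
# DRBDADM may be set to another binary, e.g. for testing (hypothetical knob).
drbd_safe_adjust() {
    "${DRBDADM:-drbdadm}" --dry-run adjust "${1:-all}" || return 1
    "${DRBDADM:-drbdadm}" adjust "${1:-all}"
}
```

The dry run prints the drbdsetup calls that would be made, so it is still worth reading that output rather than relying on the exit status alone.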
show configuration of resource
drbdsetup /dev/drbd0 show
resource unknown
First try
drbdadm up resourcename
Command 'drbdmeta 1 v08 /dev/drbd0 internal apply-al' terminated with exit code 20
Most likely split brain issue, check dmesg etc
wfconnection
Could be split brain situation
drbdadm -- --discard-my-data connect resource
cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown
So you're on the primary node and the secondary might be showing nothing; then first try, on the secondary,
drbdadm up drbdX
or if the secondary shows "cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown", it is waiting for a connection (no?)
drbdadm disconnect drbdres
drbdadm connect --discard-my-data drbdres
or it shows "cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown", in that case what might work on the primary node:
drbdadm disconnect drbdX
drbdadm connect drbdX
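The status strings quoted in these entries carry three fields: cs is the connection state, ro the roles (local/peer) and ds the disk states (local/peer). A small sketch for pulling one field out of such a line in a script, assuming the cs:/ro:/ds: layout shown here (as in /proc/drbd on DRBD 8.x); drbd_field is a made-up helper name.

```shell
# Print one field (cs, ro or ds) from a DRBD status line.
# $1 = field name, $2 = status line
# Returns 1 if the field is not present.
drbd_field() {
    # Unquoted $2 is split on whitespace into words like "cs:StandAlone"
    for word in $2; do
        case $word in
            "$1":*) echo "${word#*:}"; return 0 ;;
        esac
    done
    return 1
}
```

For example, drbd_field cs "cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown" prints StandAlone, which can then drive a decision between a plain reconnect and a split-brain recovery.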
Split brain issue
To force updating resource
drbdadm invalidate resource
cs:WFReportParams ro:Secondary/Unknown ds:UpToDate/DUnknown
connection is made, waiting for more
Unexpected data packet AuthChallenge (0x0010)
maybe the shared key
State change failed: Device is held open by someone
could be stacked resource, timeout?
error receiving ReportState, e: -5 l: 0!
??
drbd: error sending genl reply
CentOS feature, https://wiki.centos.org/Manuals/ReleaseNotes/CentOS7.2003. "Try another kernel/module version"
State change failed: (-14) Need a verify algorithm to start online verify
Means no verify-alg was defined, so no online checking
drbdadm dump-md foo: Found meta data is "unclean", please apply-al first
Try
drbdadm apply-al foo
( AL means "activity log", btw )