Difference between revisions of "DRBD"

From DWIKI
(One intermediate revision by the same user not shown)

Revision as of 10:19, 27 July 2020

Distributed Replicated Block Device


Links

See also: http://www.gluster.org/ and http://ceph.com/

Stacked resources

Support

Tools

drbdadm

drbd-overview

drbdsetup

Docs

Recovery



GFS on DRBD


Cheatsheet

HOWTO create a drbd resource

lvcreate -L2G -n test3 DRBD

Create resource file and verify it

drbdadm dump -c test3.res
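As a rough illustration of what such a resource file might contain, the sketch below writes a minimal DRBD 8.4-style test3.res. The host name santest-a, both addresses and the port are placeholders (only santest-b appears in the error output further down). The device minor and backing disk are inferred from the other commands on this page, so adjust everything to your setup.

```shell
# Write a minimal DRBD 8.4-style resource file to the path given as $1.
# Host names, addresses and port are placeholders, not values from this page.
write_test3_res() {
    cat > "$1" <<'EOF'
resource test3 {
  device    /dev/drbd3;
  disk      /dev/DRBD/test3;
  meta-disk internal;
  on santest-a {
    address 10.0.0.1:7791;
  }
  on santest-b {
    address 10.0.0.2:7791;
  }
}
EOF
}

# Example: write_test3_res test3.res && drbdadm dump -c test3.res
```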


Copy the resource file to /etc/drbd.d/ on both nodes. Then, on both nodes, run:

drbdadm create-md test3
drbdadm up test3
cat /proc/drbd

should give:

3: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
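To check such a status line from a script instead of by eye, a small helper can pull out the cs, ro or ds field. This is only a sketch: the function name drbd_field is made up here, and it assumes the DRBD 8.x /proc/drbd format shown above.

```shell
# Print the value of one field (cs, ro or ds) from a /proc/drbd status line.
# Usage: drbd_field <field> "<status line>"
drbd_field() {
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1://p" | head -n 1
}

# Example: drbd_field ds "$(grep '^ *3:' /proc/drbd)"
```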

On one node run:

drbdadm -- --overwrite-data-of-peer primary test3

or just

drbdadm primary --force test3

and check:

cat /proc/drbd

which should give:

3: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:152168 nr:0 dw:0 dr:154288 al:8 bm:0 lo:0 pe:1 ua:0 ap:0 ep:1 wo:f oos:1945500
        [>...................] sync'ed:  7.5% (1945500/2097052)K
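If you want to wait for the initial sync from a script, something like the following might do. It is a sketch only: both function names are invented here, the 10 second polling interval is arbitrary, and it relies on the "sync'ed:" progress line that DRBD 8.x prints in /proc/drbd.

```shell
# Print the resync percentage(s) found on stdin, if any.
sync_percent() {
    sed -n "s/.*sync'ed:[[:space:]]*\([0-9.]*\)%.*/\1/p"
}

# Poll /proc/drbd until no resync progress line is left.
wait_for_sync() {
    while pct=$(sync_percent < /proc/drbd) && [ -n "$pct" ]; do
        echo "resync at ${pct}%"
        sleep 10
    done
}
```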

Make device/node primary

This should already have been done by the previous step; otherwise:

drbdadm primary yourdeviceID

or

drbdsetup /dev/drbdX primary -o

Create pcs resource

pcs resource create TEST3_DRBD ocf:linbit:drbd drbd_resource=test3 op demote interval=0s timeout=90 monitor interval=60s \
  notify interval=0s timeout=90 promote interval=0s timeout=90 reload interval=0s timeout=30 \
  start interval=0s timeout=240 stop interval=0s timeout=100 --disabled

This will result in the error:

* TEST3_DRBD_monitor_0 on santest-b 'not configured' (6): call=922, status=complete, exitreason='meta parameter misconfigured, expected clone-max -le 2, but found unset.',
   last-rc-change='Tue Jul 21 10:06:20 2020', queued=0ms, exec=680ms

Just run

pcs resource master TEST3_DRBD-Clone TEST3_DRBD master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1 --disabled

and then

pcs resource cleanup TEST3_DRBD


Grow resource

On both nodes:

lvextend -L+10G /dev/DRBD/myresource

On one node:

drbdadm resize myresource


Check resource file

Editing files in /etc/drbd.d/ directly is a bad plan. Check the syntax of a copy first:

drbdadm dump -c /tmp/test.res


Mapping resource name and device

ls -al /dev/drbd/<LVM volume group name>/by-disk/

FAQ

Get out of 'Standalone'

Disconnect and reconnect until it works :)

1: State change failed: (-2) Need access to UpToDate data

When you get that while trying to make a node/resource primary, try:

drbdadm primary drbdX --force

calculate metadata size

https://serverfault.com/questions/433999/calculating-drbd-meta-size


Cs=`blockdev --getsz /dev/foo`
Bs=`blockdev --getpbsz /dev/foo`
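One way the calculation could continue, assuming the internal meta data estimate from the DRBD 8.x User's Guide (ceil(Cs / 2^18) * 8 + 72 sectors, for a single peer): Cs is the device size in 512-byte sectors as returned by blockdev --getsz, and Bs is not needed for this estimate. The function name is made up for this sketch.

```shell
# Estimated internal meta data size, in 512-byte sectors, for a backing
# device of $1 sectors: ceil(Cs / 2^18) * 8 + 72 (DRBD 8.x, single peer).
drbd_md_sectors() {
    echo $(( ($1 + 262143) / 262144 * 8 + 72 ))
}

# Example (needs root): drbd_md_sectors "$(blockdev --getsz /dev/foo)"
```

For a 2 GiB device (4194304 sectors) this gives 200 sectors, i.e. 100 KiB of meta data.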

TODO finish this

'mydrbd' not defined in your config (for this host).

If drbdadm create-md throws this, 'this host' is the clue: the host name in the resource's "on" section must match the output of `hostname`.
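A quick way to spot such a mismatch might be to grep the resource file for an "on" section carrying the local host name. This is a sketch: the function name and the example file path are hypothetical, and it only handles the plain `on <host> {` form.

```shell
# Succeed if resource file $1 has an "on <host> {" section for host $2
# (defaults to this machine's `uname -n`).
res_defines_host() {
    host=${2:-$(uname -n)}
    grep -Eq "^[[:space:]]*on[[:space:]]+${host}[[:space:]]*\{" "$1"
}

# Example:
# res_defines_host /etc/drbd.d/mydrbd.res || echo "no section for $(uname -n)"
```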


https://newbiedba.wordpress.com/2015/09/21/drbd-not-defined-in-your-config-for-this-host/

show resource sizes

lsblk

commands to show info

drbdmon
drbdtop


resolving split brain issues

diskless

You might try

drbdadm attach drbd0


The disk contains an unclean file system (0, 0).

Metadata kept in Windows cache, refused to mount. Falling back to read-only mount because the NTFS partition is in an unsafe state. Please resume and shutdown Windows fully (no hibernation or fast restarting.)

When trying to mount a snapshot (kpartx -av backup-snap1 etc)

???


sync is slow


On secondary:

drbdadm disk-options --c-plan-ahead=0 --resync-rate=50M drbd0

and to reset after sync:

drbdadm adjust drbd0

show configuration

drbdsetup show


show more info

drbdsetup show-gi <minor-number>

Minor number is shown in drbdsetup show

update network settings

drbdsetup net-options 10.0.0.1 10.0.0.2  --sndbuf-size=2M


mount: unknown filesystem type 'drbd'

Usually means your node is not primary. If you're sure you know what you're doing you can use

mount -t ext4 /dev/drbd1 /drbdmount

or, when you want to mount the partition while drbd is down:

kpartx -av /dev/mapper/DRBD-test1 
#add map DRBD-test1p1 (253:5): 0 10482928 linear /dev/mapper/DRBD-test1 2048
#I suggest mounting ro 
mount -o ro /dev/mapper/DRBD-test1p1 /mnt/test1
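Since the usual cause is a node that is not primary, a script could check the role before mounting. A minimal sketch, assuming `drbdadm role` prints the local role first (as in "Primary/Secondary" on DRBD 8.x); the function name is invented here.

```shell
# Succeed if the local role of resource $1 is Primary.
is_primary() {
    case "$(drbdadm role "$1")" in
        Primary|Primary/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: is_primary test1 && mount /dev/drbd1 /drbdmount
```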

reload configuration

drbdadm --dry-run adjust <resourcename|all>

and then

drbdadm adjust <resourcename|all>
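The two steps can be combined so that adjust only runs after you have looked at the dry run. A sketch with a made-up function name; the confirmation prompt is an addition of this example, not something drbdadm provides.

```shell
# Show what adjust would do, ask for confirmation, then apply it.
# Returns non-zero if the dry run fails or the answer is not "y".
safe_adjust() {
    res=${1:-all}
    drbdadm --dry-run adjust "$res" || return 1
    printf 'Apply these changes to %s? [y/N] ' "$res"
    read -r answer
    [ "$answer" = "y" ] && drbdadm adjust "$res"
}

# Example: safe_adjust test3
```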

show configuration of resource

drbdsetup /dev/drbd0 show


resource unknown

First try

drbdadm up resourcename

Command 'drbdmeta 1 v08 /dev/drbd0 internal apply-al' terminated with exit code 20

Most likely a split-brain issue; check dmesg etc.

wfconnection

Could be a split-brain situation. On the node whose data may be discarded:

drbdadm -- --discard-my-data connect resource

cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown

So you're on the primary node. The secondary might show nothing; in that case first run:

drbdadm up drbdX

or, if the secondary shows "cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown", it is probably waiting for a connection. Then, on the secondary:

drbdadm disconnect drbdres
drbdadm connect --discard-my-data drbdres


If the secondary still shows "cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown", what might also work on the primary node:

drbdadm disconnect drbdX
drbdadm connect drbdX
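Put together, the manual recovery could look roughly like the sketch below. It follows the split-brain procedure as described for DRBD 8.4 and uses the same `connect --discard-my-data` form as above. The function names are invented, and you must decide yourself which node's changes get thrown away.

```shell
# Run on the node whose changes are to be thrown away (the "victim").
split_brain_discard_local() {
    drbdadm disconnect "$1"
    drbdadm secondary "$1"
    drbdadm connect --discard-my-data "$1"
}

# Run on the node whose data is kept, if it sits in StandAlone.
split_brain_keep_local() {
    drbdadm connect "$1"
}

# Example: on the victim:   split_brain_discard_local drbdres
#          on the survivor: split_brain_keep_local drbdres
```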

Split brain issue

To force updating a resource:

You might need this first:
drbdadm down <resource>

This works on a device that's not shown in /proc/drbd

drbdadm invalidate <resource>

cs:WFReportParams ro:Secondary/Unknown ds:UpToDate/DUnknown

connection is made, waiting for more

Unexpected data packet AuthChallenge (0x0010)

Maybe the shared key (shared-secret) differs between the nodes.

State change failed: Device is held open by someone

could be stacked resource, timeout?

error receiving ReportState, e: -5 l: 0!

??

drbd: error sending genl reply

CentOS feature, https://wiki.centos.org/Manuals/ReleaseNotes/CentOS7.2003. "Try another kernel/module version"

State change failed: (-14) Need a verify algorithm to start online verify

Means no verify-alg is defined in the resource configuration, so online verification is not available.
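The usual fix is to set a verify-alg (for example `verify-alg sha1;`) in the resource's net section and run drbdadm adjust. To avoid the error in scripts, you could check the dumped configuration before starting a verify run. A sketch with invented function names, assuming `drbdadm dump` prints the verify-alg setting when one is configured:

```shell
# Succeed if the dumped configuration of resource $1 mentions verify-alg.
has_verify_alg() {
    drbdadm dump "$1" 2>/dev/null | grep -q 'verify-alg'
}

# Start online verification only when an algorithm is configured.
start_verify() {
    if has_verify_alg "$1"; then
        drbdadm verify "$1"
    else
        echo "no verify-alg configured for $1" >&2
        return 1
    fi
}
```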

drbdadm dump-md foo: Found meta data is "unclean", please apply-al first

Try

drbdadm apply-al foo

( AL means "activity log", btw )