ZFS: Difference between revisions

From DWIKI
 
(74 intermediate revisions by the same user not shown)
*[http://www.edplese.com/samba-with-zfs.html http://www.edplese.com/samba-with-zfs.html]  
*[http://wintelguy.com/zfs-calc.pl ZFS calculator]  
*[https://www.raidz-calculator.com/default.aspx another zfs calculator]
*[https://bm-stor.com/index.php/blog/Linux-cluster-with-ZFS-on-Cluster-in-a-Box/ ZFS clustering]  
*[https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data/ ZFS and ECC]
*[https://arstechnica.com/gadgets/2021/06/raidz-expansion-code-lands-in-openzfs-master/ Raidz expansion]
*[https://somedudesays.com/2021/08/the-basic-guide-to-working-with-zfs/ Basic guide to working with zfs]
 
*[https://wiki.archlinux.org/title/ZFS Archlinux page on ZFS]
*[https://openzfs.github.io/openzfs-docs/Basic%20Concepts/RAIDZ.html Raidz basic concepts]


=Documentation=
*[http://www.opensolaris.org/os/community/zfs/intro/ Opensolaris ZFS intro]
*[http://www.raidz-calculator.com/raidz-types-reference.aspx raidz types reference]
*[https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSZpoolFragmentationMeaning ZFS fragmentation]
*[https://openzfs.github.io/openzfs-docs/Basic%20Concepts/RAIDZ.html raidz]
==ARC/Caching==
*[https://linuxhint.com/configure-zfs-cache-high-speed-io/ Configuring ZFS Cache for High-Speed IO]
*[https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSUnderstandingARCHits Understanding ARC hits]
*[https://www.45drives.com/community/articles/zfs-caching/ ZFS Caching]
*[https://zfs-discuss.opensolaris.narkive.com/D7v2YmjF/raidz-what-is-stored-in-parity What is stored in parity]
===L2ARC===
*[https://www.brendangregg.com/blog/2008-07-22/zfs-l2arc.html ZFS L2ARC]
*[https://klarasystems.com/articles/openzfs-all-about-l2arc/ OpenZFS: All about the cache vdev or L2ARC]
sysctl kstat.zfs.misc.arcstats | egrep 'l2_(hits|misses)'
and
egrep 'l2_(hits|misses)' /proc/spl/kstat/zfs/arcstats


==Tuning ZFS==
*[https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/index.html ZFS Performance and Tuning]
*[https://linuxhint.com/configure-zfs-cache-high-speed-io/ Configuring ZFS Cache for High-Speed IO]
*[https://www.high-availability.com/docs/ZFS-Tuning-Guide/ ZFS Tuning and Optimisation]
*[https://forums.oracle.com/ords/apexds/post/part-10-monitoring-and-tuning-zfs-performance-4977 Monitoring and Tuning ZFS Performance]
 
==ARC statistics==
*[https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html Tuning module parameters]
*[https://openzfs.github.io/openzfs-docs/man/master/4/zfs.4.html ZFS]
 
===ZFS module parameters===
/sys/module/zfs/parameters/
cat /proc/spl/kstat/zfs/arcstats
===data_size===
size of cached user data
 
===dnode_size===
 
===hdr_size===
size of L2ARC headers stored in main ARC
 
===metadata_size===
size of cached metadata
 
=Tools=
*[https://github.com/asomers/ztop ztop]
*[https://github.com/jimsalterjrs/ioztat ioztat]
*[https://cuddletech.com/2008/10/explore-your-zfs-adaptive-replacement-cache-arc/ arc_summary]
*[https://github.com/richardelling/zfs-linux-tools zfs-linux-tools] kstat-analyzer is rather helpful
 
 
==kstat-analyzer==
 
===prefetch hit rate is low, consider tuning prefetcher===
Check:
 
Supposed to leave that at 0:
cat /sys/module/zfs/parameters/zfs_vdev_cache_size
 
 
Code:
if (float(kstats['hits']) / accesses) < PREFETCH_RATIO_OK
 
Relevant links:
*https://www.truenas.com/community/threads/notes-on-zfs-prefetch.1076/
 
*https://www.phoronix.com/news/OpenZFS-Uncached-Prefetch
 
=Processes=
==arc_evict==
Evict buffers from list until we've removed the specified number of
bytes.  Move the removed buffers to the appropriate evict state.
If the recycle flag is set, then attempt to "recycle" a buffer:
- look for a buffer to evict that is `bytes' long.
- return the data block from this buffer rather than freeing it.
This flag is used by callers that are trying to make space for a
new buffer in a full arc cache.
 
 
This function makes a "best effort".  It skips over any buffers
it can't get a hash_lock on, and so may not catch all candidates.
It may also return without evicting as much space as requested.
 
==arc_prune==


=Commands=


==Getting arc statistics==
arcstat
  arc_summary
Tip, for details use
arc_summary -d
There is also
cat /proc/spl/kstat/zfs/arcstats
and
zfetchstat + kstat-analyzer from zfs-linux-tools


===zil/slog statistics===
arc_summary -s zil
or
  cat /proc/spl/kstat/zfs/zil
or
zilstat
or
  zpool iostat -v
 
===l2arc statistics===
arc_summary -s l2arc
 
==Getting IO statistics==
zpool iostat -v 300


=Terms and acronyms=
==vdev==
'''V'''irtual '''Dev'''ice.


*[https://wiki.archlinux.org/title/ZFS/Virtual_disks ZFS Virtual disks]
==ARC==
'''A'''daptive '''R'''eplacement '''C'''ache
Portion of RAM used to cache data to speed up read performance


==L2ARC==
'''L'''evel '''2''' '''A'''daptive '''R'''eplacement '''C'''ache
 
"L2ARC is usually considered if hit rate for the ARC is below 90% while having 64+ GB of RAM"


SSD cache
==MRU==
Most Recently Used
==zvol==
kind of block device whose space is allocated from the pool, useful for iscsi targets
==Scrubbing==
Checking disks/data integrity
zpool status <poolname> | grep scrub
and
zpool scrub <poolname>
probably taken care of by cron.
==SLOG==
See [ZIL]
==ZIL==
[https://constantin.glez.de/2010/07/20/solaris-zfs-synchronous-writes-and-zil-explained/ ZIL explained]
The space where synchronous writes are logged before the confirmation is sent back to the client.
==prefetch==
See /proc/spl/kstat/zfs/zfetchstats
*[https://cuddletech.com/2009/05/understanding-zfs-prefetch/ Understanding ZFS prefetch]
*[https://svennd.be/tuning-of-zfs-module/ Tuning of the ZFS module]
*[https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSARCStatsAndPrefetch Some basic ZFS ARC statistics and prefetching]
*[https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSPrefetchStatsNotes Some notes on ZFS prefetch related stats]
*[http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/ Activity of the ZFS ARC]


= HOWTO =
==Get sizes/reservations==
zfs get quota,reservation tank/vol1
==Caching==
===Add log/cache===
For L2ARC, mirrors make little sense; just add the disks:
zpool add rpool cache sdf
or maybe better
zpool add rpool cache /dev/disk/by-id/ata-SAMSUNG_MZ7LH960HAJR-00005_S45NNA0N47394
or simply
zpool add rpool cache ata-SAMSUNG_MZ7LH960HAJR-00005_S45NNA0N47394
===Add ZIL/SLOG write cache===
zpool add rpool log mirror sdk sdl
===Remove ZIL/SLOG mirrored cache===
zpool remove mypool mirror-4
==Getting statistics==
===Show cache activity===
dstat --zfs-arc --zfs-l2arc --zfs-zil -d 5
===zpool===
zpool iostat
====More statistics, every 5 seconds====
zpool iostat -v 5
===Flush linux caches===
echo 3 > /proc/sys/vm/drop_caches
===arc statistics===
===l2arc statistics===
===ZIL statistics===
cat /proc/spl/kstat/zfs/zil


==Create zfs filesystem==


  zpool replace tank /dev/disk/by-id/13450850036953119346 /dev/disk/by-id/ata-ST4000VN000-1H4168_Z302FQVZ
or just
zpool replace tank /dev/sdi
===If disk is shown as '''UNAVAIL'''===
zpool offline tank sdi


==Showing information about ZFS pools and datasets==


===Show reservations on datasets===
  zfs list -o name,reservation


==Swap on zfs==
https://askubuntu.com/questions/228149/zfs-partition-as-swap
zfs create pool/swap -V 4G -b 4K
mkswap -f /dev/pool/swap
swapon /dev/pool/swap
and remember fstab


==vdevs==
===invalid vdev specification===
Probably means you need -f
===show balance between vdevs===
zpool iostat -v 'pool' [interval in seconds]
or just
zpool iostat -vc 'pool'


== Tuning arc settings ==
  echo 5368709120 > /sys/module/zfs/parameters/zfs_arc_max
and add to /etc/modprobe.d/zfs.conf
  options zfs zfs_arc_max=5368709120


'''NOTE you might need to run (for example when running / on zfs)'''
update-initramfs -u -k all


and perhaps clear caches and reset counters:


  echo 3 > /proc/sys/vm/drop_caches


===Tune zfs_arc_dnode_limit_percent===
  echo 20 > /sys/module/zfs/parameters/zfs_arc_dnode_limit_percent


In /etc/modprobe.d/zfs.conf:  


  options zfs zfs_arc_dnode_limit_percent=20
===export iscsi===
https://linuxhint.com/share-zfs-volumes-via-iscsi/


= FAQ =
==arc_summary==
===VDEV cache disabled, skipping section===
This is normal, vdev caching is considered bad in current code
==Arc metadata size exceeds maximum==
So '''arc_meta_used''' > '''arc_meta_limit'''
==increasing feed rate==


== show status and disks ==
==VDEV cache disabled, skipping section==
Looks like you just don't have l2arc cache
==cannot export 'tank': pool is busy==
After checking stuff like nfs etc try:
zfs unshare -a
zfs umount -a -f
zpool export -f tank

Latest revision as of 12:47, 25 July 2024

Links

Documentation

ARC/Caching

L2ARC

sysctl kstat.zfs.misc.arcstats | egrep 'l2_(hits|misses)'

and

egrep 'l2_(hits|misses)' /proc/spl/kstat/zfs/arcstats

Tuning ZFS

Monitoring and Tuning ZFS Performance

ARC statistics

ZFS module parameters

/sys/module/zfs/parameters/
cat /proc/spl/kstat/zfs/arcstats

data_size

size of cached user data

dnode_size

hdr_size

size of L2ARC headers stored in main ARC

metadata_size

size of cached metadata

Tools


kstat-analyzer

prefetch hit rate is low, consider tuning prefetcher

Check:

Supposed to leave that at 0:

cat /sys/module/zfs/parameters/zfs_vdev_cache_size


Code:

if (float(kstats['hits']) / accesses) < PREFETCH_RATIO_OK

Relevant links:

Processes

arc_evict

Evict buffers from list until we've removed the specified number of bytes. Move the removed buffers to the appropriate evict state. If the recycle flag is set, then attempt to "recycle" a buffer:
- look for a buffer to evict that is `bytes' long.
- return the data block from this buffer rather than freeing it.
This flag is used by callers that are trying to make space for a new buffer in a full arc cache.


This function makes a "best effort". It skips over any buffers it can't get a hash_lock on, and so may not catch all candidates. It may also return without evicting as much space as requested.

arc_prune

Commands

Getting arc statistics

arcstat
arc_summary

Tip, for details use

arc_summary -d

There is also

cat /proc/spl/kstat/zfs/arcstats

and

zfetchstat + kstat-analyzer from zfs-linux-tools


zil/slog statistics

arc_summary -s zil

or

cat /proc/spl/kstat/zfs/zil

or

zilstat

or

 zpool iostat -v

l2arc statistics

arc_summary -s l2arc

Getting IO statistics

zpool iostat -v 300

Terms and acronyms

vdev

Virtual Device.

ARC

Adaptive Replacement Cache

Portion of RAM used to cache data to speed up read performance

L2ARC

Level 2 Adaptive Replacement Cache

"L2ARC is usually considered if hit rate for the ARC is below 90% while having 64+ GB of RAM"

SSD cache
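
The 90% rule of thumb quoted above can be checked against the `hits` and `misses` counters in arcstats. This is a minimal sketch, assuming the Linux kstat layout of three columns (name, type, data); the function name `arc_hit_rate` and its optional threshold argument are made up for illustration and are not part of any ZFS tool.

```shell
# Print the overall ARC hit rate as a percentage, followed by "below" or "ok"
# relative to a threshold (default 90).
# Assumption: each counter line has three columns: name, type, data.
arc_hit_rate() {
    awk -v limit="${2:-90}" '
        $1 == "hits"   { hits = $3 }
        $1 == "misses" { misses = $3 }
        END {
            total = hits + misses
            if (total == 0) { print "n/a"; exit }
            rate = 100 * hits / total
            printf "%.1f %s\n", rate, (rate < limit ? "below" : "ok")
        }
    ' "${1:-/proc/spl/kstat/zfs/arcstats}"
}
```

Called without arguments it reads /proc/spl/kstat/zfs/arcstats; a different file and threshold can be passed as the first and second argument. The counters are cumulative since module load, so the result is a long-term average rather than a current rate.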

DMU

Data Management Unit


MFU

Most Frequently Used

MRU

Most Recently Used

zvol

kind of block device whose space is allocated from the pool, useful for iscsi targets

Scrubbing

Checking disks/data integrity

zpool status <poolname> | grep scrub

and

zpool scrub <poolname>

probably taken care of by cron.


SLOG

See [ZIL]

ZIL

ZIL explained

The space where synchronous writes are logged before the confirmation is sent back to the client.

prefetch

See /proc/spl/kstat/zfs/zfetchstats

HOWTO

Get sizes/reservations

zfs get quota,reservation tank/vol1

Caching

Add log/cache

For L2ARC, mirrors make little sense; just add the disks:

zpool add rpool cache sdf

or maybe better

zpool add rpool cache /dev/disk/by-id/ata-SAMSUNG_MZ7LH960HAJR-00005_S45NNA0N47394

or simply

zpool add rpool cache ata-SAMSUNG_MZ7LH960HAJR-00005_S45NNA0N47394

Add ZIL/SLOG write cache

zpool add rpool log mirror sdk sdl

Remove ZIL/SLOG mirrored cache

zpool remove mypool mirror-4

Getting statistics

Show cache activity

dstat --zfs-arc --zfs-l2arc --zfs-zil -d 5

zpool

zpool iostat

More statistics, every 5 seconds

zpool iostat -v 5

Flush linux caches

echo 3 > /proc/sys/vm/drop_caches

arc statistics

l2arc statistics

ZIL statistics

cat /proc/spl/kstat/zfs/zil

Create zfs filesystem

zfs create poolname/fsname

This also creates the mountpoint.


Add vdev to pool

zpool add mypool raidz1 sdg sdh sdi

Replace disk in zfs

Some links

Get information first:

Name of disk

zpool status

Find uid of disk to replace

take it offline

zpool offline poolname ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M5RLZC6V

Get the disk guid:

zdb

guid: 15233236897831806877

Get list of disk by id:

ls -al /dev/disk/by-id

Save the id, shut down, replace the disk, boot:

Find the new disk:

ls -al /dev/disk/by-id

Run the replace command. The first id is the guid of the old disk; the second is the name of the new disk:

zpool replace tank /dev/disk/by-id/13450850036953119346 /dev/disk/by-id/ata-ST4000VN000-1H4168_Z302FQVZ


or just

zpool replace tank /dev/sdi


If disk is shown as UNAVAIL

zpool offline tank sdi

Showing information about ZFS pools and datasets

Show pools with sizes

zpool list 

or

zpool list -H -o name,size
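
The -H output has no header and is tab-separated, which makes it easy to use in scripts. A minimal sketch of reading it line by line; the function name `print_pool_sizes` is made up for illustration, and the function reads from standard input so it can be tried without a pool.

```shell
# Read "name<TAB>size" lines, as produced by: zpool list -H -o name,size
# and print one sentence per pool.
print_pool_sizes() {
    while IFS="$(printf '\t')" read -r name size; do
        printf '%s has size %s\n' "$name" "$size"
    done
}

# Usage on a system with ZFS:
#   zpool list -H -o name,size | print_pool_sizes
```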


Show reservations on datasets

zfs list -o name,reservation

Swap on zfs

https://askubuntu.com/questions/228149/zfs-partition-as-swap

zfs create pool/swap -V 4G -b 4K
mkswap -f /dev/pool/swap
swapon /dev/pool/swap

and remember fstab

vdevs

multiple vdevs

Multiple vdevs in a zpool get striped. What about balance?

invalid vdev specification

Probably means you need -f

show balance between vdevs

zpool iostat -v 'pool' [interval in seconds]

or just

zpool iostat -vc 'pool'

Tuning arc settings

See Tuning ZFS modules parameters

zfs_arc_max

Linux defaults to giving 50% of RAM to the ARC. The default is in effect when zfs_arc_max is 0; the resulting limit is shown as c_max:

cat /sys/module/zfs/parameters/zfs_arc_max
0
grep c_max /proc/spl/kstat/zfs/arcstats

To change this:

echo 5368709120 > /sys/module/zfs/parameters/zfs_arc_max

and add to /etc/modprobe.d/zfs.conf

options zfs zfs_arc_max=5368709120

NOTE you might need to run (for example when running / on zfs)

update-initramfs -u -k all

and perhaps clear caches and reset counters:

echo 3 > /proc/sys/vm/drop_caches
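
The value 5368709120 used for zfs_arc_max above is 5 GiB expressed in bytes. A small sketch for computing other sizes; the helper names `gib_to_bytes` and `arc_max_option` are made up for illustration.

```shell
# Convert a whole number of GiB to bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the matching line for /etc/modprobe.d/zfs.conf.
arc_max_option() {
    printf 'options zfs zfs_arc_max=%s\n' "$(gib_to_bytes "$1")"
}

# Usage:
#   gib_to_bytes 5      prints 5368709120
#   arc_max_option 5    prints options zfs zfs_arc_max=5368709120
```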

Tune zfs_arc_dnode_limit_percent

Assuming zfs_arc_dnode_limit = 0:

echo 20 > /sys/module/zfs/parameters/zfs_arc_dnode_limit_percent

In /etc/modprobe.d/zfs.conf:


options zfs zfs_arc_dnode_limit_percent=20


export iscsi

https://linuxhint.com/share-zfs-volumes-via-iscsi/

FAQ

arc_summary

VDEV cache disabled, skipping section

This is normal, vdev caching is considered bad in current code

Arc metadata size exceeds maximum

So arc_meta_used > arc_meta_limit
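
The comparison can be made directly from arcstats. This is a sketch under assumptions: it expects the `arc_meta_used` and `arc_meta_limit` counters in the three-column kstat layout, and newer OpenZFS releases may not expose both of them. The function name `arc_meta_check` is made up for illustration.

```shell
# Report whether ARC metadata usage exceeds its limit.
# Assumption: arc_meta_used and arc_meta_limit are present as
# "name type data" lines; if either is missing, say so instead of guessing.
arc_meta_check() {
    awk '
        $1 == "arc_meta_used"  { used = $3; seen++ }
        $1 == "arc_meta_limit" { limit = $3; seen++ }
        END {
            if (seen < 2) { print "counters not found"; exit }
            print (used > limit ? "over limit" : "within limit")
        }
    ' "${1:-/proc/spl/kstat/zfs/arcstats}"
}
```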


increasing feed rate

show status and disks

zpool status

show drives/pools

zfs list
      

check raid level

zpool status


Estimate raidz speeds

raidz1: N/(N-1) * IOPS
raidz2: N/(N-2) * IOPS
raidz3: N/(N-3) * IOPS


VDEV cache disabled, skipping section

Looks like you just don't have l2arc cache


cannot export 'tank': pool is busy

After checking things like NFS, try:

zfs unshare -a
zfs umount -a -f
zpool export -f tank