ZFS: Difference between revisions
Revision as of 14:23, 15 April 2024
Links
- http://open-zfs.org
- http://www.edplese.com/samba-with-zfs.html
- ZFS calculator
- another zfs calculator
- ZFS clustering
- ZFS and ECC: https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data/
- ZFS troubleshooting/disk replacement
- Creating a ZFS HA Cluster using shared or shared-nothing storage
- ZFS 101
- Raidz expansion
- Basic guide to working with zfs
- Archlinux page on ZFS
- Raidz basic concepts
Documentation
- zfs manpage
- ZFS on Linux
- openzfs wiki
- https://wiki.gentoo.org/wiki/ZFS
- ZFS cheatsheet
- http://wiki.freebsd.org/ZFSQuickStartGuide
- Opensolaris ZFS intro
- raidz types reference
- ZFS fragmentation
- raidz
ARC/Caching
- Configuring ZFS Cache for High-Speed IO
- ZFS Arc various sizes
- Activity of the ZFS ARC
- Understanding ARC hits
- ZFS Caching
- What is stored in parity
L2ARC
On FreeBSD:
sysctl kstat.zfs.misc.arcstats | egrep 'l2_(hits|misses)'
On Linux:
egrep 'l2_(hits|misses)' /proc/spl/kstat/zfs/arcstats
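The two counters above are easier to read as a ratio. The following is a minimal sketch, assuming the Linux kstat layout of two header lines followed by "name type data" rows; the helper names parse_kstat and l2_hit_ratio are made up for this example and are not part of any ZFS tool.

```python
def parse_kstat(text):
    """Parse the 'name type data' rows of a Linux kstat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three columns with a numeric value;
        # the two header lines do not match and are skipped.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def l2_hit_ratio(stats):
    """Return l2_hits / (l2_hits + l2_misses), or None without L2ARC accesses."""
    hits = stats.get("l2_hits", 0)
    total = hits + stats.get("l2_misses", 0)
    return hits / total if total else None
```

On a live Linux system the text would come from reading /proc/spl/kstat/zfs/arcstats and passing its contents to parse_kstat. A result of None simply means no L2ARC reads have been counted yet.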
Tuning ZFS
- Monitoring and Tuning ZFS Performance
ARC statistics
ZFS module parameters
/sys/module/zfs/parameters/
cat /proc/spl/kstat/zfs/arcstats
data_size
size of cached user data
dnode_size
hdr_size
size of L2ARC headers stored in main ARC
metadata_size
size of cached metadata
Tools
- ztop
- ioztat
- arc_summary
- zfs-linux-tools kstat-analyzer is rather helpful
kstat-analyzer
prefetch hit rate is low, consider tuning prefetcher
Check the vdev cache size, which is supposed to be left at 0:
cat /sys/module/zfs/parameters/zfs_vdev_cache_size
Code:
if (float(kstats['hits']) / accesses) < PREFETCH_RATIO_OK
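Relevant links:
- https://www.truenas.com/community/threads/notes-on-zfs-prefetch.1076/
- https://www.phoronix.com/news/OpenZFS-Uncached-Prefetch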
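The quoted check compares a hit ratio against a threshold. Below is a minimal sketch of that comparison, assuming that accesses means hits plus misses (as reported in /proc/spl/kstat/zfs/zfetchstats). The threshold value here is a placeholder and not the one kstat-analyzer uses, and prefetch_is_low is a hypothetical name.

```python
PREFETCH_RATIO_OK = 0.5  # placeholder threshold, NOT taken from kstat-analyzer


def prefetch_is_low(hits, misses, threshold=PREFETCH_RATIO_OK):
    """Return True when hits / (hits + misses) falls below the threshold."""
    accesses = hits + misses  # assumption: accesses = hits + misses
    if accesses == 0:
        return False  # no prefetch activity yet, nothing to judge
    return float(hits) / accesses < threshold
```

For example, 1 hit against 9 misses gives a ratio of 0.1, which is below the placeholder threshold and would trigger the warning.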
Processes
arc_evict
Evict buffers from list until we've removed the specified number of bytes. Move the removed buffers to the appropriate evict state. If the recycle flag is set, then attempt to "recycle" a buffer:
- look for a buffer to evict that is `bytes' long.
- return the data block from this buffer rather than freeing it.
This flag is used by callers that are trying to make space for a new buffer in a full arc cache.
This function makes a "best effort". It skips over any buffers it can't get a hash_lock on, and so may not catch all candidates. It may also return without evicting as much space as requested.
arc_prune
Commands
Getting arc statistics
arcstat
arc_summary
Tip, for details use
arc_summary -d
There is also
cat /proc/spl/kstat/zfs/arcstats
and
zfetchstat + kstat-analyzer from zfs-linux-tools
zil/slog statistics
arc_summary -s zil
l2arc statistics
arc_summary -s l2arc
Getting IO statistics
zpool iostat -v 300
Terms and acronyms
vdev
Virtual Device.
ARC
Adaptive Replacement Cache
Portion of RAM used to cache data to speed up read performance
L2ARC
Level 2 Adaptive Replacement Cache
"L2ARC is usually considered if hit rate for the ARC is below 90% while having 64+ GB of RAM"
SSD cache
DMU
Data Management Unit
MFU
Most Frequently Used
MRU
Most Recently Used
zvol
kind of block device whose space is allocated from the pool, useful for iscsi targets
Scrubbing
Checking disks/data integrity
Show the state of the last scrub:
zpool status <poolname> | grep scrub
Start a scrub:
zpool scrub <poolname>
Scrubs are probably already scheduled by cron.
SLOG
See ZIL
ZIL
ZFS Intent Log: the space where synchronous writes are logged before the confirmation is sent back to the client
prefetch
See /proc/spl/kstat/zfs/zfetchstats
- Understanding ZFS prefetch
- Tuning of the ZFS module
- Some basic ZFS ARC statistics and prefetching
- Some notes on ZFS prefetch related stats
- Activity of the ZFS ARC
HOWTO
Get sizes/reservations
zfs get quota,reservation tank/vol1
Caching
Add L2ARC read cache
zpool add rpool cache sdf
Add ZIL/SLOG write cache
zpool add rpool log mirror sdk sdl
Remove ZIL/SLOG mirrored cache
Use the mirror name shown by zpool status:
zpool remove mypool mirror-4
Getting statistics
Show cache activity
dstat --zfs-arc --zfs-l2arc --zfs-zil -d 5
zpool
zpool iostat
More statistics, every 5 seconds
zpool iostat -v 5
Flush linux caches
echo 3 > /proc/sys/vm/drop_caches
arc statistics
l2arc statistics
ZIL statistics
cat /proc/spl/kstat/zfs/zil
Create zfs filesystem
zfs create poolname/fsname
This also creates the mountpoint.
Add vdev to pool
zpool add mypool raidz1 sdg sdh sdi
Replace disk in zfs
Some links
Get information first:
Name of disk
zpool status
Find the id of the disk to replace and take it offline:
zpool offline poolname ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M5RLZC6V
Get the disk guid:
zdb
guid: 15233236897831806877
Get list of disk by id:
ls -al /dev/disk/by-id
Save the id, shutdown, replace disk, boot:
Find the new disk:
ls -al /dev/disk/by-id
Run the replace command. The first id is the guid of the old disk; the second is the name of the new disk.
zpool replace tank /dev/disk/by-id/13450850036953119346 /dev/disk/by-id/ata-ST4000VN000-1H4168_Z302FQVZ
or, if the new disk sits in the same place as the old one, just
zpool replace tank /dev/sdi
If disk is shown as UNAVAIL
zpool offline tank sdi
Showing information about ZFS pools and datasets
Show pools with sizes
zpool list
or
zpool list -H -o name,size
Show reservations on datasets
zfs list -o name,reservation
Swap on zfs
https://askubuntu.com/questions/228149/zfs-partition-as-swap
zfs create pool/swap -V 4G -b 4K
mkswap -f /dev/pool/swap
swapon /dev/pool/swap
and remember to add the swap device to /etc/fstab
vdevs
multiple vdevs
Multiple vdevs in a zpool get striped. What about balance?
invalid vdev specification
Probably means you need -f
show balance between vdevs
zpool iostat -v 'pool' [interval in seconds]
or just
zpool iostat -vc 'pool'
Tuning arc settings
See Tuning ZFS modules parameters
zfs_arc_max
Linux defaults to giving 50% of RAM to the ARC. This default is in effect when zfs_arc_max is 0:
cat /sys/module/zfs/parameters/zfs_arc_max
0
The limit currently in effect is shown by:
grep c_max /proc/spl/kstat/zfs/arcstats
To change this:
echo 5368709120 > /sys/module/zfs/parameters/zfs_arc_max
and add to /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=5368709120
NOTE you might need to run
update-initramfs -u
and perhaps clear caches and reset counters:
echo 3 > /proc/sys/vm/drop_caches
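zfs_arc_max is given in bytes, so the 5368709120 used above is 5 GiB. A small sketch of the arithmetic follows, with arc_max_option being a hypothetical helper name; it only builds the line for /etc/modprobe.d/zfs.conf and does not touch the system.

```python
GIB = 1024 ** 3  # bytes per GiB


def arc_max_option(gib):
    """Return the modprobe.d line that caps the ARC at `gib` GiB (integer)."""
    return "options zfs zfs_arc_max={}".format(int(gib) * GIB)


print(arc_max_option(5))
```

The same byte value is what gets echoed into /sys/module/zfs/parameters/zfs_arc_max for a change at runtime.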
Tune zfs_arc_dnode_limit_percent
Assuming zfs_arc_dnode_limit = 0:
echo 20 > /sys/module/zfs/parameters/zfs_arc_dnode_limit_percent
In /etc/modprobe.d/zfs.conf:
options zfs zfs_arc_dnode_limit_percent=20
export iscsi
https://linuxhint.com/share-zfs-volumes-via-iscsi/
FAQ
arc_summary
VDEV cache disabled, skipping section
This is normal, vdev caching is considered bad
Arc metadata size exceeds maximum
So arc_meta_used > arc_meta_limit
increasing feed rate
show status and disks
zpool status
show drives/pools
zfs list
check raid level
zpool status <poolname>
Estimate raidz speeds
raidz1: N/(N-1) * IOPS
raidz2: N/(N-2) * IOPS
raidz3: N/(N-3) * IOPS
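The rule of thumb above can be written as one function. This sketch only evaluates the formula exactly as noted on this page (N disks, parity level 1 to 3, IOPS of a single disk); it is not a benchmark, and raidz_estimate is a name invented for the example.

```python
def raidz_estimate(n_disks, parity, iops):
    """Evaluate N / (N - parity) * IOPS as written in the notes above."""
    if parity not in (1, 2, 3):
        raise ValueError("parity must be 1, 2 or 3 (raidz1/raidz2/raidz3)")
    if n_disks <= parity:
        raise ValueError("need more disks than parity devices")
    return n_disks / (n_disks - parity) * iops
```

For instance, raidz_estimate(5, 1, 100) evaluates the raidz1 line for five disks of 100 IOPS each.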
VDEV cache disabled, skipping section
This message refers to the vdev cache, not to the L2ARC; see the arc_summary entry above.
cannot export 'tank': pool is busy
After checking stuff like nfs etc try:
zfs unshare -a
zfs umount -a -f
zpool export -f tank