= Links =
*[http://open-zfs.org open-zfs.org]
*[http://www.edplese.com/samba-with-zfs.html Samba with ZFS]
*[http://wintelguy.com/zfs-calc.pl ZFS calculator]
*[https://www.raidz-calculator.com/default.aspx another ZFS calculator]
*[https://bm-stor.com/index.php/blog/Linux-cluster-with-ZFS-on-Cluster-in-a-Box/ ZFS clustering]
*[https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data/ Will ZFS and non-ECC RAM kill your data?]
*[https://docs.joyent.com/private-cloud/troubleshooting/disk-replacement ZFS troubleshooting/disk replacement]
*[https://www.high-availability.com/docs/Quickstart-ZFS-Cluster/ Creating a ZFS HA Cluster using shared or shared-nothing storage]
*[https://arstechnica.com/information-technology/2020/05/zfs-101-understanding-zfs-storage-and-performance/ ZFS 101]
*[https://arstechnica.com/gadgets/2021/06/raidz-expansion-code-lands-in-openzfs-master/ Raidz expansion]
*[https://somedudesays.com/2021/08/the-basic-guide-to-working-with-zfs/ Basic guide to working with ZFS]
*[https://wiki.archlinux.org/title/ZFS Archlinux page on ZFS]
*[https://openzfs.github.io/openzfs-docs/Basic%20Concepts/RAIDZ.html Raidz basic concepts]
=Documentation=
*[https://openzfs.github.io/openzfs-docs/man/4/zfs.4.html zfs(4) manpage]
*[http://zfsonlinux.org/ ZFS on Linux]
*[https://openzfs.org/wiki/ OpenZFS wiki]
*[https://wiki.gentoo.org/wiki/ZFS Gentoo wiki on ZFS]
*[https://blog.programster.org/zfs-cheatsheet ZFS cheatsheet]
*[http://wiki.freebsd.org/ZFSQuickStartGuide FreeBSD ZFS quick start guide]
*[http://www.opensolaris.org/os/community/zfs/intro/ Opensolaris ZFS intro]
*[http://www.raidz-calculator.com/raidz-types-reference.aspx raidz types reference]
*[https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSZpoolFragmentationMeaning ZFS fragmentation]
*[https://openzfs.github.io/openzfs-docs/Basic%20Concepts/RAIDZ.html raidz basic concepts]
==ARC/Caching==
*[https://linuxhint.com/configure-zfs-cache-high-speed-io/ Configuring ZFS Cache for High-Speed IO]
*[https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSARCItsVariousSizes ZFS ARC various sizes]
*[http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/ Activity of the ZFS ARC]
*[https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSUnderstandingARCHits Understanding ARC hits]
*[https://www.45drives.com/community/articles/zfs-caching/ ZFS Caching]
*[https://zfs-discuss.opensolaris.narkive.com/D7v2YmjF/raidz-what-is-stored-in-parity What is stored in parity]
===L2ARC===
*[https://klarasystems.com/articles/openzfs-all-about-l2arc/ OpenZFS: All about the cache vdev or L2ARC]
On FreeBSD:
 sysctl kstat.zfs.misc.arcstats | egrep 'l2_(hits|misses)'
On Linux:
 egrep 'l2_(hits|misses)' /proc/spl/kstat/zfs/arcstats
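The raw counters can be turned into a hit rate. A minimal sketch using a made-up arcstats sample (on a real system point awk at /proc/spl/kstat/zfs/arcstats instead):

```shell
# Hypothetical sample in the arcstats format (name, type, value);
# real values come from /proc/spl/kstat/zfs/arcstats.
cat > /tmp/arcstats.sample <<'EOF'
l2_hits                         4    900
l2_misses                       4    100
EOF
# L2ARC hit rate = l2_hits / (l2_hits + l2_misses)
awk '$1 == "l2_hits" {h = $3}
     $1 == "l2_misses" {m = $3}
     END {printf "l2arc hit rate: %.1f%%\n", 100 * h / (h + m)}' /tmp/arcstats.sample
```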
==Tuning ZFS==
*[https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/index.html ZFS Performance and Tuning]
*[https://linuxhint.com/configure-zfs-cache-high-speed-io/ Configuring ZFS Cache for High-Speed IO]
*[https://www.high-availability.com/docs/ZFS-Tuning-Guide/ ZFS Tuning and Optimisation]
*[https://forums.oracle.com/ords/apexds/post/part-10-monitoring-and-tuning-zfs-performance-4977 Monitoring and Tuning ZFS Performance]
==ARC statistics==
*[https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html Tuning module parameters]
*[https://openzfs.github.io/openzfs-docs/man/master/4/zfs.4.html zfs(4) manpage]
===ZFS module parameters===
Parameters live under:
 /sys/module/zfs/parameters/
Statistics can be read with:
 cat /proc/spl/kstat/zfs/arcstats
===data_size===
size of cached user data
===dnode_size===
size of cached dnodes
===hdr_size===
size of ARC buffer headers; the L2ARC headers kept in main ARC are counted separately in l2_hdr_size
===metadata_size===
size of cached metadata
=Tools=
*[https://github.com/asomers/ztop ztop]
*[https://github.com/jimsalterjrs/ioztat ioztat]
*[https://cuddletech.com/2008/10/explore-your-zfs-adaptive-replacement-cache-arc/ arc_summary]
*[https://github.com/richardelling/zfs-linux-tools zfs-linux-tools] (kstat-analyzer is rather helpful)
==kstat-analyzer==
===prefetch hit rate is low, consider tuning prefetcher===
Check (this parameter is supposed to be left at 0):
 cat /sys/module/zfs/parameters/zfs_vdev_cache_size
The check in the code:
 if (float(kstats['hits']) / accesses) < PREFETCH_RATIO_OK
Relevant links:
*https://www.truenas.com/community/threads/notes-on-zfs-prefetch.1076/
*https://www.phoronix.com/news/OpenZFS-Uncached-Prefetch
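The same check can be reproduced from the shell against /proc/spl/kstat/zfs/zfetchstats. A sketch with a made-up sample and an assumed threshold of 0.85 (the real PREFETCH_RATIO_OK value is defined inside kstat-analyzer):

```shell
# Hypothetical zfetchstats sample; real data: /proc/spl/kstat/zfs/zfetchstats
cat > /tmp/zfetchstats.sample <<'EOF'
hits                            4    700
misses                          4    300
EOF
awk '$1 == "hits" {h = $3}
     $1 == "misses" {m = $3}
     END {
       ratio = h / (h + m)
       printf "prefetch hit ratio: %.2f\n", ratio
       if (ratio < 0.85) {  # assumed PREFETCH_RATIO_OK
         print "prefetch hit rate is low, consider tuning the prefetcher"
       }
     }' /tmp/zfetchstats.sample
```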
=Processes=
==arc_evict==
From the comment in the OpenZFS source:
Evict buffers from the list until we've removed the specified number of bytes. Move the removed buffers to the appropriate evict state. If the recycle flag is set, then attempt to "recycle" a buffer:
*look for a buffer to evict that is `bytes' long.
*return the data block from this buffer rather than freeing it.
This flag is used by callers that are trying to make space for a new buffer in a full ARC cache. This function makes a "best effort": it skips over any buffers it can't get a hash_lock on, and so may not catch all candidates. It may also return without evicting as much space as requested.
==arc_prune==
=Commands=
==Getting arc statistics==
 arcstat
 arc_summary
Tip: for details use
 arc_summary -d
There is also
 cat /proc/spl/kstat/zfs/arcstats
and
 /proc/spl/kstat/zfs/zfetchstats plus kstat-analyzer from zfs-linux-tools
===zil/slog statistics===
 arc_summary -s zil
===l2arc statistics===
 arc_summary -s l2arc
==Getting IO statistics==
 zpool iostat -v 300
=Terms and acronyms=
==vdev==
'''V'''irtual '''Dev'''ice.
*[https://wiki.archlinux.org/title/ZFS/Virtual_disks ZFS Virtual disks]
==ARC==
'''A'''daptive '''R'''eplacement '''C'''ache
The portion of RAM used to cache data to speed up read performance.
==L2ARC==
'''L'''evel '''2''' '''A'''daptive '''R'''eplacement '''C'''ache
"L2ARC is usually considered if hit rate for the ARC is below 90% while having 64+ GB of RAM"
An SSD-based read cache.
==DMU==
'''D'''ata '''M'''anagement '''U'''nit
==MFU==
'''M'''ost '''F'''requently '''U'''sed
==MRU==
'''M'''ost '''R'''ecently '''U'''sed
==zvol==
A block device whose space is allocated from the pool; useful for iSCSI targets.
==Scrubbing==
Checking disk/data integrity. Check scrub status with:
 zpool status <poolname> | grep scrub
and start a scrub with:
 zpool scrub <poolname>
This is usually taken care of by a cron job.
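To schedule it yourself, a hypothetical crontab entry (the pool name tank and the monthly 02:00 slot are assumptions; Debian-based installs usually ship a similar job under /etc/cron.d already):

```
# Hypothetical /etc/crontab entry: scrub pool "tank" at 02:00 on the
# first day of every month.
0 2 1 * * root /sbin/zpool scrub tank
```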
==SLOG==
'''S'''eparate '''LOG''' device; see [[#ZIL|ZIL]].
==ZIL==
*[https://constantin.glez.de/2010/07/20/solaris-zfs-synchronous-writes-and-zil-explained/ ZIL explained]
'''Z'''FS '''I'''ntent '''L'''og: the area where synchronous writes are logged before the confirmation is sent back to the client.
==prefetch==
See /proc/spl/kstat/zfs/zfetchstats
*[https://cuddletech.com/2009/05/understanding-zfs-prefetch/ Understanding ZFS prefetch]
*[https://svennd.be/tuning-of-zfs-module/ Tuning of the ZFS module]
*[https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSARCStatsAndPrefetch Some basic ZFS ARC statistics and prefetching]
*[https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSPrefetchStatsNotes Some notes on ZFS prefetch related stats]
*[http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/ Activity of the ZFS ARC]
= HOWTO =
==Get sizes/reservations==
 zfs get quota,reservation tank/vol1
==Caching==
===Add L2ARC read cache===
 zpool add rpool cache sdf
===Add ZIL/SLOG write cache===
 zpool add rpool log mirror sdk sdl
===Remove ZIL/SLOG mirrored cache===
 zpool remove mypool mirror-4 sdn1 sdo1
==Getting statistics==
===Show cache activity===
 dstat --zfs-arc --zfs-l2arc --zfs-zil -d 5
===zpool===
 zpool iostat
====More statistics, every 5 seconds====
 zpool iostat -v 5
===Flush linux caches===
 echo 3 > /proc/sys/vm/drop_caches
===arc statistics===
 arc_summary -s arc
===l2arc statistics===
 arc_summary -s l2arc
===ZIL statistics===
 cat /proc/spl/kstat/zfs/zil
==Create zfs filesystem==
 zfs create poolname/fsname
This also creates the mountpoint.
==Add vdev to pool==
 zpool add mypool raidz1 sdg sdh sdi
== Replace disk in zfs ==
=== Some links ===
*[https://itectec.com/ubuntu/ubuntu-replacing-a-dead-disk-in-a-zpool/ Replacing a dead disk in a zpool]
Get information first. Find the name of the disk to replace:
 zpool status
Take it offline:
 zpool offline poolname ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M5RLZC6V
Get the disk guid:
 zdb
 guid: 15233236897831806877
Get the list of disks by id:
 ls -al /dev/disk/by-id
Save the id, shut down, replace the disk, and boot. Then find the new disk:
 ls -al /dev/disk/by-id
Run the replace command. The first argument is the guid of the old disk, the second the name of the new disk:
 zpool replace tank /dev/disk/by-id/13450850036953119346 /dev/disk/by-id/ata-ST4000VN000-1H4168_Z302FQVZ
or just
 zpool replace tank /dev/sdi
===If disk is shown as '''UNAVAIL'''===
 zpool offline tank sdi
==Showing information about ZFS pools and datasets==
===Show pools with sizes===
 zpool list
or
 zpool list -H -o name,size
===Show reservations on datasets===
 zfs list -o name,reservation
==Swap on zfs==
*https://askubuntu.com/questions/228149/zfs-partition-as-swap
 zfs create -V 4G -b 4K pool/swap
 mkswap -f /dev/pool/swap
 swapon /dev/pool/swap
and remember fstab.
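A hypothetical /etc/fstab line for the zvol created above (on Linux, zvols also appear under /dev/zvol/&lt;pool&gt;/&lt;volume&gt;):

```
/dev/zvol/pool/swap  none  swap  defaults  0  0
```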
==vdevs==
===multiple vdevs===
Multiple vdevs in a zpool get striped. New writes are distributed across the vdevs, but existing data is not rebalanced.
===invalid vdev specification===
Probably means you need -f
===show balance between vdevs===
 zpool iostat -v 'pool' [interval in seconds]
or just
 zpool iostat -vc 'pool'
== Tuning arc settings ==
See [https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html Tuning ZFS module parameters]
===zfs_arc_max===
Linux defaults to giving 50% of RAM to the ARC; this is the case when:
 cat /sys/module/zfs/parameters/zfs_arc_max
 0
The effective limit shows up as c_max:
 grep c_max /proc/spl/kstat/zfs/arcstats
To change this:
 echo 5368709120 > /sys/module/zfs/parameters/zfs_arc_max
and add to /etc/modprobe.d/zfs.conf:
 options zfs zfs_arc_max=5368709120
'''NOTE''' you might need to run
 update-initramfs -u
and perhaps clear caches and reset counters:
 echo 3 > /proc/sys/vm/drop_caches
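The value is in bytes; shell arithmetic avoids miscounting zeros (5 GiB here, matching the example above):

```shell
# zfs_arc_max is expressed in bytes: 5 GiB = 5 * 1024^3 = 5368709120
arc_max_gib=5
echo $((arc_max_gib * 1024 * 1024 * 1024))
```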
===Tune zfs_arc_dnode_limit_percent===
Assuming zfs_arc_dnode_limit = 0:
 echo 20 > /sys/module/zfs/parameters/zfs_arc_dnode_limit_percent
In /etc/modprobe.d/zfs.conf:
 options zfs zfs_arc_dnode_limit_percent=20
===export iscsi===
*https://linuxhint.com/share-zfs-volumes-via-iscsi/
= FAQ =
==arc_summary==
===VDEV cache disabled, skipping section===
This is normal; the vdev cache is considered harmful and is disabled by default.
==Arc metadata size exceeds maximum==
This means '''arc_meta_used''' > '''arc_meta_limit'''.
==increasing feed rate==
== show status and disks ==
 zpool status
== show drives/pools ==
 zfs list
== check raid level ==
 zpool status
The vdev names in the output (raidz1-0, mirror-0, ...) show the raid level.
==Estimate raidz speeds==
 raidz1: N/(N-1) * IOPS
 raidz2: N/(N-2) * IOPS
 raidz3: N/(N-3) * IOPS
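As a worked example of the formulas above, with assumed numbers — a hypothetical 6-disk vdev where each disk does 250 IOPS:

```shell
# Apply the N/(N-p) * IOPS estimate from above for a hypothetical
# 6-disk vdev with 250 IOPS per disk (both numbers are illustrative).
N=6
IOPS=250
echo "raidz1: $(( N * IOPS / (N - 1) )) IOPS"
echo "raidz2: $(( N * IOPS / (N - 2) )) IOPS"
echo "raidz3: $(( N * IOPS / (N - 3) )) IOPS"
```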
==VDEV cache disabled, skipping section==
This is the same (normal) message as above: the vdev cache is disabled by default, so it does not necessarily indicate a missing L2ARC device.
==cannot export 'tank': pool is busy==
After checking for things holding the pool open (NFS exports, open files, etc.) try:
 zfs unshare -a
 zfs umount -a -f
 zpool export -f tank
Revision as of 14:23, 15 April 2024