ZFS
= Links =
*[http://open-zfs.org http://open-zfs.org]
*[http://www.edplese.com/samba-with-zfs.html http://www.edplese.com/samba-with-zfs.html]
*[http://wintelguy.com/zfs-calc.pl ZFS calculator]
*[https://www.raidz-calculator.com/default.aspx another zfs calculator]
*[https://bm-stor.com/index.php/blog/Linux-cluster-with-ZFS-on-Cluster-in-a-Box/ ZFS clustering]
*[https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data/ ZFS and ECC]
*[https://docs.joyent.com/private-cloud/troubleshooting/disk-replacement ZFS troubleshooting/disk replacement]
*[https://www.high-availability.com/docs/Quickstart-ZFS-Cluster/ Creating a ZFS HA Cluster using shared or shared-nothing storage]
*[https://arstechnica.com/information-technology/2020/05/zfs-101-understanding-zfs-storage-and-performance/ ZFS 101]
*[https://arstechnica.com/gadgets/2021/06/raidz-expansion-code-lands-in-openzfs-master/ Raidz expansion]
*[https://somedudesays.com/2021/08/the-basic-guide-to-working-with-zfs/ Basic guide to working with zfs]
=Documentation=
*[https://openzfs.github.io/openzfs-docs/man/4/zfs.4.html zfs manpage]
*[http://zfsonlinux.org/ ZFS on Linux]
*[https://openzfs.org/wiki/ openzfs wiki]
*[https://wiki.gentoo.org/wiki/ZFS https://wiki.gentoo.org/wiki/ZFS]
*[https://blog.programster.org/zfs-cheatsheet ZFS cheatsheet]
*[http://wiki.freebsd.org/ZFSQuickStartGuide http://wiki.freebsd.org/ZFSQuickStartGuide]
*[http://www.opensolaris.org/os/community/zfs/intro/ Opensolaris ZFS intro]
*[http://www.raidz-calculator.com/raidz-types-reference.aspx raidz types reference]
==ARC/Caching==
*[https://linuxhint.com/configure-zfs-cache-high-speed-io/ Configuring ZFS Cache for High-Speed IO]
*[https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSARCItsVariousSizes ZFS Arc various sizes]
*[http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/ Activity of the ZFS ARC]
*[https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSUnderstandingARCHits Understanding ARC hits]
*[https://www.45drives.com/community/articles/zfs-caching/ ZFS Caching]
*[https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSZpoolFragmentationMeaning ZFS fragmentation]
*[https://klarasystems.com/articles/openzfs-all-about-l2arc/ OpenZFS: All about the cache vdev or L2ARC]
==ARC statistics==
*[https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html Tuning module parameters]
 cat /proc/spl/kstat/zfs/arcstats
===data_size===
size of cached user data
===dnode_size===
size of cached dnodes
===hdr_size===
size of the ARC's own buffer headers held in RAM (L2ARC headers are counted separately, in l2_hdr_size)
===metadata_size===
size of cached metadata
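Beyond the size counters above, arcstats also exposes hits and misses, from which the ARC hit ratio follows. A minimal sketch (arcstats lines have the form "name type value"; the printf below is made-up sample data standing in for /proc/spl/kstat/zfs/arcstats on a live system):

```shell
# Compute the ARC hit ratio from the hits/misses counters in arcstats.
arc_hit_ratio() {
  awk '$1 == "hits"   { hits = $3 }
       $1 == "misses" { misses = $3 }
       END { printf "%d%%\n", (100 * hits) / (hits + misses) }'
}

# Sample data for illustration; on a real system use:
#   arc_hit_ratio < /proc/spl/kstat/zfs/arcstats
printf 'hits 4 900\nmisses 4 100\n' | arc_hit_ratio   # → 90%
```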
==Tuning ZFS==
*[https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/index.html ZFS Performance and Tuning]
*[https://linuxhint.com/configure-zfs-cache-high-speed-io/ Configuring ZFS Cache for High-Speed IO]
=Tools=
*[https://github.com/asomers/ztop ztop]
*[https://github.com/jimsalterjrs/ioztat ioztat]
*[https://cuddletech.com/2008/10/explore-your-zfs-adaptive-replacement-cache-arc/ arc_summary]
*[https://github.com/richardelling/zfs-linux-tools zfs-linux-tools] kstat-analyzer is rather helpful
=Processes=
==arc_evict==
Evicts buffers from the ARC lists until the requested number of bytes has been removed, moving evicted buffers to the appropriate evict state. If the recycle flag is set, it attempts to "recycle" a buffer: it looks for a buffer to evict that is exactly ''bytes'' long and returns that buffer's data block rather than freeing it. Callers use this flag when trying to make space for a new buffer in a full ARC.

This function makes a "best effort": it skips any buffer it cannot take the hash_lock on, so it may miss some candidates, and it may return without evicting as much space as requested.
==arc_prune==
=Commands=
==Getting arc statistics==
 arc_summary
Tip: for details use
 arc_summary -d
There is also
 cat /proc/spl/kstat/zfs/arcstats
==Getting IO statistics==
 zpool iostat -v 300
=Terms and acronyms=
==vdev==
'''V'''irtual '''Dev'''ice.
*[https://wiki.archlinux.org/title/ZFS/Virtual_disks ZFS Virtual disks]
==ARC==
'''A'''daptive '''R'''eplacement '''C'''ache
The portion of RAM used to cache data to speed up read performance.
==L2ARC==
'''L'''evel '''2''' '''A'''daptive '''R'''eplacement '''C'''ache
"L2ARC is usually considered if hit rate for the ARC is below 90% while having 64+ GB of RAM"
An SSD-backed read cache.
==DMU==
'''D'''ata '''M'''anagement '''U'''nit
==MFU==
'''M'''ost '''F'''requently '''U'''sed
==MRU==
'''M'''ost '''R'''ecently '''U'''sed
==Scrubbing==
Checking disk/data integrity.
 zpool status <poolname> | grep scrub
and
 zpool scrub <poolname>
probably taken care of by cron.
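The cron job hinted at above might look like the following; the path and schedule are assumptions, and the pool name tank is an example (many distros ship a similar job by default):

```
# /etc/cron.d/zfs-scrub (hypothetical): scrub 'tank' at 03:00
# on the first day of each month
0 3 1 * * root /usr/sbin/zpool scrub tank
```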
==ZIL==
'''Z'''FS '''I'''ntent '''L'''og: the space where synchronous writes are logged before the confirmation is sent back to the client.
==prefetch==
*[https://svennd.be/tuning-of-zfs-module/ Tuning of the ZFS module]
*[https://cuddletech.com/2009/05/understanding-zfs-prefetch/ Understanding ZFS prefetch]
*[https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSARCStatsAndPrefetch Some basic ZFS ARC statistics and prefetching]
= HOWTO =
==Create zfs filesystem==
 zfs create poolname/fsname
This also creates the mountpoint.
==Add vdev to pool==
 zpool add mypool raidz1 sdg sdh sdi
== Replace disk in zfs ==
Get information first. Find the name of the disk to replace:
 zpool status
Take it offline:
 zpool offline poolname ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M5RLZC6V
Get the disk guid:
 zdb
 guid: 15233236897831806877
Get the list of disks by id:
 ls -al /dev/disk/by-id
Save the id, shut down, replace the disk, boot. Find the new disk:
 ls -al /dev/disk/by-id
Run the replace command. The id is the guid of the old disk; the name is that of the new disk:
 zpool replace tank 13450850036953119346 /dev/disk/by-id/ata-ST4000VN000-1H4168_Z302FQVZ
==Showing information about ZFS pools and datasets==
===Show pools with sizes===
 zpool list
or
 zpool list -H -o name,size
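The -H flag makes the output script-friendly: tab-separated, no header line. A sketch of consuming it, with made-up sample data standing in for live zpool list output:

```shell
# -H output is header-less and tab-separated, so it parses with plain read.
# The printf below stands in for: zpool list -H -o name,size
printf 'tank\t21.8T\nbackup\t7.25T\n' |
while read -r name size; do
  echo "pool $name: $size"
done
```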
===Show reservations on datasets===
 zfs list -o name,reservation
==Swap on zfs==
*[https://askubuntu.com/questions/228149/zfs-partition-as-swap zfs partition as swap]
==vdevs==
===multiple vdevs===
Multiple vdevs in a zpool get striped. What about balance?
===invalid vdev specification===
Probably means you need -f
===show balance between vdevs===
 zpool iostat -v 'pool' [interval in seconds]
or just
 zpool iostat -vc 'pool'
== Tuning arc settings ==
See [https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html Tuning ZFS module parameters]
===zfs_arc_max===
On Linux the ARC defaults to up to 50% of RAM; this default applies while zfs_arc_max is 0:
 cat /sys/module/zfs/parameters/zfs_arc_max
 0
Check the current limit with:
 grep c_max /proc/spl/kstat/zfs/arcstats
To change this:
 echo 5368709120 > /sys/module/zfs/parameters/zfs_arc_max
and add to /etc/modprobe.d/zfs.conf:
 options zfs zfs_arc_max=5368709120
If the ARC does not shrink right away, you may need:
 echo 3 > /proc/sys/vm/drop_caches
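zfs_arc_max takes a byte count, which is why 5 GiB appears as 5368709120 above. A tiny illustrative helper (gib_to_bytes is not a ZFS tool, just shell arithmetic):

```shell
# Convert a GiB count to the byte value zfs_arc_max expects.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 5   # → 5368709120
# e.g.: echo "$(gib_to_bytes 5)" > /sys/module/zfs/parameters/zfs_arc_max
```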
===Tune zfs_arc_dnode_limit_percent===
Assuming zfs_arc_dnode_limit = 0 (so the percentage limit is the one in effect):
 echo 20 > /sys/module/zfs/parameters/zfs_arc_dnode_limit_percent
In /etc/modprobe.d/zfs.conf:
 options zfs zfs_arc_dnode_limit_percent=20
= FAQ =
==Arc metadata size exceeds maximum==
So '''arc_meta_used''' > '''arc_meta_limit'''
== show status and disks ==
 zpool status
== show drives/pools ==
 zfs list
==check raid level==
 zfs list -a
==Estimate raidz speeds==
 raidz1: N/(N-1) * IOPS
 raidz2: N/(N-2) * IOPS
 raidz3: N/(N-3) * IOPS
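The rules of thumb above transcribe directly to shell arithmetic. raidz_est is a hypothetical helper (N disks, P parity disks, per-disk IOPS), and the numbers are illustrative, not benchmarks:

```shell
# Implements the N/(N-P) * IOPS estimate from the list above.
# Integer arithmetic, so the result is truncated.
raidz_est() {
  n=$1; p=$2; iops=$3
  echo $(( n * iops / (n - p) ))
}

raidz_est 6 1 250   # raidz1 over 6 disks at 250 IOPS each → 300
```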
==VDEV cache disabled, skipping section==
Looks like you just don't have an L2ARC cache device.
Latest revision as of 14:59, 21 September 2023