ZFS
Latest revision as of 12:47, 25 July 2024
Links
- http://open-zfs.org
- http://www.edplese.com/samba-with-zfs.html
- ZFS calculator
- another zfs calculator
- ZFS clustering
- ZFS and ECC (https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data/)
- ZFS troubleshooting/disk replacement
- Creating a ZFS HA Cluster using shared or shared-nothing storage
- ZFS 101
- Raidz expansion
- Basic guide to working with zfs
- Archlinux page on ZFS
- Raidz basic concepts
Documentation
- zfs manpage
- ZFS on Linux
- openzfs wiki
- https://wiki.gentoo.org/wiki/ZFS
- ZFS cheatsheet
- http://wiki.freebsd.org/ZFSQuickStartGuide
- Opensolaris ZFS intro
- raidz types reference
- ZFS fragmentation
- raidz
ARC/Caching
- Configuring ZFS Cache for High-Speed IO
- ZFS Arc various sizes
- Activity of the ZFS ARC
- Understanding ARC hits
- ZFS Caching
- What is stored in parity
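L2ARC
- ZFS L2ARC (https://www.brendangregg.com/blog/2008-07-22/zfs-l2arc.html)
- OpenZFS: All about the cache vdev or L2ARC (https://klarasystems.com/articles/openzfs-all-about-l2arc/)
L2ARC hits and misses on FreeBSD:
sysctl kstat.zfs.misc.arcstats | grep -E 'l2_(hits|misses)'
and on Linux:
grep -E 'l2_(hits|misses)' /proc/spl/kstat/zfs/arcstats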
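The raw hit and miss counters are easier to judge as a ratio. A minimal sketch, assuming the Linux arcstats layout (a kstat header line, then "name type data" rows); `l2arc_hit_ratio` is a hypothetical helper name, not a ZFS tool:

```shell
# Print the L2ARC hit ratio in percent, or "n/a" when there were no
# L2ARC lookups at all (for example when no cache device is attached).
# Assumes arcstats rows of the form: name type value
l2arc_hit_ratio() {
    stats=${1:-/proc/spl/kstat/zfs/arcstats}
    awk '
        $1 == "l2_hits"   { hits = $3 }
        $1 == "l2_misses" { misses = $3 }
        END {
            total = hits + misses
            if (total == 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * hits / total
        }
    ' "$stats"
}
```

Run it without arguments to read the live arcstats file; the optional path argument only exists to make it easy to test against a saved copy.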
Tuning ZFS
- Monitoring and Tuning ZFS Performance
ARC statistics
ZFS module parameters
/sys/module/zfs/parameters/
cat /proc/spl/kstat/zfs/arcstats
data_size
size of cached user data
dnode_size
size of dnodes held in memory by the ARC
hdr_size
size of ARC buffer headers
l2_hdr_size
size of L2ARC headers stored in main ARC
metadata_size
size of cached metadata
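To read these counters in MiB instead of bytes, a small awk filter is enough. A sketch, assuming the same "name type data" arcstats layout; besides the fields described above it prints `size` (total ARC size) and `l2_hdr_size`, and `arc_sizes` is a hypothetical helper name:

```shell
# Print selected ARC size counters from arcstats, converted to MiB.
arc_sizes() {
    stats=${1:-/proc/spl/kstat/zfs/arcstats}
    awk '
        $1 == "size" || $1 == "data_size" || $1 == "metadata_size" ||
        $1 == "dnode_size" || $1 == "hdr_size" || $1 == "l2_hdr_size" {
            printf "%-14s %10.1f MiB\n", $1, $3 / 1048576
        }
    ' "$stats"
}
```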
Tools
- ztop
- iozstat
- arc_summary
- zfs-linux-tools kstat-analyzer is rather helpful
kstat-analyzer
prefetch hit rate is low, consider tuning prefetcher
Check the vdev cache size, which is supposed to be left at 0:
cat /sys/module/zfs/parameters/zfs_vdev_cache_size
The corresponding check in kstat-analyzer's code:
if (float(kstats['hits']) / accesses) < PREFETCH_RATIO_OK
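The kstat-analyzer check above divides prefetch hits by accesses and compares the result with a threshold. A rough shell equivalent, assuming that accesses means hits + misses from /proc/spl/kstat/zfs/zfetchstats. The default threshold of 0.5 is a hypothetical value, not kstat-analyzer's actual PREFETCH_RATIO_OK, and `prefetch_check` is a hypothetical helper name:

```shell
# Compare the prefetch hit ratio (hits / (hits + misses)) with a threshold.
# Threshold 0.5 is an assumed example value; pass your own as argument 2.
prefetch_check() {
    stats=${1:-/proc/spl/kstat/zfs/zfetchstats}
    threshold=${2:-0.5}
    awk -v t="$threshold" '
        BEGIN { t += 0 }                 # force numeric comparison
        $1 == "hits"   { hits = $3 }
        $1 == "misses" { misses = $3 }
        END {
            accesses = hits + misses
            if (accesses == 0) { print "no prefetch accesses"; exit }
            ratio = hits / accesses
            if (ratio < t)
                printf "low: %.2f\n", ratio
            else
                printf "ok: %.2f\n", ratio
        }
    ' "$stats"
}
```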
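Relevant links:
- https://www.truenas.com/community/threads/notes-on-zfs-prefetch.1076/
- https://www.phoronix.com/news/OpenZFS-Uncached-Prefetch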
Processes
arc_evict
Evict buffers from list until we've removed the specified number of bytes. Move the removed buffers to the appropriate evict state. If the recycle flag is set, then attempt to "recycle" a buffer:
- look for a buffer to evict that is `bytes` long.
- return the data block from this buffer rather than freeing it.
This flag is used by callers that are trying to make space for a new buffer in a full arc cache.
This function makes a "best effort". It skips over any buffers it can't get a hash_lock on, and so may not catch all candidates. It may also return without evicting as much space as requested.
arc_prune
Commands
Getting arc statistics
arcstat
arc_summary
Tip: for details use
arc_summary -d
There is also
cat /proc/spl/kstat/zfs/arcstats
and
zfetchstat + kstat-analyzer from zfs-linux-tools
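For a quick overall ARC hit ratio straight from arcstats, a sketch like the following works. It assumes the top-level `hits` and `misses` counters in the Linux arcstats file and flags ratios under 90%, the rule of thumb quoted for L2ARC in the terms section; `arc_hit_ratio` is a hypothetical helper name:

```shell
# Overall ARC hit ratio: hits / (hits + misses), in percent.
arc_hit_ratio() {
    stats=${1:-/proc/spl/kstat/zfs/arcstats}
    awk '
        $1 == "hits"   { hits = $3 }
        $1 == "misses" { misses = $3 }
        END {
            total = hits + misses
            if (total == 0) { print "n/a"; exit }
            pct = 100 * hits / total
            printf "%.1f%%%s\n", pct, (pct < 90 ? " (below 90%)" : "")
        }
    ' "$stats"
}
```

The counters are cumulative since module load, so a ratio taken shortly after boot or after a large one-off copy can be misleading.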
zil/slog statistics
arc_summary -s zil
or
cat /proc/spl/kstat/zfs/zil
or
zilstat
or
zpool iostat -v
l2arc statistics
arc_summary -s l2arc
Getting IO statistics
zpool iostat -v 300
Terms and acronyms
vdev
Virtual Device.
ARC
Adaptive Replacement Cache
Portion of RAM used to cache data to speed up read performance
L2ARC
Level 2 Adaptive Replacement Cache
"L2ARC is usually considered if hit rate for the ARC is below 90% while having 64+ GB of RAM"
SSD cache
DMU
Data Management Unit
MFU
Most Frequently Used
MRU
Most Recently Used
zvol
A block device whose space is allocated from the pool; useful for iSCSI targets
Scrubbing
Checking disks/data integrity
zpool status <poolname> | grep scrub
and to start one:
zpool scrub <poolname>
Scrubs are probably already scheduled by cron.
SLOG
See ZIL
ZIL
ZFS Intent Log: the space where synchronous writes are logged before the confirmation is sent back to the client
prefetch
See /proc/spl/kstat/zfs/zfetchstats
- Understanding ZFS prefetch
- Tuning of the ZFS module
- Some basic ZFS ARC statistics and prefetching
- Some notes on ZFS prefetch related stats
- Activity of the ZFS ARC
HOWTO
Get sizes/reservations
zfs get quota,reservation tank/vol1
Caching
Add log/cache
For L2ARC, mirrors make little sense (cache devices cannot be mirrored, and losing one only loses cached data); just add disks
zpool add rpool cache sdf
or, better, use the persistent /dev/disk/by-id name
zpool add rpool cache /dev/disk/by-id/ata-SAMSUNG_MZ7LH960HAJR-00005_S45NNA0N47394
or simply
zpool add rpool cache ata-SAMSUNG_MZ7LH960HAJR-00005_S45NNA0N47394
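To find the by-id name for a kernel device such as sdf, you can look for the symlinks that point at it. A sketch, assuming the usual udev layout where by-id entries are relative symlinks like ../../sdf. `byid_names` is a hypothetical helper, and its optional second argument (default /dev/disk/by-id) only exists to make it testable:

```shell
# Print every by-id name that points at the given kernel device (e.g. sdf).
# Partition links (../../sdf1) are skipped because their target differs.
byid_names() {
    dev=$1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        if [ "$(basename "$(readlink "$link")")" = "$dev" ]; then
            basename "$link"
        fi
    done
}
```

A disk usually has several names (ata-, wwn-, ...); any of them can be passed to zpool add.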
Add ZIL/SLOG write cache
zpool add rpool log mirror sdk sdl
Remove ZIL/SLOG mirrored cache
zpool remove mypool mirror-4 sdn1 sdo1
Getting statistics
Show cache activity
dstat --zfs-arc --zfs-l2arc --zfs-zil -d 5
zpool
zpool iostat
More statistics, every 5 seconds
zpool iostat -v 5
Flush linux caches
sync; echo 3 > /proc/sys/vm/drop_caches
arc statistics
l2arc statistics
ZIL statistics
cat /proc/spl/kstat/zfs/zil
Create zfs filesystem
zfs create poolname/fsname
This also creates and mounts the mountpoint (by default /poolname/fsname).
Add vdev to pool
zpool add mypool raidz1 sdg sdh sdi
Replace disk in zfs
Some links
Get information first:
Name of disk
zpool status
Find the ID of the disk to replace and take it offline:
zpool offline poolname ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M5RLZC6V
Get the disk guid:
zdb
guid: 15233236897831806877
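On a pool with many disks the zdb output is long. A sketch that pairs each leaf vdev's guid with its path, assuming that in zdb's cached-config output the `guid:` line of a disk comes before its `path:` line and that paths are wrapped in single quotes; `zdb_guid_paths` is a hypothetical helper name:

```shell
# Read zdb output on stdin and print "guid path" for each leaf vdev.
zdb_guid_paths() {
    awk '
        $1 == "guid:" { guid = $2 }
        $1 == "path:" {
            path = $2
            gsub("\047", "", path)   # strip the single quotes around the path
            print guid, path
        }
    '
}
```

Usage: zdb | zdb_guid_paths. Alternatively, zpool status -g shows vdev GUIDs in place of device names.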
Get list of disk by id:
ls -al /dev/disk/by-id
Save the ID, shut down, replace the disk and boot.
Find the new disk:
ls -al /dev/disk/by-id
Run the replace command. The first device is the old disk, identified by its GUID; the second is the by-id name of the new disk:
zpool replace tank /dev/disk/by-id/13450850036953119346 /dev/disk/by-id/ata-ST4000VN000-1H4168_Z302FQVZ
or, if the new disk uses the same device name as the old one, just
zpool replace tank /dev/sdi
If disk is shown as UNAVAIL
zpool offline tank sdi
Showing information about ZFS pools and datasets
Show pools with sizes
zpool list
or
zpool list -H -o name,size
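The -H form (tab-separated, no header) is meant for scripts; adding -p gives exact byte values. A sketch that prints how full each pool is, assuming the columns name,size,allocated; `pool_usage` is a hypothetical helper name:

```shell
# Read "zpool list -Hp -o name,size,allocated" output on stdin and print
# the percentage of each pool that is allocated.
pool_usage() {
    awk -F '\t' '$2 > 0 { printf "%s %.1f%%\n", $1, 100 * $3 / $2 }'
}
```

Usage: zpool list -Hp -o name,size,allocated | pool_usage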
Show reservations on datasets
zfs list -o name,reservation
Swap on zfs
https://askubuntu.com/questions/228149/zfs-partition-as-swap
zfs create pool/swap -V 4G -b 4K
mkswap -f /dev/pool/swap
swapon /dev/pool/swap
and remember fstab
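A sketch of the fstab step that only appends the swap entry if it is not present yet, assuming the zvol path /dev/pool/swap used above; `add_swap_fstab` is a hypothetical helper, and the fstab path is a parameter so it can be tried on a copy first:

```shell
# Append "<dev> none swap defaults 0 0" to fstab unless <dev> is already listed.
add_swap_fstab() {
    dev=${1:-/dev/pool/swap}
    fstab=${2:-/etc/fstab}
    if ! grep -q "^$dev[[:space:]]" "$fstab"; then
        printf '%s none swap defaults 0 0\n' "$dev" >> "$fstab"
    fi
}
```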
vdevs
multiple vdevs
Multiple vdevs in a zpool get striped. What about balance?
invalid vdev specification
Probably means you need -f
show balance between vdevs
zpool iostat -v 'pool' [interval in seconds]
or just
zpool iostat -vc 'pool'
Tuning arc settings
See Tuning ZFS modules parameters
zfs_arc_max
Linux defaults to giving 50% of RAM to the ARC; this is the case when
cat /sys/module/zfs/parameters/zfs_arc_max
returns 0. The effective maximum is c_max in:
grep c_max /proc/spl/kstat/zfs/arcstats
To change this:
echo 5368709120 > /sys/module/zfs/parameters/zfs_arc_max
and add to /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=5368709120
NOTE: you might need to run the following so the setting also applies at boot (for example when / is on ZFS and the module is loaded from the initramfs):
update-initramfs -u -k all
and perhaps clear caches and reset counters:
sync; echo 3 > /proc/sys/vm/drop_caches
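zfs_arc_max takes a byte count; 5368709120 is 5 GiB. A sketch for deriving the value and the matching modprobe line from a size in GiB (`arc_max_bytes` and `arc_max_line` are hypothetical helper names; review the output before writing it anywhere):

```shell
# Convert whole GiB to bytes for zfs_arc_max.
arc_max_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the /etc/modprobe.d/zfs.conf line for the given size in GiB.
arc_max_line() {
    printf 'options zfs zfs_arc_max=%s\n' "$(arc_max_bytes "$1")"
}
```

Usage (as root): arc_max_bytes 5 > /sys/module/zfs/parameters/zfs_arc_max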
Tune zfs_arc_dnode_limit_percent
zfs_arc_dnode_limit_percent is only used when zfs_arc_dnode_limit is 0:
echo 20 > /sys/module/zfs/parameters/zfs_arc_dnode_limit_percent
In /etc/modprobe.d/zfs.conf:
options zfs zfs_arc_dnode_limit_percent=20
export iscsi
https://linuxhint.com/share-zfs-volumes-via-iscsi/
FAQ
arc_summary
VDEV cache disabled, skipping section
This is normal: the vdev cache is disabled because it is considered counterproductive in current code.
Arc metadata size exceeds maximum
So arc_meta_used > arc_meta_limit
increasing feed rate
show status and disks
zpool status
show pools/datasets
zfs list
check raid level (vdev types such as mirror-0 or raidz1-0)
zpool list -v
Estimate raidz speeds
raidz1: N/(N-1) * IOPS
raidz2: N/(N-2) * IOPS
raidz3: N/(N-3) * IOPS
VDEV cache disabled, skipping section
This refers to the disabled vdev cache (see the arc_summary entry above), not to a missing L2ARC.
cannot export 'tank': pool is busy
After checking things like NFS, try:
zfs unshare -a
zfs umount -a -f
zpool export -f tank