Smart
Latest revision as of 13:02, 16 April 2024
Disk monitoring
Links
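- Smartmontools wiki: https://www.smartmontools.org/wiki/
- Wikipedia article: https://en.wikipedia.org/wiki/S.M.A.R.T.
- Understanding SMART reports: https://wiki.unraid.net/Understanding_SMART_Reports
- https://www.thomas-krenn.com/en/wiki/SMART_tests_with_smartctl
- Smartmontools with megaraid: https://www.thomas-krenn.com/en/wiki/Smartmontools_with_MegaRAID_Controller
- https://wiki.archlinux.org/index.php/S.M.A.R.T.
- smartmontools FAQ: https://www.smartmontools.org/wiki/FAQ
- SMART hard drive stats: https://www.backblaze.com/blog/hard-drive-smart-stats/
- SMART ASC error codes: https://www.ibm.com/docs/en/flashsystem-v7000u/1.5.2?topic=problems-smart-ascascq-error-codes-messages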
Tools
- smartctl
- gsmartcontrol
Useful commands
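Enable SMART
smartctl -i /dev/sda
shows whether SMART is available and enabled. If it is disabled, enable it with
smartctl -s on /dev/sda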
Check SMART status
smartctl -a /dev/sda
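smartctl's exit status is a bit mask (documented under EXIT STATUS in the smartctl man page), so a script can tell a failing disk apart from a command that simply could not talk to the disk. A minimal sketch in POSIX sh; the function name decode_smartctl_status is hypothetical, and the bit descriptions are shortened paraphrases of the man page:

```shell
#!/bin/sh
# Decode the smartctl exit status bit mask (see "EXIT STATUS" in man smartctl).
# decode_smartctl_status is a hypothetical helper, not part of smartmontools.
decode_smartctl_status() {
  dss_status=$1
  if [ "$dss_status" -eq 0 ]; then
    echo "ok"
    return 0
  fi
  for dss_bit in 0 1 2 3 4 5 6 7; do
    # Skip bits that are not set in the exit status.
    [ $(( dss_status & (1 << dss_bit) )) -ne 0 ] || continue
    case $dss_bit in
      0) dss_msg="command line did not parse" ;;
      1) dss_msg="device open failed or device did not identify" ;;
      2) dss_msg="a SMART or other ATA command failed" ;;
      3) dss_msg="SMART status check returned DISK FAILING" ;;
      4) dss_msg="prefail attributes at or below threshold" ;;
      5) dss_msg="some attributes were at or below threshold in the past" ;;
      6) dss_msg="error log contains errors" ;;
      7) dss_msg="self-test log contains errors" ;;
    esac
    echo "bit $dss_bit: $dss_msg"
  done
}
```

Usage: run `smartctl -a /dev/sda > /dev/null` and then `decode_smartctl_status $?`.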
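Check SMART status of a disk in a RAID array
For a SATA disk behind a MegaRAID controller, something like
smartctl -a /dev/bus/0 -d sat+megaraid,11
where 11 is the disk's device ID (DID) on the controller. For SAS disks use -d megaraid,11 instead.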
Check error log
smartctl -l error /dev/sdb
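To use the error log in a script it helps to reduce it to a single number. A minimal sketch, assuming the ATA wording of smartctl's output ("No Errors Logged" or "ATA Error Count: N ..."), which may differ between versions and does not cover SCSI or NVMe devices; the function name ata_error_count is hypothetical:

```shell
#!/bin/sh
# Print the number of logged errors from `smartctl -l error` output on stdin.
# Prints "unknown" when neither expected line is found.
# ata_error_count is a hypothetical helper name.
ata_error_count() {
  awk '
    /No Errors Logged/  { print 0;  found = 1; exit }
    /^ATA Error Count:/ { print $4; found = 1; exit }
    END { if (!found) print "unknown" }
  '
}
```

Usage: `smartctl -l error /dev/sdb | ata_error_count`.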
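Device: /dev/bus/0 [megaraid_disk_09] [SAT]
Try
smartctl --scan
to list the device names and -d types smartctl detects, then pass the same device and -d option to smartctl -a.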
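Each line of smartctl --scan output starts with "<device> -d <type>" followed by a "#" comment, so it can be turned into ready-made commands. A minimal sketch, assuming that line format; scan_to_commands is a hypothetical name. Note that for SATA disks behind MegaRAID the scan may report megaraid,N while sat+megaraid,N is what actually works:

```shell
#!/bin/sh
# Turn `smartctl --scan` output (stdin) into one `smartctl -a` command per device.
# Only the "<device> -d <type>" part before the '#' comment is kept.
# scan_to_commands is a hypothetical helper name.
scan_to_commands() {
  sed -n 's|^\(/[^ ]*\) -d \([^ ]*\).*|smartctl -a -d \2 \1|p'
}
```

Usage: `smartctl --scan | scan_to_commands`; review the printed commands before piping them to sh.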
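Some codes and messages
- SMART ASC/ASCQ error codes and messages: https://www.ibm.com/docs/en/flashsystem-v7000u/1.5.2?topic=problems-smart-ascascq-error-codes-messages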
Meaning of pre-fail
For an attribute of type Pre-fail, a normalized VALUE equal to or lower than its THRESH value indicates impending disk failure.
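The rule above can be checked against the attribute table printed by smartctl -A (columns ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). smartctl already marks such rows in WHEN_FAILED; this sketch just re-derives it. It assumes that column layout and treats a THRESH of 000 as "never fails", which is my understanding of smartctl's behaviour; failing_prefail_attrs is a hypothetical name:

```shell
#!/bin/sh
# List Pre-fail attributes whose normalized VALUE is at or below THRESH,
# reading `smartctl -A` output (ATA attribute table) from stdin.
# failing_prefail_attrs is a hypothetical helper name.
failing_prefail_attrs() {
  awk '
    # Attribute rows start with a numeric ID.
    $1 ~ /^[0-9]+$/ && $7 == "Pre-fail" {
      value = $4 + 0; thresh = $6 + 0
      # Assumption: a threshold of 0 means the attribute never fails.
      if (thresh > 0 && value <= thresh) print $1, $2
    }
  '
}
```

Usage: `smartctl -A /dev/sda | failing_prefail_attrs` prints nothing on a healthy disk.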
LBA_of_first_error
- https://www.smartmontools.org/wiki/BadBlockHowto
Device is: Not in smartctl database [for details use: -P showall]
Try updating the drive database with
/usr/sbin/update-smart-drivedb
Otherwise see https://www.smartmontools.org/wiki/FAQ#MyATASATAdriveisnotinthesmartctlsmartddatabase
Uncorrectable Sector Count
Check https://medium.com/@satyeshukumar/how-to-fix-uncorrectable-sector-count-warning-5a38c56d3faf
198 Offline Uncorrectable
A bad sign. On SSDs it is mostly only a concern when the number gets high.
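To keep an eye on the sector-related counters, the raw values of 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable) can be pulled out of smartctl -A. A minimal sketch, assuming the usual ten-column ATA attribute table with a plain number as RAW_VALUE; attribute names vary by vendor, and as noted above a single non-zero count on an SSD is less alarming than a growing one. sector_health is a hypothetical name:

```shell
#!/bin/sh
# Print the raw values of attributes 5, 197 and 198 from `smartctl -A`
# output on stdin, flagging any that are non-zero.
# sector_health is a hypothetical helper name.
sector_health() {
  awk '
    $1 == "5" || $1 == "197" || $1 == "198" {
      raw = $10 + 0
      printf "%s %s %s%s\n", $1, $2, $10, (raw > 0 ? " <- non-zero" : "")
    }
  '
}
```

Usage: `smartctl -A /dev/sda | sector_health`.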
SSD specific
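- https://unix.stackexchange.com/questions/106678/how-to-check-the-life-left-in-ssd-or-the-mediums-wear-level
- Life span of SSD: https://www.compuram.de/blog/en/the-life-span-of-a-ssd-how-long-does-it-last-and-what-can-be-done-to-take-care/
- Is my SSD disk healthy?: https://lowendtalk.com/discussion/141041/is-my-server-ssd-disk-healty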
191 G-Sense_Error_Rate
Count of errors caused by shocks or vibration.
231 SSD_Life_Left
Percentage of life left
232 Available Reserved Space
The meaning depends on the vendor. Some drives report endurance used: the number of physical erase cycles completed as a percentage of the maximum the drive is designed to endure. Intel SSDs report the available reserved space as a percentage of the initial reserved space.
233 Media Wearout Indicator
The normalized value decreases towards 1 as the drive wears. Once it reaches 1, start hoping: the drive might still last a while.
241 Lifetime_Writes_GiB
Total amount of data written to the drive so far, in GiB.
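The wear-related attributes above can be summarized in one go. A minimal sketch, assuming the ten-column smartctl -A layout and that the drive uses these IDs with these meanings (231 and 233 normalized VALUE as life left, 241 RAW_VALUE in GiB), which is vendor specific; ssd_wear_summary is a hypothetical name:

```shell
#!/bin/sh
# Summarize SSD wear attributes from `smartctl -A` output on stdin:
# normalized VALUE of 231 (SSD_Life_Left) and 233 (Media_Wearout_Indicator),
# raw value of 241 (Lifetime_Writes_GiB). IDs and meanings are vendor specific.
# ssd_wear_summary is a hypothetical helper name.
ssd_wear_summary() {
  awk '
    $1 == "231" || $1 == "233" { printf "%s %s value=%d\n", $1, $2, $4 + 0 }
    $1 == "241"                { printf "%s %s raw=%s\n", $1, $2, $10 }
  '
}
```

Usage: `smartctl -A /dev/sda | ssd_wear_summary`.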
SSD/NVMe and USB adapters
Some adapters are seemingly not supported, for example:
/dev/sdb: Unknown USB bridge [0x152d:0x2578 (0x117)]
with lsusb showing:
Bus 002 Device 002: ID 152d:2578 JMicron Technology Corp. / JMicron USA Technology Corp. USB to ATA/ATAPI Bridge
Try:
smartctl -a -d sat /dev/sdb
With working SMART support
Adapters based on the Realtek RTL9210, such as the Delock 64198.
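When it is unclear which -d type a USB bridge needs, the candidates can be tried in turn. A minimal sketch under assumptions: the candidate list is my guess (sntrealtek, sntjmicron and sntasmedia are NVMe-over-USB types only present in newer smartmontools releases; check `smartctl --help` for what your build supports), and a type is accepted when bits 0 to 2 of the exit status (parse error, open failed, command failed) are clear. find_usb_dev_type is a hypothetical name:

```shell
#!/bin/sh
# Try several smartctl -d types on a USB-attached disk and print the first one
# for which `smartctl -i` parses, opens the device and runs its commands.
# The candidate list is an assumption; find_usb_dev_type is hypothetical.
find_usb_dev_type() {
  fudt_dev=$1
  for fudt_type in sat sat,12 usbjmicron sntrealtek sntjmicron sntasmedia scsi; do
    smartctl -i -d "$fudt_type" "$fudt_dev" >/dev/null 2>&1
    fudt_rc=$?
    # Bits 0-2 clear: no parse error, device opened, commands succeeded.
    if [ $(( fudt_rc & 7 )) -eq 0 ]; then
      echo "$fudt_type"
      return 0
    fi
  done
  return 1
}
```

Usage: `find_usb_dev_type /dev/sdb` (usually needs root), then `smartctl -a -d <printed type> /dev/sdb`.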
smartd: Failed SMART usage Attribute
smartd might be complaining about a disk that has already been replaced; try restarting smartd (e.g. systemctl restart smartd; the service name varies by distribution).
Error: UNC at LBA = 0x00113cd8 = 1129688
Uncorrectable (UNC) read error at that LBA; throw the disk away.
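The LBA in such a message is given in hex and decimal; to find what it hits, the smartmontools BadBlockHowto converts it to a filesystem block as (LBA - partition start sector) * 512 / filesystem block size. A minimal sketch of that arithmetic, assuming 512-byte logical sectors; lba_info is a hypothetical name, and a decimal argument with a leading zero would be read as octal by the shell:

```shell
#!/bin/sh
# Convert an LBA (decimal or 0x-prefixed hex) to decimal and to a byte offset,
# and optionally to a filesystem block number (BadBlockHowto arithmetic).
# Assumes 512-byte logical sectors. lba_info is a hypothetical helper name.
lba_info() {
  li_lba=$(( $1 ))   # shell arithmetic accepts 0x... hex constants
  echo "lba=$li_lba bytes=$(( li_lba * 512 ))"
  if [ $# -eq 3 ]; then
    # $2 = partition start sector, $3 = filesystem block size in bytes
    echo "fs_block=$(( (li_lba - $2) * 512 / $3 ))"
  fi
}
```

For the error above, `lba_info 0x00113cd8` prints `lba=1129688 bytes=578400256`; the partition start sector comes from e.g. fdisk -l and the block size from the filesystem tools.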
FAQ
Wear level
Wear level on nvme
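Two values in the NVMe health log matter here:
Percentage Used: high = bad. It is the vendor's estimate of how much of the rated endurance is used up and can exceed 100.
Available Spare: low = bad, especially once it drops below Available Spare Threshold.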
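Both values appear as "Name: N%" lines in `smartctl -a /dev/nvme0` output, so they can be checked in a script. A minimal sketch, assuming those line labels; the verdict rule (worn once Percentage Used reaches 100 or Available Spare drops below its threshold) is my own simplification, and missing lines are read as 0. nvme_wear_check is a hypothetical name:

```shell
#!/bin/sh
# Check NVMe wear from `smartctl -a /dev/nvmeX` output on stdin.
# Prints the three percentages, then OK or WORN (simplified rule, see lead-in).
# nvme_wear_check is a hypothetical helper name.
nvme_wear_check() {
  awk -F: '
    /^Percentage Used:/           { used = $2 + 0 }
    /^Available Spare:/           { spare = $2 + 0 }
    /^Available Spare Threshold:/ { thresh = $2 + 0 }
    END {
      printf "used=%d%% spare=%d%% spare_threshold=%d%%\n", used, spare, thresh
      if (used >= 100 || spare < thresh) print "WORN"; else print "OK"
    }
  '
}
```

Usage: `smartctl -a /dev/nvme0 | nvme_wear_check`.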
List disks
smartctl --scan
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
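smartctl stopped because a command it treats as mandatory failed. As the message says, adding -T permissive (once or more) makes it carry on, though the output may be incomplete:
smartctl -a -T permissive /dev/sda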
Device: /dev/bus/0 [megaraid_disk_09] [SAT], failed to read SMART Attribute Data
The controller probably doesn't pass SMART data through. This also seems to happen when the drive is a hot spare on a MegaRAID controller.
FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d, ascq=0]
The drive's own failure prediction has tripped (ASC 0x5D); it is probably time to replace the disk.
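On SCSI/SAS disks this message shows up on the "SMART Health Status:" line of `smartctl -H` output, which reads "OK" on a healthy disk. A minimal sketch that extracts that line and sets the exit status accordingly, assuming that exact label (ATA disks use a different wording); scsi_health is a hypothetical name:

```shell
#!/bin/sh
# Print the SCSI/SAS "SMART Health Status:" value from `smartctl -H` output on
# stdin; return 0 only if it is exactly "OK".
# scsi_health is a hypothetical helper name.
scsi_health() {
  sh_status=$(sed -n 's/^SMART Health Status: *//p')
  echo "$sh_status"
  [ "$sh_status" = "OK" ]
}
```

Usage: `smartctl -H /dev/sdc | scsi_health || echo "replace /dev/sdc?"`.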
/dev/sdc: Unknown USB bridge
SMART over USB often doesn't work; try
smartctl -a -d scsi /dev/sdc
to get at least some information.
Unavailable - device lacks SMART capability.
??
smartctl: unsupported scsi opcode
Try
smartctl -a -d scsi /dev/sdc
Non-medium error count
Errors not caused by the disk medium, such as CRC/transport errors; the cause could be the cable or the controller.
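Since a growing count points at cabling or the controller, it is worth logging over time. A minimal sketch that extracts it from `smartctl -a` output of a SCSI/SAS disk, assuming the "Non-medium error count:" label; on SATA disks the closest equivalent is usually attribute 199 UDMA_CRC_Error_Count. non_medium_errors is a hypothetical name:

```shell
#!/bin/sh
# Extract the "Non-medium error count" from `smartctl -a` output (stdin)
# of a SCSI/SAS disk; prints "n/a" if the line is missing.
# non_medium_errors is a hypothetical helper name.
non_medium_errors() {
  awk -F: '
    /^Non-medium error count:/ { print $2 + 0; found = 1 }
    END { if (!found) print "n/a" }
  '
}
```

Usage: `smartctl -a /dev/sdc | non_medium_errors`.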