Smart: Difference between revisions

= Links =


*[https://www.smartmontools.org/wiki/ Smartmontools wiki]
*[https://en.wikipedia.org/wiki/S.M.A.R.T. Wikipedia article]
*[https://wiki.unraid.net/Understanding_SMART_Reports Understanding SMART reports]
*[https://www.thomas-krenn.com/en/wiki/SMART_tests_with_smartctl https://www.thomas-krenn.com/en/wiki/SMART_tests_with_smartctl]
*[https://www.thomas-krenn.com/en/wiki/Smartmontools_with_MegaRAID_Controller Smartmontools with megaraid]
*[https://wiki.archlinux.org/index.php/S.M.A.R.T https://wiki.archlinux.org/index.php/S.M.A.R.T].
*[https://www.smartmontools.org/wiki/FAQ smartmontools FAQ]
*[https://www.backblaze.com/blog/hard-drive-smart-stats/ Smart hard drive stats]
*[https://www.ibm.com/docs/en/flashsystem-v7000u/1.5.2?topic=problems-smart-ascascq-error-codes-messages Smart ASC error codes]


= Tools =
*gsmartcontrol


 
 


= Useful commands =
  smartctl -a /dev/sda


 
== check smart status of disk in raid array ==
Something like
  smartctl -a /dev/bus/0 -d sat+megaraid,11
where 11 is the disk's device ID (DID) on the controller.


== check error log ==
  smartctl -l error /dev/sdb

== Device: /dev/bus/0 [megaraid_disk_09] [SAT] ==
Try
  smartctl --scan


 
 


= Some codes and messages =
*[https://www.ibm.com/docs/en/flashsystem-v7000u/1.5.2?topic=problems-smart-ascascq-error-codes-messages SMART ASC/ASCQ error codes and messages]
==Meaning of '''pre-fail'''==
A pre-fail attribute signals pending disk failure when its normalized value is equal to or lower than its threshold value.


== LBA_of_first_error ==

== Uncorrectable Sector Count ==
Check [https://medium.com/@satyeshukumar/how-to-fix-uncorrectable-sector-count-warning-5a38c56d3faf https://medium.com/@satyeshukumar/how-to-fix-uncorrectable-sector-count-warning-5a38c56d3faf]


== 198 Offline Uncorrectable ==
bad sign, on ssd mostly/only when number gets high
 
== SSD specific ==
*[https://unix.stackexchange.com/questions/106678/how-to-check-the-life-left-in-ssd-or-the-mediums-wear-level https://unix.stackexchange.com/questions/106678/how-to-check-the-life-left-in-ssd-or-the-mediums-wear-level]
*[https://www.compuram.de/blog/en/the-life-span-of-a-ssd-how-long-does-it-last-and-what-can-be-done-to-take-care/ Life span of SSD]
*[https://lowendtalk.com/discussion/141041/is-my-server-ssd-disk-healty Is my SSD disk healthy?]
=== 191 G-Sense_Error_Rate ===
Vibrations or shocks
=== 231 SSD_Life_Left ===
Percentage of life left
=== 232 Available Reserved Space ===
Intel SSDs report the available reserved space as a percentage of the initial reserved space.
=== 233 Media Wearout Indicator ===
Tracks the number of physical erase cycles completed on the SSD relative to the maximum physical erase cycles the drive is designed to endure.
The normalized value decreases down to 1 as wear accumulates; then start hoping. It might still last a while.
=== 241 Lifetime_Writes_GiB ===
Amount of data written so far
===SSD/NVMe and USB adapters===
Seemingly not supported:
  /dev/sdb: Unknown USB bridge [0x152d:0x2578 (0x117)]
  Bus 002 Device 002: ID 152d:2578 JMicron Technology Corp. / JMicron USA Technology Corp. USB to ATA/ATAPI Bridge
Try:
  smartctl -a -d sat /dev/sdb
====With working smart support====
rtl9210 like the DELOCK 64198
=== smartd: Failed SMART usage Attribute ===
Might be yelling about a disk that's already been replaced, try restarting smartd
=== Error: UNC at LBA = 0x00113cd8 = 1129688 ===
Uncorrectable read error at the given LBA; throw the disk away.
= FAQ =
==Wear level==
===Wear level on nvme===
There's '''percentage used''':
High = bad
and '''available spare''':
Low = bad
==List disks==
smartctl --scan
== A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options. ==
meh
 
== Device: /dev/bus/0 [megaraid_disk_09] [SAT], failed to read SMART Attribute Data ==
The controller probably doesn't allow SMART; it also seems to happen when the drive is a hot spare on a MegaRAID controller.
==FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d, ascq=0]==
time to replace disk?
==/dev/sdc: Unknown USB bridge ==
Smart over USB usually doesn't do much, try
smartctl -a -d scsi /dev/sdc
to get some information
==Unavailable - device lacks SMART capability.==
??
==smartctl: unsupported scsi opcode==
Try
smartctl -a -d scsi /dev/sdc
==Non-medium error count==
CRC errors, could be cable or controller

Latest revision as of 13:02, 16 April 2024

Disk monitoring

Links

Tools

  • smartctl
  • gsmartcontrol

 

Useful commands

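enable smart

smartctl -s on /dev/sda

(smartctl -i /dev/sda only prints device information, including whether SMART is available and enabled)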

check smart status

smartctl -a /dev/sda

check smart status of disk in raid array

Something like

smartctl -a /dev/bus/0 -d sat+megaraid,11

where 11 would be disk id (DID)
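
To check several disks behind the same controller in one go, the command above can be wrapped in a loop. This is only a sketch: the function name megaraid_health and the SMARTCTL override variable are made up for illustration, and it assumes SATA disks (for SAS disks the type would be megaraid,N instead of sat+megaraid,N).

```shell
# Run a SMART health check (-H) for each MegaRAID disk id given.
# Hypothetical helper: SMARTCTL can be overridden, e.g. for a dry run.
megaraid_health() {
    dev=$1
    shift
    for did in "$@"; do
        echo "== disk id $did =="
        "${SMARTCTL:-smartctl}" -H -d "sat+megaraid,$did" "$dev"
    done
}
```

Usage would be something like: megaraid_health /dev/bus/0 9 10 11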

check error log

smartctl -l error /dev/sdb
      

Device: /dev/bus/0 [megaraid_disk_09] [SAT]

Try

smartctl --scan

 

 

Some codes and messages

Meaning of pre-fail

A pre-fail attribute signals pending disk failure when its normalized value is equal to or lower than its threshold value.
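
The rule above can be checked mechanically against the attribute table. A rough sketch, assuming the usual ATA column layout of smartctl -A (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, ...); the function name prefail_at_threshold is made up for illustration.

```shell
# Read `smartctl -A` output on stdin and print every Pre-fail attribute
# whose normalized VALUE is equal to or lower than its THRESH.
# Assumed columns: $1=ID $2=name $4=VALUE $6=THRESH $7=TYPE.
prefail_at_threshold() {
    awk '$1 ~ /^[0-9]+$/ && $7 == "Pre-fail" && $4 + 0 <= $6 + 0 {
        printf "%s %s value=%d thresh=%d\n", $1, $2, $4, $6
    }'
}
```

Usage would be something like: smartctl -A /dev/sda | prefail_at_threshold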


LBA_of_first_error

 

Device is: Not in smartctl database [for details use: -P showall]

Try

/usr/sbin/update-smart-drivedb

otherwise check out https://www.smartmontools.org/wiki/FAQ#MyATASATAdriveisnotinthesmartctlsmartddatabase

Uncorrectable Sector Count

Check https://medium.com/@satyeshukumar/how-to-fix-uncorrectable-sector-count-warning-5a38c56d3faf

198 Offline Uncorrectable

bad sign, on ssd mostly/only when number gets high

 


SSD specific


191 G-Sense_Error_Rate

Vibrations or shocks

231 SSD_Life_Left

Percentage of life left

232 Available Reserved Space

Intel SSDs report the available reserved space as a percentage of the initial reserved space.

233 Media Wearout Indicator

Tracks the number of physical erase cycles completed on the SSD relative to the maximum physical erase cycles the drive is designed to endure.

The normalized value decreases down to 1 as wear accumulates; then start hoping. It might still last a while.

241 Lifetime_Writes_GiB

Amount of data written so far

SSD/NVMe and USB adapters

Seemingly not supported:

/dev/sdb: Unknown USB bridge [0x152d:0x2578 (0x117)]

Bus 002 Device 002: ID 152d:2578 JMicron Technology Corp. / JMicron USA Technology Corp. USB to ATA/ATAPI Bridge

Try:

smartctl -a -d sat /dev/sdb

With working smart support

rtl9210 like the DELOCK 64198

smartd: Failed SMART usage Attribute

Might be yelling about a disk that's already been replaced, try restarting smartd

Error: UNC at LBA = 0x00113cd8 = 1129688

Uncorrectable read error at the given LBA; throw the disk away.
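
The message prints the LBA in both hex and decimal. If only the hex form is at hand, the shell can do the conversion; a tiny sketch, with lba_to_dec as a made-up helper name.

```shell
# Convert a hex LBA (with 0x prefix) to decimal using printf.
lba_to_dec() {
    printf '%d\n' "$1"
}

lba_to_dec 0x00113cd8   # prints 1129688
```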

FAQ

Wear level

Wear level on nvme

There's percentage used:

High = bad

and available spare:

Low = bad
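
Both values can be pulled out of the smartctl -a output of an NVMe device. A minimal sketch, assuming the lines are labelled "Available Spare:" and "Percentage Used:" as in typical smartctl NVMe output; the function name nvme_wear is made up for illustration.

```shell
# Read `smartctl -a` output of an NVMe device on stdin and print the
# two wear-related values without the percent sign.
# "Available Spare Threshold:" is deliberately not matched.
nvme_wear() {
    awk -F': *' '
        /^Available Spare:/ { sub(/%/, "", $2); print "available_spare=" $2 }
        /^Percentage Used:/ { sub(/%/, "", $2); print "percentage_used=" $2 }
    '
}
```

Usage would be something like: smartctl -a /dev/nvme0 | nvme_wear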

List disks

smartctl --scan
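
The scan output can also feed a quick health check of every disk it lists. A sketch only, assuming each scan line starts with "DEVICE -d TYPE" followed by a comment; health_from_scan and the SMARTCTL override variable are made-up names.

```shell
# Read `smartctl --scan` output on stdin and run a health check (-H)
# on each listed device, using the device type the scan reported.
# Lines that do not look like "DEVICE -d TYPE ..." are skipped.
health_from_scan() {
    while read -r dev flag type _; do
        [ "$flag" = "-d" ] || continue
        "${SMARTCTL:-smartctl}" -H -d "$type" "$dev" </dev/null
    done
}
```

Usage would be something like: smartctl --scan | health_from_scan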

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

meh

 

Device: /dev/bus/0 [megaraid_disk_09] [SAT], failed to read SMART Attribute Data

The controller probably doesn't allow SMART; it also seems to happen when the drive is a hot spare on a MegaRAID controller.

FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d, ascq=0]

time to replace disk?


/dev/sdc: Unknown USB bridge

Smart over USB usually doesn't do much, try

smartctl -a -d scsi /dev/sdc

to get some information
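
When it is unclear which device type a USB bridge accepts, the candidates mentioned on this page (sat, scsi) can be tried in turn. A rough sketch: first_working_type and the SMARTCTL override are made-up names, and it assumes a zero exit status from smartctl -i means the bridge answered, which may not hold for every adapter.

```shell
# Try device types in order and print the first one for which
# `smartctl -i` exits with status 0. Returns 1 if none works.
first_working_type() {
    dev=$1
    for t in sat scsi; do
        if "${SMARTCTL:-smartctl}" -i -d "$t" "$dev" >/dev/null 2>&1; then
            echo "$t"
            return 0
        fi
    done
    return 1
}
```

Usage would be something like: smartctl -a -d "$(first_working_type /dev/sdc)" /dev/sdc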


Unavailable - device lacks SMART capability.

??


smartctl: unsupported scsi opcode

Try

smartctl -a -d scsi /dev/sdc


Non-medium error count

CRC errors, could be cable or controller