Revision as of 10:55, 29 January 2024

Disk monitoring

Links

Tools

smartctl
gsmartcontrol

Useful commands

show device info (including whether SMART is available and enabled)

smartctl -i /dev/sda

enable smart

smartctl -s on /dev/sda
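For scripts, the relevant lines of smartctl -i output start with "SMART support is:". Usually there is one line saying "Available - device has SMART capability." and one saying "Enabled" or "Disabled". SMART itself is switched on with smartctl -s on. The sketch below pulls those lines out and only enables SMART when it is reported as disabled. smart_support is a hypothetical helper and /dev/sda is just an example device.

```shell
# Hypothetical helper: read `smartctl -i` output on stdin and print the text
# after "SMART support is:" for each such line (assumes the usual wording).
smart_support() {
    sed -n 's/^SMART support is: *//p'
}

# Usage sketch (run as root, /dev/sda is an example):
# if smartctl -i /dev/sda | smart_support | grep -qx 'Disabled'; then
#     smartctl -s on /dev/sda
# fi
```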

check smart status

smartctl -a /dev/sda
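When scripting this, the exit status of smartctl is often more useful than grepping the output. According to the smartctl man page it is a bit mask, with one bit per kind of problem. Below is a minimal decoder sketch. smartctl_status is a hypothetical helper name, and the messages are paraphrased from the man page's EXIT STATUS section.

```shell
# Hypothetical helper: decode smartctl's exit status (a bit mask, see
# EXIT STATUS in the smartctl man page). Prints one line per set bit.
smartctl_status() {
    rc=$1
    if [ "$rc" -eq 0 ]; then
        echo "ok"
        return 0
    fi
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed" \
        "a SMART or other command failed, or checksum error" \
        "SMART status check returned DISK FAILING" \
        "pre-fail attributes <= threshold" \
        "some attributes were <= threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        # Test bit number $bit of the exit status.
        if [ $(( (rc >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}

# Usage sketch (run as root):
# smartctl -a /dev/sda > /dev/null; smartctl_status $?
```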

check smart status of disk in raid array

Something like

smartctl -a /dev/bus/0 -d sat+megaraid,11

where 11 is the disk ID (DID) on the controller
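To go over several disks behind the same controller, a loop like the one below should work. This is a sketch only. megaraid_health is a hypothetical helper, the controller is assumed to be /dev/bus/0, and the disks are assumed to be SATA (hence sat+megaraid; plain megaraid,N is the usual choice for SAS disks). It runs only the health check (-H) to keep the output short.

```shell
# Hypothetical helper: SMART health check for each MegaRAID disk ID given.
# Assumes controller /dev/bus/0 and SATA disks (sat+megaraid,N).
megaraid_health() {
    for did in "$@"; do
        echo "== DID $did =="
        smartctl -H -d "sat+megaraid,$did" /dev/bus/0
    done
}

# Usage sketch (run as root): megaraid_health 9 10 11
```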

check error log

smartctl -l error /dev/sdb
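For scripts, the number of logged ATA errors can be pulled out of that output. The sketch below assumes the usual smartctl wording, either "No Errors Logged" or "ATA Error Count: N ...". ata_error_count is a hypothetical name. It exits with status 1 if neither line is present, for example on SCSI/SAS disks, whose error log looks different.

```shell
# Hypothetical helper: read `smartctl -l error` output (ATA) on stdin and
# print the number of logged errors; exit status 1 if no count was found.
ata_error_count() {
    awk '
        /^No Errors Logged/ { n = 0;  found = 1 }
        /^ATA Error Count:/ { n = $4; found = 1 }
        END { if (found) print n + 0; else exit 1 }'
}

# Usage sketch: smartctl -l error /dev/sdb | ata_error_count
```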

Device: /dev/bus/0 [megaraid_disk_09] [SAT]

Try

smartctl --scan
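smartctl --scan prints one line per device, including the -d type it thinks fits, for example "/dev/bus/0 -d megaraid,9 # /dev/bus/0 [megaraid_disk_09], SCSI device". The sketch below turns this into device/type pairs and queries each one. scan_pairs is a hypothetical helper.

```shell
# Hypothetical helper: read `smartctl --scan` output on stdin, e.g.
#   /dev/bus/0 -d megaraid,9 # /dev/bus/0 [megaraid_disk_09], SCSI device
# and print "device type" (fields 1 and 3) for every line that has "-d".
scan_pairs() {
    awk '$2 == "-d" { print $1, $3 }'
}

# Usage sketch (run as root): query each device with the suggested type.
# stdin is redirected so smartctl can't swallow the rest of the list.
# smartctl --scan | scan_pairs | while read -r dev type; do
#     smartctl -a -d "$type" "$dev" < /dev/null
# done
```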

Some codes and messages

SMART ASC/ASCQ error codes and messages

Meaning of pre-fail

Disk failure is pending when the attribute's normalized value is equal to or lower than its threshold value.
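Here is a sketch of that check against smartctl -A output. It prints the Pre-fail attributes whose normalized VALUE is at or below THRESH. smartctl already marks these as FAILING_NOW in the WHEN_FAILED column, so this filter just makes them easy to pick out in a script. failing_prefail is a hypothetical name, and the column positions assume the standard attribute table layout.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print Pre-fail
# attributes whose normalized VALUE (col 4) is <= THRESH (col 6). Columns:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# "+ 0" forces numeric comparison ("010" -> 10).
failing_prefail() {
    awk '$7 == "Pre-fail" && $4 + 0 <= $6 + 0 {
        print $1, $2, "value=" ($4 + 0), "thresh=" ($6 + 0)
    }'
}

# Usage sketch: smartctl -A /dev/sda | failing_prefail
```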

LBA_of_first_error

https://www.smartmontools.org/wiki/BadBlockHowto
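The BadBlockHowto maps a bad LBA to a filesystem block roughly as b = (L - S) * 512 / B, using integer division. L is the LBA, S is the start sector of the partition, and B is the filesystem block size in bytes. Below is that formula as shell arithmetic, assuming 512-byte logical sectors. lba_to_fs_block is a hypothetical name.

```shell
# Hypothetical helper: map a bad LBA to a filesystem block number.
#   $1 = LBA of the error, in 512-byte sectors (e.g. LBA_of_first_error)
#   $2 = start sector of the partition holding it (e.g. from fdisk -l)
#   $3 = filesystem block size in bytes (e.g. 4096)
# Assumes 512-byte logical sectors; shell arithmetic truncates the division.
lba_to_fs_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}

# lba_to_fs_block 1129688 2048 4096   # prints 140955
```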

Device is: Not in smartctl database [for details use: -P showall]

Try

/usr/sbin/update-smart-drivedb

otherwise check out https://www.smartmontools.org/wiki/FAQ#MyATASATAdriveisnotinthesmartctlsmartddatabase

Uncorrectable Sector Count

Check https://medium.com/@satyeshukumar/how-to-fix-uncorrectable-sector-count-warning-5a38c56d3faf

198 Offline Uncorrectable

Bad sign; on SSDs it mostly (or only) matters when the number gets high.
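To keep an eye on this in a script, the raw value (the last column of smartctl -A) can be pulled out by attribute ID. In the sketch below, attr_raw is a hypothetical helper. It prints only the first token of RAW_VALUE, since some attributes append extra text there, and it prints nothing if the drive doesn't report that attribute.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the first
# token of RAW_VALUE (column 10) for the attribute with ID $1.
attr_raw() {
    awk -v id="$1" '$1 == id { print $10; exit }'
}

# Usage sketch:
# n=$(smartctl -A /dev/sda | attr_raw 198)
# if [ "${n:-0}" -gt 0 ]; then echo "198 Offline_Uncorrectable raw value: $n"; fi
```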

SSD specific

191 G-Sense_Error_Rate

Vibrations or shocks

231 SSD_Life_Left

Percentage of life left

232 Available Reserved Space

Some drives report the number of physical erase cycles completed on the SSD as a percentage of the maximum physical erase cycles the drive is designed to endure.

Intel SSDs report the available reserved space as a percentage of the initial reserved space.

233 Media Wearout Indicator

Decreases to 1 as the drive wears; once it hits 1, start hoping. It might still last a while.
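Attributes 231 and 233 are read from the normalized VALUE column rather than the raw value. The sketch below extracts that column. attr_value is a hypothetical helper, and the warning level of 10 in the usage comment is an arbitrary example, not a vendor recommendation.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# normalized VALUE (column 4) of attribute ID $1 as an integer ("097" -> 97).
attr_value() {
    awk -v id="$1" '$1 == id { print $4 + 0; exit }'
}

# Usage sketch (10 is an arbitrary example level):
# v=$(smartctl -A /dev/sda | attr_value 233)
# if [ -n "$v" ] && [ "$v" -le 10 ]; then echo "Media_Wearout_Indicator low: $v"; fi
```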

241 Lifetime_Writes_GiB

Amount of data written so far
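Here is a quick conversion to TiB, assuming the raw value really is in GiB as the attribute name says. Other drives use ID 241 for Total_LBAs_Written, where the raw value counts LBAs instead, so check the attribute name first. gib_to_tib is a hypothetical helper.

```shell
# Hypothetical helper: convert a GiB count (raw value of 241
# Lifetime_Writes_GiB) to TiB with two decimals; 1 TiB = 1024 GiB.
gib_to_tib() {
    awk -v g="$1" 'BEGIN { printf "%.2f\n", g / 1024 }'
}

# Usage sketch:
# gib_to_tib "$(smartctl -A /dev/sda | awk '$1 == 241 { print $10 }')"
```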

SSD/NVMe and USB adapters

Seemingly not supported:

/dev/sdb: Unknown USB bridge [0x152d:0x2578 (0x117)]

lsusb shows: Bus 002 Device 002: ID 152d:2578 JMicron Technology Corp. / JMicron USA Technology Corp. USB to ATA/ATAPI Bridge

Try:

smartctl -a -d sat /dev/sdb

smartd: Failed SMART usage Attribute

It might be yelling about a disk that's already been replaced; try restarting smartd.

Error: UNC at LBA = 0x00113cd8 = 1129688

Uncorrectable error at that LBA; throw the disk away.

FAQ

Wear level

Wear level on NVMe

There's Percentage Used (high = bad) and Available Spare (low = bad).
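Percentage Used and Available Spare appear in smartctl -a /dev/nvme0 output as lines like "Percentage Used:  3%" and "Available Spare:  97%". The sketch below extracts them. nvme_wear is a hypothetical name, and /dev/nvme0 is just an example device. Percentage Used can go past 100% on a drive used beyond its rated endurance.

```shell
# Hypothetical helper: read `smartctl -a` output of an NVMe drive on stdin and
# print Percentage Used and Available Spare as bare numbers.
# Splitting on ":" keeps "Available Spare Threshold" separate.
nvme_wear() {
    awk -F: '
        $1 == "Percentage Used" { gsub(/[ %]/, "", $2); used = $2 }
        $1 == "Available Spare" { gsub(/[ %]/, "", $2); spare = $2 }
        END { print "used=" used, "spare=" spare }'
}

# Usage sketch: smartctl -a /dev/nvme0 | nvme_wear
```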

List disks

smartctl --scan

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

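Meh. As the message says, adding -T permissive makes smartctl continue past the failed command, though the output may be incomplete:

smartctl -a -T permissive /dev/sda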

Device: /dev/bus/0 [megaraid_disk_09] [SAT], failed to read SMART Attribute Data

The controller probably doesn't allow SMART access. This also seems to happen when the drive is a hot spare on a MegaRAID controller.

FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d, ascq=0]

time to replace disk?

/dev/sdc: Unknown USB bridge

SMART over USB usually doesn't do much; try

smartctl -a -d scsi /dev/sdc

to get some information
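When it isn't clear which -d type a USB bridge accepts, the types mentioned on this page (sat, then scsi) can be tried in turn. In the sketch below, usb_smart_type is a hypothetical helper. It takes the first type for which smartctl -i exits with status 0, which doesn't guarantee that the SMART data behind it is complete.

```shell
# Hypothetical helper: try sat, then scsi, for device $1 and print the first
# type for which `smartctl -i` succeeds; return 1 if neither works.
usb_smart_type() {
    for t in sat scsi; do
        if smartctl -i -d "$t" "$1" > /dev/null 2>&1; then
            echo "$t"
            return 0
        fi
    done
    return 1
}

# Usage sketch (run as root):
# t=$(usb_smart_type /dev/sdc) && smartctl -a -d "$t" /dev/sdc
```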

Unavailable - device lacks SMART capability.

??

smartctl: unsupported scsi opcode

Try

smartctl -a -d scsi /dev/sdc

Non-medium error count

Errors not caused by the medium itself, such as CRC errors; the cause could be the cable or the controller.
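On SAS/SCSI disks this counter shows up in smartctl -a output as a line like "Non-medium error count:        0". The sketch below pulls it out so it can be compared over time, for example to see whether it keeps growing after a cable swap. non_medium_errors is a hypothetical helper.

```shell
# Hypothetical helper: read `smartctl -a` output of a SAS/SCSI disk on stdin
# and print the value of the "Non-medium error count" line.
non_medium_errors() {
    awk -F: '$1 == "Non-medium error count" { gsub(/ /, "", $2); print $2; exit }'
}

# Usage sketch: smartctl -a /dev/sdc | non_medium_errors
```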
