Using the SMART data from HDD
I have Acronis Drive Monitor (ADM), which has reported some HDD SMART errors on one of my system hard drives: a current pending sector count of 12 and an uncorrectable sector count of 12. These detections occurred several days after the most recent read/write activity on the drives. See the attached screenshot: ADM error screen 150913.jpg
The affected drive is a single WD Caviar Blue 1 TB HDD (part number in the screenshot), barely 12 months old. It is organised as 2 logical drives, both formatted as NTFS with 4 KB clusters, and both currently 98% free space. I ran Windows 7 chkdsk on both of them and asked Microsoft about interpreting the results.
As a result, this query is about how ADM detects SMART data and how that data should be used.
First, the chkdsk results. Chkdsk was run on both logical drives from Disk Director 11, with the options to 1) fix found errors and 2) try to fix found bad sectors both enabled.
In both cases:
- stage 1 (file verification) reported completion, no errors.
- stage 2 (index verification) reported completion, no errors.
- stage 3 (security descriptors): files and journal both reported completion, no errors on either logical drive.
- stage 4 (file data): verification of the file data in thousands of files reported completion, no errors.
- stage 5 (free space verification) reported completion.
The summary included a report of 16 KB in bad sectors (one logical drive) and 32 KB in bad sectors (the other logical drive).
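As a rough cross-check (my own arithmetic, not something chkdsk or ADM reports), those bad-sector totals can be converted into cluster counts using the 4 KB cluster size. The sketch below is in Python; the variable names are made up for illustration, and treating the result as comparable to the SMART counts is an assumption, since SMART counts physical sectors rather than NTFS clusters.

```python
# Convert chkdsk's "KB in bad sectors" totals into NTFS cluster counts.
CLUSTER_SIZE_KB = 4  # both logical drives are formatted with 4 KB clusters

# Totals from the chkdsk summaries (labels are illustrative only).
bad_kb_per_volume = {"logical drive 1": 16, "logical drive 2": 32}

bad_clusters = {
    name: kb // CLUSTER_SIZE_KB for name, kb in bad_kb_per_volume.items()
}
total_bad_clusters = sum(bad_clusters.values())

for name, count in bad_clusters.items():
    print(f"{name}: {count} bad clusters")
print(f"total: {total_bad_clusters} bad clusters")
```

That gives 4 and 8 clusters, 12 in total, which happens to equal the pending and uncorrectable counts of 12 shown by ADM. Whether that is more than a coincidence depends on the drive's physical sector size, which I do not know.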
A reboot after the checks were completed still shows the drive's SMART data unchanged; the screenshot shows the drive condition after the disk checks.
Second, the Microsoft response was essentially that:
- if the stages proceed without reported errors, nothing was detected at that level
- if data had been recovered from marginal clusters, it would be in the root directory as .chk files, and there were none.
Given a) this advice and these results, and b) the large unused space in each logical drive, it seems most likely that any issues were with clusters in the unused space.
Third, the queries about SMART information.
1. How is it generated on the drive, and then collected, e.g. by ADM?
Commentary: I have two 160 GB drives on this same system that hold a) the OS (C: drive), and b) applications on different designated logical drives (but not the data those apps work on). Twice during maintenance, various tools have indicated that these drives should be replaced because of detected CRC errors, but they keep working. I've discovered (accidentally) that these CRC errors get generated when the OS crashes: the blue screen of death, requiring a forced reset to get the PC going again. Usually there is one instance per drive for each system crash, because the OS died and the application in use also died. So it cannot be said that these errors indicate an incipient drive failure. So, what other errors might behave similarly, such as the errors in this case (see screenshot)?
2. There seems to be no provision to delete errors that appear erroneous. Specifically, I mean an ability to multi-stage the detected errors, i.e. a total count and a 'since last disk check' count, with the latter able to be reset after a no-error or successful disk check.
a) Does the SMART specification even provide for things like false detections and on-going condition monitoring of this sort?
b) If it does, how does ADM use it?
3. Regardless of where the reported bad clusters were detected (file space or free space), how would the error conditions be detected on the drive without any specific disk read or write activity to trigger the fault detections?
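To make query 2 more concrete, here is a minimal sketch in Python of the kind of two-stage counting I have in mind. It is purely hypothetical: the class and method names are invented, and it does not reflect how ADM or the SMART specification actually work. It only keeps the raw total reported by the drive and derives a 'since last disk check' count from a stored baseline.

```python
from dataclasses import dataclass


@dataclass
class TwoStageCounter:
    """Hypothetical tracker for one SMART raw value (e.g. pending sectors)."""

    total: int = 0     # latest raw count read from the drive
    baseline: int = 0  # raw count at the time of the last clean disk check

    def update(self, raw_value: int) -> None:
        # Record the latest raw value; the drive's own total is never altered.
        self.total = raw_value

    def mark_clean_check(self) -> None:
        # After a disk check with no errors, restart the "since last check" count.
        self.baseline = self.total

    @property
    def since_last_check(self) -> int:
        # Errors that appeared after the last clean check (never negative).
        return max(0, self.total - self.baseline)


counter = TwoStageCounter()
counter.update(12)          # ADM shows 12 pending sectors
counter.mark_clean_check()  # disk check completed with no file errors
counter.update(12)          # value unchanged after the reboot
print(counter.total, counter.since_last_check)  # 12 0
counter.update(15)          # hypothetical later reading, not real data
print(counter.total, counter.since_last_check)  # 15 3
```

The point of the design is that the lifetime total is preserved while the second count only draws attention to errors that are new since the last successful check.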