| Hardware & Laptops Help with your hardware, including laptop issues |

4th July 2011, 09:26 PM
|
|
Registered User
|
|
Join Date: Jan 2008
Posts: 92

|
|
|
A Hard Disk is reporting health problems [SOLVED]
Hello,
After a long time, the great and impressive improvements in Fedora made me switch from Windows again. But there is a small issue I am facing right now, and it's quite annoying: whenever I click on any partition I get "A Hard Disk is reporting health problems". Below are the logs from smartmontools; please help if you can.
Quote:
Firmware Version: 0084001C
User Capacity: 250,059,350,016 bytes [250 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 3f
Local Time is: Tue Jul 5 00:02:48 2011 GST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 783) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 111) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 046 Pre-fail Always - 161318
2 Throughput_Performance 0x0005 100 100 030 Pre-fail Offline - 34799616
3 Spin_Up_Time 0x0003 100 100 025 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 2309
5 Reallocated_Sector_Ct 0x0033 100 100 024 Pre-fail Always - 5 (1995, 5)
7 Seek_Error_Rate 0x000f 100 100 047 Pre-fail Always - 3266
8 Seek_Time_Performance 0x0005 100 100 019 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 086 086 000 Old_age Always - 7327
10 Spin_Retry_Count 0x0013 100 100 020 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 2057
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 113
193 Load_Cycle_Count 0x0032 096 096 000 Old_age Always - 98658
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 39 (Min/Max 17/51)
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 13402
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 5 (5, 15753)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 098 098 000 Old_age Offline - 4
199 UDMA_CRC_Error_Count 0x003e 200 253 000 Old_age Always - 2
200 Multi_Zone_Error_Rate 0x000f 100 100 060 Pre-fail Always - 22780
203 Run_Out_Cancel 0x0002 100 100 000 Old_age Always - 433724588278
240 Head_Flying_Hours 0x003e 200 200 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 266 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 266 occurred at disk power-on lifetime: 6109 hours (254 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 0b c3 7d 03 40 Error: UNC at LBA = 0x00037dc3 = 228803
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 10 c8 60 86 32 40 00 00:00:28.941 READ FPDMA QUEUED
60 08 c0 88 85 32 40 00 00:00:28.941 READ FPDMA QUEUED
60 08 b8 48 9a 31 40 00 00:00:28.936 READ FPDMA QUEUED
60 08 b0 a0 66 32 40 00 00:00:28.936 READ FPDMA QUEUED
60 08 a8 d0 e5 32 40 00 00:00:28.929 READ FPDMA QUEUED
Error 265 occurred at disk power-on lifetime: 6046 hours (251 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 bb 49 e3 21 40 Error: UNC at LBA = 0x0021e349 = 2220873
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 70 b8 28 e3 21 40 00 00:42:43.083 READ FPDMA QUEUED
2f 00 01 10 00 00 40 00 00:42:43.004 READ LOG EXT
60 70 b0 28 e3 21 40 00 00:42:38.737 READ FPDMA QUEUED
2f 00 01 10 00 00 40 00 00:42:38.659 READ LOG EXT
60 70 a8 28 e3 21 40 00 00:42:34.392 READ FPDMA QUEUED
Error 264 occurred at disk power-on lifetime: 6046 hours (251 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 b3 49 e3 21 40 Error: UNC at LBA = 0x0021e349 = 2220873
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 70 b0 28 e3 21 40 00 00:42:38.737 READ FPDMA QUEUED
2f 00 01 10 00 00 40 00 00:42:38.659 READ LOG EXT
60 70 a8 28 e3 21 40 00 00:42:34.392 READ FPDMA QUEUED
2f 00 01 10 00 00 40 00 00:42:34.314 READ LOG EXT
60 70 a0 28 e3 21 40 00 00:42:30.047 READ FPDMA QUEUED
Error 263 occurred at disk power-on lifetime: 6046 hours (251 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 ab 49 e3 21 40 Error: UNC at LBA = 0x0021e349 = 2220873
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 70 a8 28 e3 21 40 00 00:42:34.392 READ FPDMA QUEUED
2f 00 01 10 00 00 40 00 00:42:34.314 READ LOG EXT
60 70 a0 28 e3 21 40 00 00:42:30.047 READ FPDMA QUEUED
2f 00 01 10 00 00 40 00 00:42:29.969 READ LOG EXT
60 70 98 28 e3 21 40 00 00:42:25.691 READ FPDMA QUEUED
Error 262 occurred at disk power-on lifetime: 6046 hours (251 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 a3 49 e3 21 40 Error: UNC at LBA = 0x0021e349 = 2220873
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 70 a0 28 e3 21 40 00 00:42:30.047 READ FPDMA QUEUED
2f 00 01 10 00 00 40 00 00:42:29.969 READ LOG EXT
60 70 98 28 e3 21 40 00 00:42:25.691 READ FPDMA QUEUED
60 10 90 28 1f 22 40 00 00:42:25.691 READ FPDMA QUEUED
2f 00 01 10 00 00 40 00 00:42:25.612 READ LOG EXT
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 7293 -
# 2 Short offline Completed: read failure 90% 983 136431830
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):.
|
Last edited by ibrahim52; 13th July 2011 at 08:57 AM.
|

4th July 2011, 09:56 PM
|
 |
Registered User
|
|
Join Date: Jun 2004
Location: Laurel, MD USA
Posts: 5,448

|
|
|
Re: A Hard Disk is reporting health problems
Yeah, your disk is just starting to have errors and die.
Your disk has resorted to ECC error correction 13402 times, that's too high for a disk to be considered healthy. I guess it's not quite bad enough for the PASSED overall rating to go to FAILED.
For comparison, my laptop has about half the power-on time of yours (3606 hrs to your 7327 hrs), my ECC correction value is only 47, and I have 0 Offline_Uncorrectable events while your disk has 4.
You should back up to a good drive soon and try to replace it.
Last edited by marko; 4th July 2011 at 10:06 PM.
|

5th July 2011, 12:27 AM
|
|
Registered User
|
|
Join Date: Feb 2006
Posts: 780

|
|
|
Re: A Hard Disk is reporting health problems
Quote:
Originally Posted by marko
Yeah, your disk is just starting to have errors and die.
Your disk has resorted to ECC error correction 13402 times, that's too high for a disk to be considered healthy. I guess it's not quite bad enough for the PASSED overall rating to go to FAILED.
For comparison, my laptop has about half the power-on time of yours (3606 hrs to your 7327 hrs), my ECC correction value is only 47, and I have 0 Offline_Uncorrectable events while your disk has 4.
|
How to interpret the ECC value depends on the manufacturer and model, and there is no simple way to derive the health of the disk from the ECC count. For comparison, I have several disks which have hundreds of millions of ECC errors and are otherwise fine. In any case, ECC indicates that the detected errors have been corrected.
What's important to follow is whether any of the SMART attributes are going down or have dropped under the threshold value, which doesn't seem to be the case. The Offline_Uncorrectable is of some importance because it's saying there are still errors on the disk. Raw_Read_Error_Rate isn't very promising, but what's not very nice either way is the UDMA_CRC_Error_Count, which would seem to suggest a problem with the (on-board) controller. This could have to do with the hardware or perhaps the large temperature range (from 17 to 51 C) under which the drive is operating.
Running a long smartctl test may reveal more. Ultimately, the data on the disk should be backed up elsewhere, after which a (data-destructive) badblocks should be run on the entire disk. This should allow the disk to reallocate sectors or otherwise bring the Offline_Uncorrectable count to 0. If it's a single-disk system then you could narrow down the treatment and avoid having to destroy all the data with badblocks, but it takes some more effort.
Better cooling may be appreciated by your disk.
David
|

5th July 2011, 06:09 AM
|
|
Registered User
|
|
Join Date: Jan 2008
Posts: 92

|
|
|
Re: A Hard Disk is reporting health problems
Earlier I had Ubuntu on the same partition where I am using Fedora right now. I did not remove Ubuntu properly; instead I went to Windows Disk Manager, deleted the partition, and installed Fedora into the "free space". Could that be the reason the HDD errors are showing?
To be really honest, my laptop receives enough cooling all day to keep the HDD running smoothly, and I have never had any issues with it yet. But one thing I would like to ask is: why only Fedora? Why don't Ubuntu or Windows show any kind of errors for my hard drive?
I have been asked to visit the link below and run the diagnostic tool from Fujitsu, as my HDD is a Fujitsu. I am going to boot into Windows, run the application for testing, and post the results here.
|

5th July 2011, 09:15 AM
|
|
Registered User
|
|
Join Date: Feb 2006
Posts: 780

|
|
|
Re: A Hard Disk is reporting health problems
Quote:
Originally Posted by ibrahim52
Earlier I had Ubuntu on the same partition where I am using Fedora right now. I did not remove Ubuntu properly; instead I went to Windows Disk Manager, deleted the partition, and installed Fedora into the "free space". Could that be the reason the HDD errors are showing?
|
The errors are physical errors which are otherwise independent of the particular OS.
Quote:
Originally Posted by ibrahim52
To be really honest, my laptop receives enough cooling all day to keep the HDD running smoothly, and I have never had any issues with it yet. But one thing I would like to ask is: why only Fedora? Why don't Ubuntu or Windows show any kind of errors for my hard drive?
|
Probably because the systems which aren't reporting an error aren't monitoring or else are ignoring the errors.
Quote:
Originally Posted by ibrahim52
I have been asked to visit the link below and run the diagnostic tool from Fujitsu, as my HDD is a Fujitsu. I am going to boot into Windows, run the application for testing, and post the results here.
|
Most people I come across on Windows don't run any kind of smart monitoring tool. I can imagine that the diagnostic tool will consult the smart attributes, amongst other tests. Good luck.
David
|

5th July 2011, 10:51 AM
|
|
Registered User
|
|
Join Date: Jan 2008
Posts: 92

|
|
|
Re: A Hard Disk is reporting health problems
I am not much into hardware testing, but is there any way I can repair this, or stop the hard drive errors from appearing all the time in Fedora? Currently the extended test is running through the Fujitsu application in Windows. It will take some time and I'll post the logs here. Thanks for the DETAILED help, I really appreciate the EXPERTS' support and am proud to be a FEDORA user.
|

5th July 2011, 11:15 AM
|
|
Registered User
|
|
Join Date: Jan 2006
Posts: 2,769

|
|
|
Re: A Hard Disk is reporting health problems
You can suppress the warning. There is an application called Disk Utility (palimpsest). Open that, then select the drive in the left pane. Then click on Smart Data in the right pane. There is a "Don't warn if the disk is failing" checkbox there.
|

5th July 2011, 10:29 PM
|
|
Registered User
|
|
Join Date: Jan 2008
Posts: 92

|
|
|
Re: A Hard Disk is reporting health problems
Thanks JEO. By the way, below are the results of the Fujitsu diagnostic tool I ran on Windows, and it PASSED. Is there any way I can repair the errors from a bootable disc so that they no longer appear in Fedora?
Model Name : FUJITSU MJA2250BH G2
Serial No. : K96QT
Firmware : 0084001C
Total LBA : 1D1C5970 h
Test Name : Extended Test
Result : PASS
Test Time : 15:09:56, July 05, 2011
|

6th July 2011, 11:25 PM
|
|
Registered User
|
|
Join Date: Jan 2008
Posts: 92

|
|
|
Re: A Hard Disk is reporting health problems
Well, I have disabled the disk failing notifications. But there is no way I can fix this issue, because I don't face any kind of slow response or slow boot. I am posting the error screenshot I am receiving through Disk Utility.
|

7th July 2011, 07:31 AM
|
|
Registered User
|
|
Join Date: Jan 2006
Posts: 2,769

|
|
|
Re: A Hard Disk is reporting health problems
It shows 327,685 bad sectors. If that is accurate, the drive is failing. Research your hard disk model together with Linux to see if there is a known problem with the SMART data. Otherwise, the fix is to back up your data, put in a new drive, and restore your data to it.
|

7th July 2011, 09:55 AM
|
|
Registered User
|
|
Join Date: Jul 2011
Posts: 14

|
|
|
Re: A Hard Disk is reporting health problems
Isn't a false positive on the Disk Utility a bug in F15? I don't see what the difference might be between the package in Fedora and Ubuntu, and since Ubuntu isn't reporting any problems here...
The guys on LAS had a rant about this.
|

7th July 2011, 12:08 PM
|
 |
Registered User
|
|
Join Date: Apr 2006
Location: Ohio, USA
Posts: 8,299

|
|
|
Re: A Hard Disk is reporting health problems
Quote:
Originally Posted by marko
Yeah, your disk is just starting to have errors and die.
|
Thanks for the alarmist misinformation - very helpful.
As David B' notes, interpreting SMART numbers is not for noobs and requires a "decoder ring".
http://en.wikipedia.org/wiki/S.M.A.R.T.
Raw values often do not mean what one might naively expect.
Quote:
Originally Posted by JEO
It shows 327,685 bad sectors. If that is accurate, the drive is failing. Research your hard disk model together with Linux to see if there is a known problem with the SMART data. Otherwise, the fix is to back up your data, put in a new drive, and restore your data to it.
|
There is a VERY good reason to think it is not accurate. I have repeatedly and across many drive mfgrs seen spectacular errors in the SMART data parameters. It's just as buggy as system/mobo BIOS/ACPI implementations (most laptops and mobos FAIL any serious ACPI test). If the number you are seeing looks wrong - then it probably is.
327,685 = 0x0005,0005. That *looks* suspiciously like the SMART firmware has transmitted the '0005' 16 bit word twice. *Maybe* there are 5 bad sectors, very unlikely there are 327685 bad sectors, maybe there are none. The figure is HIGHLY suspect.
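A quick way to check that reading of the number is to split the raw value into 16-bit words. The snippet below is only a sketch: the function name `split_words` and the choice of three 16-bit words (a 48-bit raw field) are assumptions made for illustration, not something the drive or any tool here reports.

```python
def split_words(raw, width=16, count=3):
    """Split a raw SMART value into `count` words of `width` bits each,
    most significant word first. The 3 x 16-bit layout is an assumption."""
    mask = (1 << width) - 1
    return [(raw >> (width * i)) & mask for i in reversed(range(count))]


# Bad-sector figure reported by Disk Utility earlier in this thread.
raw = 327685
words = split_words(raw)
print(hex(raw), ["0x%04X" % w for w in words])
```

If the two low words come out identical, as they do here, a duplicated 16-bit word is a plausible explanation, though it does not prove the firmware is at fault.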
=====
Examples of SMART Firmware problem ...
I have an SSD drive showing ....
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 48361331755818
which means the drive must have been powered up for 5.5 Million years - since the Miocene epoch. However on closer examination ...
48361331755818 = 0x2BFC,0000,0B2A,
and 0x2BA = 698 hours which seems quite correct, the upper bits (0x2BFC) vary wildly on each read.
This SSD drive also increases its "195 Hardware_ECC_Recovered" raw value at a constant 200 counts per second - EVEN WHEN THERE IS NO DISK ACTIVITY! So be VERY wary of interpreting the vendor specific items.
I had 3x Seagates in a RAID0 configuration a couple of years ago. On the same day all three developed a "198 Uncorrectable Sector Count" on exactly the same (numeric) sector number. That's either a fantastic coincidence or a firmware blunder. I strongly suspect some specific sequence of I/O or control ops confused the firmware into marking a bad sector.
=================
Lots of SMART firmware has bugs and the numbers should not be relied upon without examination.
To disable the log reports, prevent the smartd daemon from running, or at least exclude that drive.
sudo chkconfig smartd off
or edit /etc/smartd.conf to exclude the drive.
There is a very good paper by Google that considers SMART parameters and temperature to try to predict failure - and they find that these are not sufficient for a good predictive model.
http://labs.google.com/papers/disk_failures.pdf
They consider that certain SMART parameters have some limited predictive value,
5 Reallocated Sectors Count
196 Reallocation Event Count
197 Current Pending Sector Count
198 Off-Line Scan Uncorrectable Sector Count
So yes if you see ANY scan error or reallocations it significantly increases the probability of a future failure. Unless you have money to burn or a strong need for reliability this DOES NOT mean you should toss the drive, as the chicken-littles will tell you. The probability of failure is often still relatively low. It DOES mean you should always have an effective backup system in place, and a plan to restore your system in case of disk failure. I would replace the drive if you see an increasing trend pattern of repeated REAL(confirmed) errors.
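As a rough illustration of watching only those four attributes, here is a small sketch. The raw counts are the leading numbers copied from the smartctl output in the first post; the names `WATCHED` and `nonzero_watched` are made up for this example, and given how unreliable raw values can be, the result is a hint to keep an eye on the drive rather than a verdict.

```python
# The four attribute IDs listed above as having some predictive value.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    196: "Reallocated_Event_Count",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# Leading raw numbers from the smartctl output in the first post
# (attribute 5 was printed as "5 (1995, 5)"; only the first number is used).
raw_counts = {5: 5, 196: 5, 197: 0, 198: 4}


def nonzero_watched(counts):
    """Return (id, name, raw) for every watched attribute with a nonzero raw count."""
    return [
        (attr_id, WATCHED[attr_id], counts[attr_id])
        for attr_id in sorted(WATCHED)
        if counts.get(attr_id, 0) > 0
    ]


for attr_id, name, raw in nonzero_watched(raw_counts):
    print(attr_id, name, raw)
```

Running the same check on later smartctl snapshots and comparing the counts would show whether there is an increasing trend.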
You have to begin with an understanding that all rotating drives are error-prone trash (56% of the Google failures had none of these SMART signals). So DO worry about data integrity, it's a huge problem, but don't rely on SMART to predict an individual disk failure.
__________________
None are more hopelessly enslaved than those who falsely believe they are free.
Johann Wolfgang von Goethe
|

7th July 2011, 01:46 PM
|
|
Registered User
|
|
Join Date: Jan 2006
Posts: 2,769

|
|
|
Re: A Hard Disk is reporting health problems
I noticed that the threshold is 24 but the current normalized value and worst are 100. That does not agree with a huge bad sector count, since the current value or worst case value would have to decrease to the threshold value to really fail.
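The same comparison can be done mechanically. The sketch below parses two attribute rows pasted from the first post and reports how far the normalized value sits above its threshold. The helper names `parse_row` and `margin` are hypothetical, and the parsing assumes the column order shown in the smartctl table header (ID#, name, flag, VALUE, WORST, THRESH, ...).

```python
# Two rows copied from the smartctl attribute table in the first post.
ROWS = """\
5 Reallocated_Sector_Ct 0x0033 100 100 024 Pre-fail Always - 5 (1995, 5)
198 Offline_Uncorrectable 0x0010 098 098 000 Old_age Offline - 4
"""


def parse_row(line):
    """Split one attribute row into its columns; the raw value keeps any trailing text."""
    f = line.split(None, 9)
    return {
        "id": int(f[0]),
        "name": f[1],
        "value": int(f[3]),
        "worst": int(f[4]),
        "thresh": int(f[5]),
        "raw": f[9],
    }


def margin(row):
    """Distance between the lower of VALUE/WORST and THRESH; <= 0 means at or below threshold."""
    return min(row["value"], row["worst"]) - row["thresh"]


rows = [parse_row(line) for line in ROWS.splitlines()]
for row in rows:
    print(row["id"], row["name"], "margin:", margin(row))
```

For attribute 5 this gives a margin of 76 (100 against a threshold of 24), which matches the observation above that the normalized value is nowhere near failing.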
Last edited by JEO; 7th July 2011 at 01:49 PM.
|

13th July 2011, 08:57 AM
|
|
Registered User
|
|
Join Date: Jan 2008
Posts: 92

|
|
|
Re: A Hard Disk is reporting health problems
Thanks stevea, that was DETAILED and i read the whole thing.
Thanks JEO, that's why I don't face any kind of slow response from my hard drive while transferring data or booting multiple OSes. Still, as advised earlier, I will start taking backups. Thanks to all.
|

13th July 2011, 02:41 PM
|
|
Registered User
|
|
Join Date: Mar 2010
Posts: 87

|
|
|
Re: A Hard Disk is reporting health problems
I would suggest running a long self-test:
smartctl --test=long /dev/sda
(or whatever the device is)
Then check the SMART log with smartctl -a /dev/sda
In any event, hard drives are quite cheap today. I would not put my data at risk if I got any errors from SMART, such as the reallocated sectors you seem to have.
My previous drive started showing up reallocated sectors and soon became unusable.
|