Fedora Linux Support Community & Resources Center
  #1  
Old 4th July 2011, 09:26 PM
ibrahim52 Offline
Registered User
 
Join Date: Jan 2008
Posts: 92
linuxfirefox
A Hard Disk is reporting health problems [SOLVED]

Hello,
After a long time away, the impressive improvements in Fedora made me switch from Windows again. But there is one small and quite annoying issue I am facing right now: whenever I click on any partition I get the warning "A Hard Disk is reporting health problems". Below is the log from smartmontools. Could someone please help?

Quote:
Firmware Version: 0084001C
User Capacity: 250,059,350,016 bytes [250 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 3f
Local Time is: Tue Jul 5 00:02:48 2011 GST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 783) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 111) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 046 Pre-fail Always - 161318
2 Throughput_Performance 0x0005 100 100 030 Pre-fail Offline - 34799616
3 Spin_Up_Time 0x0003 100 100 025 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 2309
5 Reallocated_Sector_Ct 0x0033 100 100 024 Pre-fail Always - 5 (1995, 5)
7 Seek_Error_Rate 0x000f 100 100 047 Pre-fail Always - 3266
8 Seek_Time_Performance 0x0005 100 100 019 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 086 086 000 Old_age Always - 7327
10 Spin_Retry_Count 0x0013 100 100 020 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 2057
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 113
193 Load_Cycle_Count 0x0032 096 096 000 Old_age Always - 98658
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 39 (Min/Max 17/51)
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 13402
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 5 (5, 15753)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 098 098 000 Old_age Offline - 4
199 UDMA_CRC_Error_Count 0x003e 200 253 000 Old_age Always - 2
200 Multi_Zone_Error_Rate 0x000f 100 100 060 Pre-fail Always - 22780
203 Run_Out_Cancel 0x0002 100 100 000 Old_age Always - 433724588278
240 Head_Flying_Hours 0x003e 200 200 000 Old_age Always - 0

SMART Error Log Version: 1
ATA Error Count: 266 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 266 occurred at disk power-on lifetime: 6109 hours (254 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 0b c3 7d 03 40 Error: UNC at LBA = 0x00037dc3 = 228803

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 10 c8 60 86 32 40 00 00:00:28.941 READ FPDMA QUEUED
60 08 c0 88 85 32 40 00 00:00:28.941 READ FPDMA QUEUED
60 08 b8 48 9a 31 40 00 00:00:28.936 READ FPDMA QUEUED
60 08 b0 a0 66 32 40 00 00:00:28.936 READ FPDMA QUEUED
60 08 a8 d0 e5 32 40 00 00:00:28.929 READ FPDMA QUEUED

Error 265 occurred at disk power-on lifetime: 6046 hours (251 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 bb 49 e3 21 40 Error: UNC at LBA = 0x0021e349 = 2220873

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 70 b8 28 e3 21 40 00 00:42:43.083 READ FPDMA QUEUED
2f 00 01 10 00 00 40 00 00:42:43.004 READ LOG EXT
60 70 b0 28 e3 21 40 00 00:42:38.737 READ FPDMA QUEUED
2f 00 01 10 00 00 40 00 00:42:38.659 READ LOG EXT
60 70 a8 28 e3 21 40 00 00:42:34.392 READ FPDMA QUEUED

Error 264 occurred at disk power-on lifetime: 6046 hours (251 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 b3 49 e3 21 40 Error: UNC at LBA = 0x0021e349 = 2220873

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 70 b0 28 e3 21 40 00 00:42:38.737 READ FPDMA QUEUED
2f 00 01 10 00 00 40 00 00:42:38.659 READ LOG EXT
60 70 a8 28 e3 21 40 00 00:42:34.392 READ FPDMA QUEUED
2f 00 01 10 00 00 40 00 00:42:34.314 READ LOG EXT
60 70 a0 28 e3 21 40 00 00:42:30.047 READ FPDMA QUEUED

Error 263 occurred at disk power-on lifetime: 6046 hours (251 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 ab 49 e3 21 40 Error: UNC at LBA = 0x0021e349 = 2220873

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 70 a8 28 e3 21 40 00 00:42:34.392 READ FPDMA QUEUED
2f 00 01 10 00 00 40 00 00:42:34.314 READ LOG EXT
60 70 a0 28 e3 21 40 00 00:42:30.047 READ FPDMA QUEUED
2f 00 01 10 00 00 40 00 00:42:29.969 READ LOG EXT
60 70 98 28 e3 21 40 00 00:42:25.691 READ FPDMA QUEUED

Error 262 occurred at disk power-on lifetime: 6046 hours (251 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 a3 49 e3 21 40 Error: UNC at LBA = 0x0021e349 = 2220873

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 70 a0 28 e3 21 40 00 00:42:30.047 READ FPDMA QUEUED
2f 00 01 10 00 00 40 00 00:42:29.969 READ LOG EXT
60 70 98 28 e3 21 40 00 00:42:25.691 READ FPDMA QUEUED
60 10 90 28 1f 22 40 00 00:42:25.691 READ FPDMA QUEUED
2f 00 01 10 00 00 40 00 00:42:25.612 READ LOG EXT

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 7293 -
# 2 Short offline Completed: read failure 90% 983 136431830

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):.

Last edited by ibrahim52; 13th July 2011 at 08:57 AM.
  #2  
Old 4th July 2011, 09:56 PM
marko Offline
Registered User
 
Join Date: Jun 2004
Location: Laurel, MD USA
Posts: 5,488
linuxfirefox
Re: A Hard Disk is reporting health problems

Yeah, your disk is just starting to have errors and die.
Your disk has resorted to ECC error correction 13402 times, that's too high for a disk to be considered healthy. I guess it's not quite bad enough for the PASSED overall rating to go to FAILED.

For comparison, my laptop has about half the power-on time of yours (3606 hrs to your 7327 hrs), my ECC correction value is only 47, and I have 0 Offline_Uncorrectable events while your disk has 4.

You should back up to a good drive soon and try to replace it.

Last edited by marko; 4th July 2011 at 10:06 PM.
  #3  
Old 5th July 2011, 12:27 AM
David Becker Offline
Registered User
 
Join Date: Feb 2006
Posts: 780
linuxfedorafirefox
Re: A Hard Disk is reporting health problems

Quote:
Originally Posted by marko View Post
Yeah, your disk is just starting to have errors and die.
Your disk has resorted to ECC error correction 13402 times, that's too high for a disk to be considered healthy. I guess it's not quite bad enough for the PASSED overall rating to go to FAILED.

For comparison, my laptop has about half the power-on time of yours (3606 hrs to your 7327 hrs), my ECC correction value is only 47, and I have 0 Offline_Uncorrectable events while your disk has 4.

How to interpret the ECC value depends on the manufacturer and model, and there is no simple way to derive the health of the disk from the ECC count. For comparison, I have several disks which have hundreds of millions of ECC errors and are otherwise fine. In any case, ECC indicates that the detected errors have been corrected.

What's important to follow is whether any of the SMART attributes are going down or have dropped below their threshold values, which doesn't seem to be the case. The Offline_Uncorrectable count matters because it says there are still errors on the disk. Raw_Read_Error_Rate isn't very promising, but what's not nice either way is the UDMA_CRC_Error_Count, which would seem to suggest an (on-board) controller problem. This could have to do with the hardware or perhaps the large temperature range (from 17 to 51 C) under which the drive is operating.

Running a long smartctl self-test may reveal more. Ultimately, the data on the disk should be backed up elsewhere, after which a (data-destructive) badblocks run should be done on the entire disk. This should allow the disk to reallocate sectors or otherwise bring the Offline_Uncorrectable count to 0. If it's a single-disk system then you could narrow down the treatment and avoid having to destroy all the data with badblocks, but it takes some more effort.

Better cooling may be appreciated by your disk.

David
  #4  
Old 5th July 2011, 06:09 AM
ibrahim52 Offline
Registered User
 
Join Date: Jan 2008
Posts: 92
linuxfirefox
Re: A Hard Disk is reporting health problems

Perhaps this is relevant: earlier I had Ubuntu on the same partition where I am using Fedora now. I did not remove Ubuntu properly; instead I went to the Windows Disk Manager, deleted the partition, and installed Fedora in the resulting "free space". I don't know if that's the reason the HDD errors are showing?

To be really honest, my laptop receives enough cooling the whole day to keep the HDD running smoothly, and I have never had any issues with it so far. One thing I would like to ask is: why only Fedora? Why don't Ubuntu or Windows show any errors for my hard drive?

I have been asked to visit the link below and run Fujitsu's diagnostic tool, as my HDD is made by Fujitsu. I am going to boot into Windows, run the application, and post the results here.
  #5  
Old 5th July 2011, 09:15 AM
David Becker Offline
Registered User
 
Join Date: Feb 2006
Posts: 780
linuxfedorafirefox
Re: A Hard Disk is reporting health problems

Quote:
Originally Posted by ibrahim52 View Post
Perhaps this is relevant: earlier I had Ubuntu on the same partition where I am using Fedora now. I did not remove Ubuntu properly; instead I went to the Windows Disk Manager, deleted the partition, and installed Fedora in the resulting "free space". I don't know if that's the reason the HDD errors are showing?

The errors are physical errors which are otherwise independent of the particular OS.

Quote:
Originally Posted by ibrahim52 View Post
To be really honest, my laptop receives enough cooling the whole day to keep the HDD running smoothly, and I have never had any issues with it so far. One thing I would like to ask is: why only Fedora? Why don't Ubuntu or Windows show any errors for my hard drive?

Probably because the systems which aren't reporting an error aren't monitoring or else are ignoring the errors.

Quote:
Originally Posted by ibrahim52 View Post
I have been asked to visit the link below and run Fujitsu's diagnostic tool, as my HDD is made by Fujitsu. I am going to boot into Windows, run the application, and post the results here.

Most people I come across on Windows don't run any kind of SMART monitoring tool. I can imagine that the diagnostic tool will consult the SMART attributes, amongst other tests. Good luck.

David
  #6  
Old 5th July 2011, 10:51 AM
ibrahim52 Offline
Registered User
 
Join Date: Jan 2008
Posts: 92
windows_xp_2003firefox
Re: A Hard Disk is reporting health problems

I am not much into hardware testing, but is there any way I can repair this, or stop the hard drive errors from appearing all the time in Fedora? Currently the extended test is running through the Fujitsu application in Windows. It will take some time and I'll post the logs here. Thanks for the DETAILED help, I really appreciate the EXPERTS' support and am proud to be a FEDORA user.
  #7  
Old 5th July 2011, 11:15 AM
JEO Offline
Registered User
 
Join Date: Jan 2006
Posts: 2,769
linuxfedorafirefox
Re: A Hard Disk is reporting health problems

You can suppress the warning. There is an application called Disk Utility (palimpsest); open that, then select the drive in the left pane. Then click on Smart Data in the right pane. There is a "Don't warn if the disk is failing" checkbox there.
  #8  
Old 5th July 2011, 10:29 PM
ibrahim52 Offline
Registered User
 
Join Date: Jan 2008
Posts: 92
windows_7firefox
Re: A Hard Disk is reporting health problems

Thanks JEO. By the way, below are the results of the Fujitsu diagnostic tool I ran on Windows, and it PASSED. Is there any way I can repair the errors, for example from a bootable tool, so that they no longer appear in Fedora?

Model Name : FUJITSU MJA2250BH G2
Serial No. : K96QT
Firmware : 0084001C
Total LBA : 1D1C5970 h

Test Name : Extended Test
Result : PASS
Test Time : 15:09:56, July 05, 2011
  #9  
Old 6th July 2011, 11:25 PM
ibrahim52 Offline
Registered User
 
Join Date: Jan 2008
Posts: 92
linuxfirefox
Re: A Hard Disk is reporting health problems

Well, I have disabled the disk failing notifications. But there seems to be no way I can fix this issue, and I don't face any kind of slow response or slow boot. I am posting a screenshot of the error I am receiving in Disk Utility.

Attachment: Screenshot.resized.png (139.1 KB)
  #10  
Old 7th July 2011, 07:31 AM
JEO Offline
Registered User
 
Join Date: Jan 2006
Posts: 2,769
linuxfedorafirefox
Re: A Hard Disk is reporting health problems

It shows 327,685 bad sectors. If that is accurate, the drive is failing. Research your model of harddisk and linux to see if there is a known problem with the smart data. Otherwise, the fix is to back up your data and put in a new drive and restore your data to it.
  #11  
Old 7th July 2011, 09:55 AM
elkoraco Offline
Registered User
 
Join Date: Jul 2011
Posts: 14
linuxchrome
Re: A Hard Disk is reporting health problems

Isn't a false positive on the Disk Utility a bug in F15? I don't see what the difference might be between the package in Fedora and Ubuntu, and since Ubuntu isn't reporting any problems here...

The guys on LAS had a rant about this.
  #12  
Old 7th July 2011, 12:08 PM
stevea Online
Registered User
 
Join Date: Apr 2006
Location: Ohio, USA
Posts: 8,346
linuxfedorafirefox
Re: A Hard Disk is reporting health problems

Quote:
Originally Posted by marko View Post
Yeah, your disk is just starting to have errors and die.

Thanks for the alarmist misinformation - very helpful.

As David B. notes, interpreting SMART numbers is not for noobs and requires a "decoder ring".
http://en.wikipedia.org/wiki/S.M.A.R.T.
Raw values often do not mean what one might naively expect.

Quote:
Originally Posted by JEO View Post
It shows 327,685 bad sectors. If that is accurate, the drive is failing. Research your model of harddisk and linux to see if there is a known problem with the smart data. Otherwise, the fix is to back up your data and put in a new drive and restore your data to it.

There is a VERY good reason to think it is not accurate. I have repeatedly and across many drive mfgrs seen spectacular errors in the SMART data parameters. It's just as buggy as system/mobo BIOS/ACPI implementations (most laptops and mobos FAIL any serious ACPI test). If the number you are seeing looks wrong - then it probably is.

327,685 = 0x0005,0005. That *looks* suspiciously like the SMART firmware has transmitted the '0005' 16 bit word twice. *Maybe* there are 5 bad sectors, very unlikely there are 327685 bad sectors, maybe there are none. The figure is HIGHLY suspect.
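The decomposition above can be checked with a few lines of Python. This is only a minimal sketch: `split_words` is a hypothetical helper name, and the 16-bit word width is an assumption taken from the reasoning in this post, not something the drive documents.

```python
def split_words(raw: int, width: int = 16) -> list[int]:
    """Split a SMART raw value into fixed-width words, most significant first."""
    if raw < 0:
        raise ValueError("raw value must be non-negative")
    mask = (1 << width) - 1
    words = []
    while True:
        words.append(raw & mask)  # take the lowest word
        raw >>= width             # then shift it away
        if raw == 0:
            break
    return words[::-1]


# The "bad sector" figure reported by Disk Utility in this thread.
words = split_words(327685)
print([f"0x{w:04X}" for w in words])  # ['0x0005', '0x0005']
```

Two identical words in a row do not prove a firmware bug, but they are a good reason to compare the figure against the other attributes before trusting it.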

=====
Examples of SMART Firmware problem ...

I have an SSD drive showing ....
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 48361331755818
which would mean the drive has been powered up for about 5.5 billion years, longer than the Earth has existed. However on closer examination ...
48361331755818 = 0x2BFC,0000,0B2A,
and 0x0B2A = 2858 hours, which seems quite correct; the upper bits (0x2BFC) vary wildly on each read.


This SSD drive also increases its "195 Hardware_ECC_Recovered" raw value at a constant 200 counts per second - EVEN WHEN THERE IS NO DISK ACTIVITY! So be VERY wary of interpreting the vendor specific items.


I had 3x Seagates in a RAID0 configuration a couple of years ago. On the same day all three developed a "198 Uncorrectable Sector Count" on exactly the same (numeric) sector number. That's either a fantastic coincidence or a firmware blunder. I strongly suspect some specific sequence of I/O or control ops confused the firmware into marking a bad sector.

=================

Lots of SMART firmware has bugs and the numbers should not be relied upon without examination.

To stop the log reports, prevent the smartd daemon from running, or at least exclude that drive:
sudo chkconfig smartd off
or edit /etc/smartd.conf to exclude the drive.

There is a very good paper by Google that considers SMART parameters and temperature to try to predict failure - and they find that these are not sufficient for a good predictive model.

http://labs.google.com/papers/disk_failures.pdf

They consider that certain SMART parameters have some limited predictive value,
5 Reallocated Sectors Count
196 Reallocation Event Count
197 Current Pending Sector Count
198 Off-Line Scan Uncorrectable Sector Count

So yes if you see ANY scan error or reallocations it significantly increases the probability of a future failure. Unless you have money to burn or a strong need for reliability this DOES NOT mean you should toss the drive, as the chicken-littles will tell you. The probability of failure is often still relatively low. It DOES mean you should always have an effective backup system in place, and a plan to restore your system in case of disk failure. I would replace the drive if you see an increasing trend pattern of repeated REAL(confirmed) errors.
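Watching for "an increasing trend" can be done by comparing saved snapshots of the four attributes listed above. The following is a rough sketch, not a tool anyone in this thread used: `rising_attributes` and the snapshot format (a dict of attribute ID to leading raw count) are assumptions, the first snapshot uses the counts from the smartctl output in post #1, and the second snapshot is made up for illustration.

```python
# The four attributes with some predictive value, keyed by SMART ID.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    196: "Reallocated_Event_Count",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def rising_attributes(snapshots: list[dict[int, int]]) -> dict[str, tuple[int, int]]:
    """Return watched attributes whose raw count grew between the first
    and last snapshot, as name -> (first_count, last_count)."""
    rising = {}
    for attr_id, name in WATCHED.items():
        counts = [snap[attr_id] for snap in snapshots if attr_id in snap]
        if len(counts) >= 2 and counts[-1] > counts[0]:
            rising[name] = (counts[0], counts[-1])
    return rising


first = {5: 5, 196: 5, 197: 0, 198: 4}   # counts from the log in post #1
later = {5: 9, 196: 9, 197: 0, 198: 4}   # hypothetical later reading
print(rising_attributes([first, later]))
```

A count that stays flat over weeks is a very different situation from one that climbs at every reading, which is why a single smartctl dump is hard to judge on its own.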

You have to begin with an understanding that all rotating drives are error-prone trash (56% of the failed drives in the Google study showed none of these SMART signals). So DO worry about data integrity, it's a huge problem, but don't rely on SMART to predict an individual disk failure.
__________________
None are more hopelessly enslaved than those who falsely believe they are free.
Johann Wolfgang von Goethe
  #13  
Old 7th July 2011, 01:46 PM
JEO Offline
Registered User
 
Join Date: Jan 2006
Posts: 2,769
linuxfedorafirefox
Re: A Hard Disk is reporting health problems

I noticed that the threshold is 24 but the current normalized value and worst are 100. That does not agree with a huge bad sector count, since the current value or worst case value would have to decrease to the threshold value to really fail.
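The same check can be written down for any attribute row. This is a sketch under assumptions: the column order is taken from the smartctl table in post #1, `parse_attr_line` and `below_threshold` are hypothetical names, and a threshold of 0 is treated as "informational only".

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # current normalized value
    worst: int   # lowest normalized value ever recorded
    thresh: int  # failure threshold set by the vendor
    raw: str     # vendor-specific raw value, kept as text


def parse_attr_line(line: str) -> SmartAttr:
    # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    f = line.split(None, 9)
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), f[9].strip())


def below_threshold(attr: SmartAttr) -> bool:
    # An attribute counts as failed (now or in the past) when its normalized
    # value or worst value is at or below a non-zero threshold.
    return attr.thresh > 0 and min(attr.value, attr.worst) <= attr.thresh


row = "5 Reallocated_Sector_Ct 0x0033 100 100 024 Pre-fail Always - 5 (1995, 5)"
attr = parse_attr_line(row)
print(attr.value, attr.worst, attr.thresh, below_threshold(attr))  # 100 100 24 False
```

Note that the raw column is deliberately left unparsed here, since its meaning differs between vendors.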

Last edited by JEO; 7th July 2011 at 01:49 PM.
  #14  
Old 13th July 2011, 08:57 AM
ibrahim52 Offline
Registered User
 
Join Date: Jan 2008
Posts: 92
windows_7firefox
Re: A Hard Disk is reporting health problems

Thanks stevea, that was DETAILED and i read the whole thing.

Thanks JEO, that's right: I don't face any kind of slow response from my hard drive while transferring data or booting multiple OSes. Still, as advised earlier, I will start taking backups. Thanks to all.
  #15  
Old 13th July 2011, 02:41 PM
japafi Offline
Registered User
 
Join Date: Mar 2010
Posts: 87
linuxfirefox
Re: A Hard Disk is reporting health problems

I would suggest running a long self-test:
smartctl --test=long /dev/sda
(or whatever the device is)

Then check the smart log with smartctl -a /dev/sda
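Once the test has finished, the self-test section of that log is what to look at. Below is a small sketch that picks out failed tests from the log text; it does not run smartctl itself. The function name `failed_self_tests` is hypothetical, and the row layout is assumed from the self-test log in post #1, where a failed test lists an LBA instead of "-".

```python
import re

# One row of the self-test log: "# N  <description and status>  NN%  <hours>  <LBA or ->"
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")


def failed_self_tests(log_text: str) -> list[tuple[int, str, int, int]]:
    """Return (test number, description and status, lifetime hours, first bad LBA)
    for every logged self-test that recorded an error LBA."""
    failures = []
    for line in log_text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header or unrelated line
        num, desc_status, _remaining, hours, lba = m.groups()
        if lba != "-":
            failures.append((int(num), desc_status, int(hours), int(lba)))
    return failures


# Self-test log as posted at the start of this thread.
sample = """\
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 7293 -
# 2 Short offline Completed: read failure 90% 983 136431830
"""
print(failed_self_tests(sample))
```

For this drive the only failure on record is the old short test at 983 hours; the more recent test at 7293 hours completed without error.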

In any event, hard drives are quite cheap today. I would not put my data at risk if I got any errors from SMART, such as the reallocated sectors which you seem to have.

My previous drive started showing reallocated sectors and soon became unusable.