Imagine this: You suffered a power loss in your area while your computer was on. After the power is restored, you boot your computer only to find that Windows automatically ran autochk on your hard drive and it says you have bad sectors on the disk.
What are these bad sectors? Is this a sign your drive is going to bite the dust? Can these sectors be repaired? We have the answers to these questions and more in this two-part article.
This first part will deal with the hardware aspects of the problem while the second will cover the software including the operating system, manufacturer tools, and third-party utilities.
What are sectors?
The terminology for hard drives originated with mechanical drives and our discussion benefits from a bit of historical background.
A mechanical hard drive consists of one or more aluminum or glass-and-ceramic platters coated with a magnetic material containing cobalt, sometimes with platinum and nickel. Each side of a platter has concentric rings, called tracks, where data is stored. The set of tracks at the same position across all platter surfaces is called a cylinder. Finally, each track is divided into arcs called sectors.
Each side of a platter has its own read/write head, attached to a head stack assembly (HSA) that moves across the disk via an actuator mechanism. When the platters spin, they create a “cushion” of air that makes the heads float 5 to 10 nanometers above the platter, so ideally there is no contact between the magnetic surface of the platters and the read/write heads. Older drives may have a float height of up to 100 nanometers. To imagine the scale, a sheet of paper is approximately 75,000 nanometers thick.
Each drive has restricted system area tracks that are not user-accessible. The drive controller stores information about the drive in this area, including both of the bad sector lists and spare sectors that are used during remapping. Some drives may also have spare sectors located at the end of each track.
Each sector on a drive is individually addressable which was originally done by referring to the cylinder, head, and sector (CHS) where the required data is stored. When a hard drive was installed in the computer, you needed to change BIOS settings to let it know the number of cylinders, heads, and sectors per track on the drive. These settings are collectively known as the drive geometry.
Later, the controller was moved from an add-in card attached to the motherboard to the drive itself. One of the things this allowed was the translation of a logical geometry of the drive to a different physical geometry. This became important for two reasons: it provided a way to get around the addressing limitations of CHS, and it allowed zoned bit recording (ZBR).
In a diagram of the drive layout, the sectors at the outer edge of the platter are physically longer than the ones closer to the spindle. If every track holds the same number of sectors, the outer tracks are recorded at a lower density than the medium can support, so space along the outer edge is wasted. With ZBR, tracks are grouped into zones, and all tracks in a zone share the same layout. Zones closer to the outer edge have more sectors per track, so less space is wasted and more data is stored per platter at roughly the same recording density.
In order to make it work with the then-current BIOS design, the disk controller would need to translate the logical geometry of the drive as entered in the BIOS to the physical geometry the drive was really using.
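To see why zoned recording recovers capacity, the comparison can be sketched in a few lines of Python. This is only an illustration: the track radii, the sector length, and the function names are hypothetical, and a real drive's zone layout is chosen by the manufacturer.

```python
import math

def sectors_that_fit(radius_mm, sector_length_mm):
    """Whole sectors of a given physical length that fit on one track."""
    return int(2 * math.pi * radius_mm / sector_length_mm)

def fixed_layout_total(radii, sector_length_mm):
    """Every track gets the sector count of the innermost (shortest) track."""
    per_track = sectors_that_fit(min(radii), sector_length_mm)
    return per_track * len(radii)

def zoned_layout_total(radii, sector_length_mm, zone_count):
    """Tracks are grouped into zones; each zone uses the sector count
    that fits on its own innermost track."""
    ordered = sorted(radii)
    zone_size = math.ceil(len(ordered) / zone_count)
    total = 0
    for start in range(0, len(ordered), zone_size):
        zone = ordered[start:start + zone_size]
        total += sectors_that_fit(zone[0], sector_length_mm) * len(zone)
    return total

# Hypothetical platter: 26 tracks at radii of 20 mm to 45 mm, 1 mm sectors.
radii = list(range(20, 46))
fixed_total = fixed_layout_total(radii, 1.0)
zoned_total = zoned_layout_total(radii, 1.0, zone_count=13)
print(fixed_total, zoned_total)
```

With a single zone the two layouts are identical; adding zones lets the outer tracks carry more sectors, which is the capacity that a fixed layout leaves unused.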
On modern drives, addressing is done using Logical Block Addressing (LBA), which is simply a zero-based integer index. It starts at the first cylinder, first head, first sector and moves on sector-by-sector, head-by-head, cylinder-by-cylinder to the end of the drive.
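The ordering described above corresponds to the conventional CHS-to-LBA formula, sketched below in Python. The function names are ours, and the geometry of 16 heads and 63 sectors per track is only an example of a logical geometry, not the layout of any particular drive. Note that CHS sector numbers start at 1 while cylinders and heads start at 0.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert a cylinder/head/sector address to a zero-based LBA."""
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range (sectors are numbered from 1)")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)

def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Inverse conversion: LBA back to (cylinder, head, sector)."""
    cylinder, remainder = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1

# Example logical geometry (an assumption for illustration only).
HEADS, SECTORS = 16, 63
print(chs_to_lba(0, 0, 1, HEADS, SECTORS))   # first sector of the drive -> 0
print(chs_to_lba(1, 0, 1, HEADS, SECTORS))   # first sector of cylinder 1 -> 1008
print(lba_to_chs(1008, HEADS, SECTORS))      # -> (1, 0, 1)
```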
Even though today’s Solid-State Drives (SSD) do not have a physical layout remotely resembling this, they still use the same interfaces and LBA addressing scheme.
Each sector has a specific layout as well. It contains a preamble, data, and an error correcting code (ECC).
The preamble contains information used by the disk controller including a gap between sectors, sync bits and timing alignment, and an address mark (the sector number, location, and status).
The data is the user data that is stored in the sector. Until recently, most drives stored 512 bytes of data per sector. Since 2010, most drives have been Advanced Format (AF) 4K drives, which use a sector size of 4096 bytes. Some operating systems, such as Windows Vista and Windows 7, need updated drivers and tools, delivered as a hotfix through Windows Update, before they can use these drives as boot devices. The hotfix is included in Service Pack 1 for Windows 7, and many AF drives come with drivers that enable their use on Windows XP.
The ECC is a mathematically derived code based on the data stored in the sector. The disk controller uses it to detect whether there is a problem with the data and, where possible, to reconstruct the original data. The number of bits that can be corrected is limited by the specific algorithm used to generate the ECC, which varies by manufacturer and can even differ among drives made by the same company.
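The idea of detecting and correcting errors with a stored code can be illustrated with a Hamming(7,4) code. This is a deliberate substitution: real drives use far stronger codes over whole sectors, and the exact algorithm is manufacturer-specific. The sketch below protects only four data bits and can correct a single flipped bit, but it shows both the correction step and the hard limit on how many bits can be repaired.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4          # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if the
    recomputed parity matches, else the 1-based bit that was flipped back."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        c[error_position - 1] ^= 1   # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], error_position

data = [1, 0, 1, 1]
stored = hamming74_encode(data)

# One flipped bit: the decoder repairs it.
damaged = list(stored)
damaged[4] ^= 1
recovered, position = hamming74_decode(damaged)

# Two flipped bits: beyond this code's limit, the "repair" is wrong.
badly_damaged = list(stored)
badly_damaged[0] ^= 1
badly_damaged[1] ^= 1
wrong, _ = hamming74_decode(badly_damaged)
print(recovered == data, wrong == data)
```

The second case is the toy equivalent of an uncorrectable sector: the damage exceeds what the code can repair. A real controller also has checks that let it recognize this and report a read error rather than return wrong data.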
What are bad sectors?
A bad sector is one that cannot be reliably read or written. There are two reasons this can happen. The first is physical damage to the recording medium, or another problem that results in uncorrectable read errors. The cause may be a manufacturing defect, magnetic wear, a worn-out flash memory cell in an SSD, or contact between the read/write heads and the platter that damaged the magnetic coating.
All drives are pretty much guaranteed to ship with bad sectors. Old-timers may remember the days of entering the bad sectors the manufacturer had listed on the drive into the low-level formatting tool before being able to partition and format the drive with the operating system’s native tools.
Low-level formatting and consequent marking of bad or marginal sectors is now done at the factory at the end of the production process so the user no longer needs to worry about it. The locations of these sectors are kept in the first of two lists of bad sectors on the drive – the P-LIST or primary defect list. The hard drive electronics automatically ignore sectors on this list and they do not slow down drive access.
Over time, other sectors may begin to show problems. This may be due to a head crash, magnetic wear, or other issues. This second type of error is commonly called a soft error because, at least in its initial stages, the errors can be corrected with CRC and ECC mechanisms.
Once the errors on these sectors become uncorrectable or too unstable, they are added to the G-LIST or grown defect list. These will be automatically remapped to spare sectors on the drive. If the drive has spare sectors on the same track, they will be used first before remapping to a sector on a different track. Accessing remapped sectors slows the drive and the speed continues to drop as the G-LIST grows.
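The different effect of the two lists on addressing can be sketched as follows. This is a simplified model, not any vendor's firmware: it assumes factory defects on the P-LIST are handled by skipping over them, and grown defects on the G-LIST are redirected through a lookup table to a spare. The class name, sector numbers, and spare addresses are all hypothetical.

```python
class DefectTranslator:
    """Toy model of logical-to-physical sector translation."""

    def __init__(self, p_list, spare_sectors):
        self.p_list = sorted(set(p_list))   # factory defects, skipped entirely
        self.g_list = {}                    # grown defects: physical -> spare
        self.spares = list(spare_sectors)   # unused spare physical sectors

    def _skip_factory_defects(self, lba):
        physical = lba
        for bad in self.p_list:
            if bad <= physical:
                physical += 1               # slide past each earlier defect
            else:
                break
        return physical

    def physical_for(self, lba):
        physical = self._skip_factory_defects(lba)
        # A grown defect costs an extra redirection to a spare sector.
        return self.g_list.get(physical, physical)

    def reallocate(self, lba):
        """Add the sector behind this LBA to the G-LIST and assign a spare."""
        if not self.spares:
            raise RuntimeError("spare pool exhausted")
        physical = self._skip_factory_defects(lba)
        self.g_list[physical] = self.spares.pop(0)
        return self.g_list[physical]

translator = DefectTranslator(p_list=[2, 5], spare_sectors=[1000, 1001])
before = [translator.physical_for(lba) for lba in range(6)]
translator.reallocate(3)
after = [translator.physical_for(lba) for lba in range(6)]
print(before)
print(after)
```

In this model the skipped factory defects keep neighboring logical sectors physically adjacent, while a reallocated sector sends the head away to wherever the spare lives, which is consistent with remapped sectors being slower to access.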
How do sectors get marked as ‘bad’?
In order to help prevent data loss, the hard drive controller looks for problems during its normal operation. In fact, the disk controller will do much of the work behind the scenes and never even let your operating system know anything untoward has happened.
Remember the error correcting code located in each sector? When the drive reads the sector data, it recomputes the ECC and compares it to the ECC stored in the sector. If they don’t match, it will attempt to use the ECC to reconstruct the corrupted data. If the amount of error is small and it can be corrected, it simply delivers the corrected data and increments the Self-Monitoring, Analysis and Reporting Technology (SMART) counter 195 (Hardware ECC Recovered). If it cannot correct the error, it increments SMART counter 198 (Offline Uncorrectable Sector Count) and counter 197 (Current Pending Sector Count), and the sector stays pending until a write to it is attempted.
Bad sectors are not reallocated until an attempt is made to write to the sector, in order to preserve the possibility of data recovery via other methods. Once a write operation is attempted on a bad sector, the controller allocates a new empty sector from the spare pool to replace it, updates the defect flag to indicate the sector has been reallocated, and updates the G-LIST. Any data in the original sector may be lost if a final attempt at reading the data fails. This is why any advanced recovery attempts must be made prior to writing to a suspected bad sector.
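The read and write behavior just described can be summarized as a small state model. This is a rough sketch under stated assumptions: the class, its parameters, and the way bit errors are passed in are invented for illustration, and actual firmware behavior, including exactly when each SMART counter changes, differs between drives.

```python
from collections import Counter

class DriveModel:
    """Toy model of how a sector becomes pending and is later reallocated."""

    def __init__(self, spare_count, correctable_limit):
        self.smart = Counter()            # keyed by SMART attribute number
        self.pending = set()              # unreadable sectors awaiting a write
        self.g_list = {}                  # reallocated LBA -> spare label
        self.spares_left = spare_count
        self.correctable_limit = correctable_limit

    def read(self, lba, bit_errors):
        """Return True if data is delivered, False on an unrecoverable read."""
        if bit_errors == 0:
            return True
        if bit_errors <= self.correctable_limit:
            self.smart[195] += 1          # corrected silently by ECC
            return True
        if lba not in self.pending:
            self.pending.add(lba)
            self.smart[197] += 1          # current pending sector count
            self.smart[198] += 1          # uncorrectable sector count
        return False                      # not reallocated yet: data may be recoverable

    def write(self, lba):
        """A write to a pending sector triggers the reallocation."""
        if lba in self.pending:
            if self.spares_left == 0:
                raise RuntimeError("spare pool exhausted")
            self.spares_left -= 1
            self.g_list[lba] = f"spare-{len(self.g_list)}"
            self.pending.remove(lba)
            self.smart[197] -= 1

drive = DriveModel(spare_count=2, correctable_limit=3)
drive.read(10, bit_errors=1)    # small error, fixed by ECC
drive.read(11, bit_errors=9)    # too damaged, sector becomes pending
pending_before_write = set(drive.pending)
drive.write(11)                 # the write forces the remap
print(pending_before_write, drive.g_list, dict(drive.smart))
```

The point of the model is the ordering: the failed read only marks the sector as pending, and nothing is remapped until the write, which is the window in which recovery tools still have a chance at the original data.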
Now that we have taken a peek inside the drive to see what is happening behind the curtain, you have sufficient background to better understand how the operating system and other software will work with it.
In part two, we will look at the tools provided by the operating system, hard drive manufacturers, and third parties that you can use to help diagnose and deal with bad sectors. We will also look at tools used to monitor the overall health of the drive. With judicious use of these tools, you will easily see if bad sectors are presaging an imminent drive failure or if it is more likely you will have many years left with your beloved data.