
Wording

@Dsimic: In this article both "an HDD" and "a HDD" have been used. Correct me if I'm wrong, but isn't "a HDD" supposed to be correct? Because I don't see any rationale for using "an HDD". Article begins with calling it "A hard disk" but changes to "an HDD" over time. Isn't this grammatically incorrect? This was the main reason for removal of "an" in my previous edit. However I'm having second thoughts now because in your edit summary you said "Nah, in this case it's either "An HDD retains..." or "HDDs retain...""--ChamithN (talk) 13:10, 26 July 2015 (UTC)

Hello here! :) Actually, "a hard disk drive" and "an HDD" are both correct because the selection between "a" and "an" is based on how the following word is pronounced, not how it's written (as we know, "HDD" is pronounced as aitch dee dee, which starts with a vowel sound). Another common example is "an hour", in which "hour" is also pronounced with a starting vowel sound. Please see this page, for example, for further information. — Dsimic (talk | contribs) 13:26, 26 July 2015 (UTC)
Ah of course! That's not how I'm used to pronouncing it, which is why I was confused initially. Whenever I see the abbreviation "HDD" I expand it first in my mind and then read it. So the aitch dee dee situation didn't occur to me in this case. Thank you for your substantial clarification.--ChamithN (talk) 13:34, 26 July 2015 (UTC)
You're welcome. — Dsimic (talk | contribs) 13:42, 26 July 2015 (UTC)
Note that in British English, HDD is "haitch dee dee", thus "a HDD". This may be part of the confusion. But this article is American English, so "an HDD" is best. --A D Monroe III (talk) 21:40, 17 August 2015 (UTC)

Object-based Storage Device (OSD)

Does not mention object storage, cf. block storage. — Preceding unsigned comment added by 66.155.23.67 (talk) 08:01, 15 October 2015 (UTC)

Now it does. IMHO, it shouldn't be integrated further into the article, because the underlying HDD technology, which the article primarily deals with, remains the same. — Dsimic (talk | contribs) 08:10, 16 October 2015 (UTC)

Drive capacity not correctly defined

While referring to this article, I noticed an omission. In modern drives capacity is actually specified not by the raw capacity of the drive in bytes, but by the number of blocks of size x the drive exports. The mapping between drive size and number of blocks is found in an IDEMA standard document, LBA Count for Disk Drives Standard. So, for example, a 500GB drive with 512 byte block size will report that it has 976,773,168 logical blocks. There will actually be more capacity than this in the drive to allow for block revectoring and other error handling, but to be a 500GB drive, this is the number of blocks the drive must report.

I'm not certain exactly where this should go in the article, but it should be in there. My experience with this is as a firmware developer at Western Digital, where this was the standard used for the number of blocks reported by a drive. MisterSpecific (talk) 20:08, 14 February 2016 (UTC)
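For illustration, the 500GB figure quoted above is consistent with the formula usually cited from the IDEMA LBA-count document for 512-byte logical blocks. The sketch below is only that: the function name is made up for this example, and the two constants are given as commonly quoted from the standard, so they should be checked against the document itself before being used in the article.

```python
# Sketch of the LBA-count formula commonly cited from the IDEMA
# "LBA Count for Disk Drives" standard, 512-byte logical blocks only.
# The function name is hypothetical; the constants are as commonly quoted
# and should be verified against the standard before relying on them.

def idema_lba_count_512(advertised_gb: int) -> int:
    """Return the LBA count for an advertised capacity in decimal GB."""
    if advertised_gb < 50:
        raise ValueError("the formula is stated for capacities of 50 GB and above")
    return 97_696_368 + 1_953_504 * (advertised_gb - 50)


print(idema_lba_count_512(500))   # 976773168
print(idema_lba_count_512(2000))  # 3907029168
```

The second value is included because it matches the block count reported by a "2 TB" drive elsewhere in this thread.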

Hello! As we know, it's pretty much about the advertised vs. reported capacity of HDDs; the former is primarily advertised in gigabytes or terabytes, while the latter is reported in LBA blocks. Please have a look at the Hard disk drive § Calculation section, which already describes the use of logical blocks; it also describes the presence of spare areas used for remapping bad sectors. — Dsimic (talk | contribs) 20:28, 14 February 2016 (UTC)
Sorry @Dsimic:, but I don't think it's about that. I've been buying hard drives since they were measured in tens of megabytes and as far as SCSI, ATA, and SATA drives go, I can't remember ever seeing the actual usable capacity (number of addressable blocks x block size) not exceed the drive's advertised capacity, albeit by a negligible amount. I'm traveling atm so I have few drives to cite as example... but right now this machine has a "1 TB" and a "2 TB" drive attached; the latter reports 3,907,029,168 blocks; at 512 bytes each this is 2,000,398,934,016 bytes. The advertised and reported capacities are identical within 0.02 percent, and the reported (usable) is the larger, though by a negligible amount. Neither of these is the "raw capacity" and this difference does not reflect the space reserved for spare sectors and the like.
What @MisterSpecific: is referring to (I believe) is the capacity of the disk as seen by its internal firmware. This does include a number of blocks that are not addressable from the outside, but are reserved for use as spares and for other "internal" purposes. I do not believe that, at least in SCSI, ATA, and later times, hard drive makers have ever advertised drives in terms of this internal capacity.
iirc there was a time in the past (when I was buying—and low-level-formatting—products like Fujitsu Eagles and CDC Wrens) when drives' "raw" capacities were regularly reported by manufacturers. But that was because the drives (in some variants, anyway) didn't have an internal controller that enforced formatting or implemented sparing; that was up to the external controller's firmware and/or the host OS. Thus the end-user usable capacity, the number of addressable blocks, actually depended on the choice of low-level format (like MFM vs RLL), the sector layout, etc.
With the advent of SCSI and, slightly later, ATA drives, this went away; the drives came low-level-formatted from the factory and the end user could not re-do this, let alone change it. At that point the "raw" capacity was essentially meaningless to the end user other than as a measure of how much "sparing" capacity the drive had and was no longer advertised widely (or even in the fine print). Jeh (talk) 21:27, 14 February 2016 (UTC)
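The "2 TB" arithmetic in the comment above can be reproduced with a few lines. This is a minimal sketch; the helper names are invented for the example, and the only inputs are the block count and the 512-byte block size quoted in the comment.

```python
# Reproduces the arithmetic for the "2 TB" drive mentioned above.
# Helper names are hypothetical; inputs come from the comment itself.

def reported_bytes(lba_count: int, block_size: int = 512) -> int:
    """Usable capacity in bytes: addressable blocks times block size."""
    return lba_count * block_size


def excess_percent(lba_count: int, advertised_bytes: int, block_size: int = 512) -> float:
    """How much the reported capacity exceeds the advertised one, in percent."""
    reported = reported_bytes(lba_count, block_size)
    return (reported - advertised_bytes) / advertised_bytes * 100


lbas = 3_907_029_168
advertised = 2 * 1000**4  # "2 TB" in decimal bytes

print(reported_bytes(lbas))                       # 2000398934016
print(f"{excess_percent(lbas, advertised):.3f}")  # 0.020
```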
Jeh, I'd say that we're actually on the same page, and the only issue is that I erroneously used "vs." in my initial reply – using "along" instead would fit much better up there. In other words, it's all about expressing essentially the same drive capacity both in advertised and reported units. Of course, drive firmware sees much more, not only the spare sectors, but also the ECC data, service areas, etc. You're right that a long time ago one could see and use the "raw" capacity of a drive; even the space intended for ECC data could be repurposed and used for storing user-accessible data. — Dsimic (talk | contribs) 21:41, 14 February 2016 (UTC)
I'll note that there was an edit conflict with the post above that moots some of what I say below, but not all of it, so I'll post it anyway. Ignore if you want. This last by @Dsimic: gets it right, but there are subtleties to the number that matter, discussed below. MisterSpecific (talk) 22:25, 14 February 2016 (UTC)
Well, @Jeh:, the IDEMA spec does in fact have the calculated reported number exceeding the LBA number that you'd calculate for size nTB by 0.02% as you note. But that's not the point here; the point is that the software that looks at drive size queries the number of blocks (LBAs) the drive says it contains (through a SATA or SCSI command whose name I forget at the moment), and this number is what the drive reports. The actual drive itself may have more blocks on the platters, but it is of size nTB as reported by this calculated number and that number is the usable capacity on the drive, regardless of internal capacity. In the drives I'm familiar with, you cannot exceed that LBA, even if the drive does have more blocks on the surface. In fact, when manufacturers build drives, they use the same mechanicals for drives of different sizes, and one of the ways they reduce the drive size (say, from 3TB to 2TB) is to simply set up the firmware to report the smaller block count. There's more capacity on the surface, but you bought a 2TB drive, so that's the number it reports, and that's the largest LBA you can write to. (They do this to reduce manufacturing costs or balance inventory; they also do things like leave out platters or depopulate heads to reduce size).
So @Jeh:, it's actually the inverse of what you're speculating; it's what the outside world sees as the drive LBA limit, regardless of what the firmware sees internally (which is presumably more than that number).
As another example of what I'm trying to get across, if you read the doc you'll see that drives that support Protection Information actually use 520 byte blocks, yet report the same capacity as a 512 byte block drive (through the number of LBAs) since the additional RAW capacity isn't available to the end user. This number is not about the internal capacity other than as a lower limit for a new drive.
And, to respond to @Dsimic:, you are right in that this is about reported vs. actual capacity; this is the standard for reporting the capacity in LBAs for a given advertised drive size. At the end of the day, it's this calculated and reported number that defines the size of the drive, not the raw capacity or potential capacity or any other number. Drive manufacturers use this number so that when they stamp 500GB on the drive, users know it will report 976,773,168 LBAs available. That's what makes it a 500GB drive.
At the risk of confusing the discussion, if the drive has to do bad block replacement, it does it by pointing the LBA to a different physical block. The number of blocks reported as capacity by the drive doesn't change. The same is true if the number of available physical blocks drops below this number; you'll get errors on the LBAs that point to broken, not re-vectorable bad blocks, but the total number of LBAs reported won't change. So your 500GB drive might not be able to store 500GB anymore, but it's still a 500GB drive. (Note that manufacturers don't lie about the number of available blocks on new drives because that would be fraud). MisterSpecific (talk) 22:25, 14 February 2016 (UTC)
Now I'm completely confused. I have no argument with any of the above. Nor do I see how anything I wrote was disagreeing, let alone "the inverse". Perhaps I was careless with terms.
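The remapping behaviour described in the comment above can be pictured with a toy model. To be clear, this is not how any real firmware is written: the class, its methods and the spare-pool layout are all invented for illustration. It only shows the one point being made, that the reported LBA count stays fixed while individual LBAs are re-pointed to spares or, once spares run out, start returning errors.

```python
# Toy model only: every name here is hypothetical, and real drive firmware
# is far more involved. It illustrates a fixed reported LBA count with
# bad blocks re-vectored to a spare pool.

class ToyDrive:
    def __init__(self, lba_count: int, spare_count: int):
        self.lba_count = lba_count  # what the drive reports; never changes
        self.remap = {}             # LBA -> spare physical block
        self.spares = list(range(lba_count, lba_count + spare_count))
        self.dead = set()           # LBAs that could not be re-vectored

    def reported_lba_count(self) -> int:
        return self.lba_count

    def mark_bad(self, lba: int) -> None:
        """Point the LBA at a spare block, or mark it dead if none are left."""
        if self.spares:
            self.remap[lba] = self.spares.pop(0)
        else:
            self.remap.pop(lba, None)
            self.dead.add(lba)

    def physical_block(self, lba: int) -> int:
        if not 0 <= lba < self.lba_count:
            raise IndexError("LBA beyond the reported capacity")
        if lba in self.dead:
            raise OSError("unrecoverable block")
        return self.remap.get(lba, lba)


drive = ToyDrive(lba_count=8, spare_count=1)
drive.mark_bad(3)                  # uses the only spare
drive.mark_bad(5)                  # no spares left, so LBA 5 now errors
print(drive.physical_block(3))     # 8
print(drive.reported_lba_count())  # 8
```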
What the outside world sees as the LBA limit gives a capacity very slightly larger than the "package label" capacity. We're all agreed on that, no?
And the actual capacity of the media is in turn larger than that implied by LBA limit x block size. Yes?
How the drive achieves this, how much "actual" or "raw" capacity is needed to implement the "external" capacity, is a private matter between the drive's firmware and the media. No?
And you want the article to talk about the internal/actual/raw capacity, in addition to the advertised ("package") and the (slightly larger) end-user-available capacity? I have no objection there either.
It seems to me that this is a detail of the internal implementation. Which is useful to describe here, but doesn't really affect the end user.
Edge cases: Consider a drive series that uses two platters. In one model they have four heads accessing all four surfaces. In another model, with 3/4 the advertised capacity and less than 3/4 the cost (because of course they charge you a premium for the highest-capacity drive), they don't implement one of the surfaces. Do we claim that the "actual" or "raw" capacity of these two drives is the same? After all the media is the same; the only difference is in whether the firmware uses all of it. What about if the "smaller" drive doesn't have the head for the ignored surface? The media is still there...
I recall the "Wrenrunner" drives, which were the same hardware as the "Wren", but with about 5% of the innermost and outermost cylinders locked out in the firmware - "short stroking", achieving improved seek times at the cost of capacity. Would we say the "raw" capacities of these two drives are the same? (A friend of mine used to tell his clients: Just buy the bigger-and-cheaper one, and don't use all of it!)
Edge cases aside: exactly what new information do you think the article should contain? And/or, what in the existing article do you think is incorrect or misleading? Jeh (talk) 22:47, 14 February 2016 (UTC)
Regarding the usable capacity of a drive, it's virtually the same no matter if we look at the stamped/advertised capacity in terabytes, or reported capacity in LBA blocks. IMHO, we're all on the same page, and we should try to focus on actual proposals how the article should be improved. For example, we might want to discuss how to improve the description of user-inaccessible areas that exist inside drives. — Dsimic (talk | contribs) 01:30, 15 February 2016 (UTC)

We should probably start by rewriting the first sentence in the Capacity section:

"The capacity of a hard disk drive, as reported by an operating system to the end user, is smaller than the amount stated by a drive or system manufacturer; this can be caused by a combination of factors: the operating system using some space, different units used while calculating capacity, or data redundancy."

First, I know of no case where user-inaccessible areas that exist for redundancy (error correction codes are "redundancy") or for sparing are included in the advertised drive capacity. "Redundancy" is not mentioned in the section at all beyond this sentence.

Second, "different units" does not cause actual errors of results, only errors of interpretation.

Third, the capacity an OS reports may well be slightly larger than the advertised capacity, as we've already discussed. Of course it depends on where you look. If you look at the OS's report of maximum LBA you get one number which, as we've discussed, is slightly larger than the big round number on the drive box. If you look at the OS's report of capacity of a partition, even if that partition is as large as it can be for the drive, you get another, slightly smaller number (due to the MBR + the alignment requirement for the partition). If you look at the OS's report of the free space on a newly-created partition you get a number that is slightly smaller yet again, due to the space occupied by file system metadata.

Here's a suggested rewrite:

"The capacity of a hard disk drive, as reported by an operating system to the end user, will usually be different than the amount stated by a drive or system manufacturer. This can be caused by a combination of factors: The manufacturer "rounding down" from the actual capacity to the advertised capacity; overhead used by the partitioning scheme used; and overhead used by the operating system's file system. In addition, operating systems use slightly different meanings of "GB" or "TB" than the hard drive manufacturers. Finally, the actual media inside the drive always has an internal capacity that is significantly larger than that available to the operating system, the excess being used by the drive's firmware."

Followed of course by a subsection addressing each point. Jeh (talk) 09:15, 16 February 2016 (UTC)

Comments so far?

@MisterSpecific: @Dsimic: @Tom94022:

Jeh (talk) 09:15, 16 February 2016 (UTC)

Jeh, your proposal for the opening sentence looks good to me, and I'd just tweak and expand the wording a bit so it ends up like this:
The usable capacity of a hard disk drive, as reported by operating system to the end user, is usually different than the capacity stated by the drive or system manufacturer. This may be caused by a combination of factors: the drive manufacturer may be "rounding down" from the actual capacity to the advertised capacity, a certain amount of capacity overhead may be imposed by the partitioning scheme, or the overhead may be coming from the file system's on-disk metadata. Additionally, operating systems may use slightly different meanings of "GB" or "TB" than the hard disk drive manufacturers, which results from the difference in decimal and binary meanings of those units. Finally, the actual media inside the drive always has an internal raw capacity that is significantly larger than that available to the operating system, with the excess being used by the drive's firmware for different purposes, such as maintaining ECC data or providing space for the remapping of bad sectors.
Hope you all agree. — Dsimic (talk | contribs) 15:42, 16 February 2016 (UTC)
I mostly agree, except that there is a conflation between the way drives historically worked (drive vendors reporting different numbers based on their own ideas of what size the drive was) and the way they work now (using the IDEMA standard to report a standard drive size). This modern approach makes @Dsimic:'s statement in the expanded first line above misleading; the drive manufacturers claim that a drive is 500GB because it tells you the drive addresses 976-odd million blocks, regardless of what's happening internally. So I'd modify the first line thusly:
The usable capacity of a hard disk drive, as reported by operating system to the end user, is usually different than the capacity stated by the drive or system manufacturer. Modern drives report a standards-defined number[1] of Logical Block Addresses that the drive exports for a given capacity. However, this advertised capacity may be different than the OS reported capacity for a number of reasons: the drive manufacturer may be "rounding down" from the actual capacity to the advertised capacity, a certain amount of capacity overhead may be imposed by the partitioning scheme, or the overhead may be coming from the file system's on-disk metadata. Additionally, operating systems may use slightly different meanings of "GB" or "TB" than the hard disk drive manufacturers, which results from the difference in decimal and binary meanings of those units.
The actual media inside the drive typically has an internal raw capacity that is significantly larger than that available to the operating system, with the excess being used by the drive's firmware for different purposes, such as maintaining ECC data or providing space for the remapping of bad sectors.
MisterSpecific (talk) 18:09, 16 February 2016 (UTC)
Ah, now I get your point, MisterSpecific. As an example, a long time ago 9 or 18 GB SCSI hard disk drives used to have slightly different numbers of sectors, which in many cases resulted in the need to use an 18 GB drive when replacing a failed 9 GB drive as part of a RAID set. All that because of, say, a difference of a few thousand sectors, and the fact that RAID controllers used 100% of 9 GB drives during the initial setup of a RAID set. That was plain stupid.
I'd tweak the wording a bit further, so it reads as follows:
The usable capacity of a hard disk drive, as reported by the operating system to the end user, is usually different than the capacity stated by the drive or system manufacturer. The design of modern drives follows an IDEMA industry standard for reporting the number of logical block address (LBA) blocks for a specific advertised capacity, so all drives with the same advertised capacity report the same number of LBA blocks.[2] However, differences between the advertised capacity and the capacity reported by the operating system have several causes, including the following: the drive manufacturer may be "rounding down" from the actual capacity to the advertised capacity, a certain amount of capacity overhead may be imposed by the partitioning scheme, or the overhead may be coming from the file system's on-disk metadata. Additionally, operating systems may use slightly different meanings of "GB" or "TB" than the hard disk drive manufacturers, which results from the difference in decimal and binary meanings of those units. Finally, the actual media inside the drive typically has an internal raw capacity that is significantly larger than that available to the operating system, with the excess being used by the drive's firmware for different purposes, such as maintaining ECC data or providing space for the remapping of bad sectors.
This should resolve any confusion we had previously. :) — Dsimic (talk | contribs) 18:58, 16 February 2016 (UTC)
Seems like we should have some material on IDEMA 1223, perhaps a table of standard capacities, and a bit about the pre-standard situation. Jeh (talk) 19:14, 16 February 2016 (UTC)
Linux mdadm man page could be used as another reference, here's a quote from it:
Sometimes a replacement drive can be a little smaller than the original drives though this should be minimised by IDEMA standards. Such a replacement drive will be rejected by md. To guard against this it can be useful to set the initial size slightly smaller than the smaller device with the aim that it will still be larger than any replacement.
For a table of standard capacities, we could use page 12 in the SFF-8447 document as a reference, which also says this on page 9:
For interchangeability of storage devices in some systems, the advertised capacity of the storage device needs to be associated with the same LBA count for products from different suppliers. This specification defines the standard method for determining the LBA count for disk drives. This specification defines the functions to calculate the LBA count using the advertised capacity, logical block size, and inclusion of SCSI Protection Information.
Of course, we could use even more additional references. — Dsimic (talk | contribs) 20:41, 16 February 2016 (UTC)
Does anyone know if the IDEMA spec is literally followed by each of the three remaining manufacturers in all of their products? It may be that they interpret the calculated number of blocks for a given advertised capacity as a minimum number and not a precise number. Be that as it may, I think the proposed changes may have gotten so far into the weeds as to be unintelligible to the average reader. If we prioritized the causes of discrepancies between system reported capacity to the EU versus manufacturers' advertised capacity to the EU, wouldn't it be something like this in Pareto order:
  1. Some OSes use different numbering systems (the binary prefix issue)
  2. Some system vendors use hidden partitions
  3. File system and operating system overhead
  4. Rounding residuals differing between models that are left over in rounding down to the advertised capacity from the number of reported blocks of data, now maybe solved by an IDEMA standard.
Actually I sort of like the IDEMA definition of actual capacity = number of blocks times bytes per block, which is now a 9 or 10 digit number. How this is transformed into an EU number is different for OSs and System/OEM mfgrs and that's why the numbers rarely agree. For the reader's sake, I'd like to see the definition restructured so that all expressions start from the same value, and then explain in rank order how the discrepancies have arisen. If I have some time maybe I will take a hack at it. Tom94022 (talk) 22:42, 16 February 2016 (UTC)
@Dsimic:, regarding "Do all vendors use the IDEMA numbers?", I can speak only to Western Digital; they did while I was there (2011-2014). I can say that I spent some time looking at drives from other vendors which seemed to follow this standard as well. I know that the (proprietary) specs for drives to be OEMed by major hardware vendors all required the IDEMA LBA-count spec be followed (using the exact IDEMA number). I will check online with the various vendors to see if they say one way or the other that they follow the specs.
@Tom94022:, I like the refactoring for the "Why the hell doesn't my OS say the same capacity number as my drive manufacturer?" reasons. I can't tell you the number of times I've seen this particular rant on places like Amazon and Newegg for people that have bought drives (it's a personal bugaboo, I must say). The first reason you list is the big one - the GiB vs. GB translation factor. It might be good to give a concrete example for this one - say, I bought a 128GB drive and my OS says it's only 119GB, how come? 128GB as specified by the drive vendor is 128*1000*1000*1000 bytes, whereas the OS calls out the capacity in GiB (although it calls them GB), each of which is 1024*1024*1024 bytes. 128*10**9 divided by 1GiB (1073741824 bytes) is 119.2GiB, wrongly labelled as 119GB. MisterSpecific (talk) 00:27, 17 February 2016 (UTC)
After a little groveling about online, solid evidence that manufacturers comply with particular specs is probably going to have to come directly from the vendor. The specification sheets for the drives I've looked at don't even claim to meet the T-10 or T-11 spec sets (SCSI and SATA), so meeting this spec is probably even more difficult to unearth. MisterSpecific (talk) 00:27, 17 February 2016 (UTC)
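The 128GB example above works out as follows. This is a minimal sketch of the unit conversion only; the constant and function names are made up for the example.

```python
# Decimal GB (as advertised) versus binary GiB (as many OSes compute it).
# Names are hypothetical; the arithmetic is the one given in the comment above.

GB = 1000**3   # decimal gigabyte, in bytes
GIB = 1024**3  # binary gibibyte, in bytes (1073741824)


def decimal_gb_to_gib(advertised_gb: float) -> float:
    """Convert an advertised capacity in GB to the same capacity in GiB."""
    return advertised_gb * GB / GIB


print(f"{decimal_gb_to_gib(128):.1f}")  # 119.2
```

An OS that then prints this value with a "GB" label produces the "119GB" that buyers complain about, even though no capacity is missing.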
Agree completely with the above, esp the importance of addressing the GiB vs GB issue. A huge number of the complainers seem to think that just because RAM comes in sizes easily written as "a power-of-two integer multiplied by a power of 1024" that everything else in a computer must also, and that the drive makers are "cheating" by not complying. And no matter how well you document that there is nothing in a HD that requires, or even mildly encourages, a factor-of-1024-related size, most of them persist in this belief. Jeh (talk) 00:36, 17 February 2016 (UTC)
The binary vs. decimal prefixes issue is already very well covered in the article, so IMHO we shouldn't be focusing on it in this discussion. Regarding how well the remaining drive manufacturers comply to the IDEMA numbers, well, by checking out some online drive specifications it seems that WD complies, while Seagate for some odd reason seems not to provide LBA counts in its drive specifications. — Dsimic (talk | contribs) 03:00, 17 February 2016 (UTC)

References

When were hard disks first called hard disks?

I believe that until microcomputers started to become common in the late 1970s and early 1980s, hard disks were called "disks", not "hard disks". I believe the terms "hard disk" and "hard disk drive" are retronyms to disambiguate from the newly-ubiquitous floppy disk and floppy disk drive. If this is correct, it is misleading for the article to say, as it does now, "Hard disk drives were introduced in 1956 as data storage for an IBM ..." as I strongly doubt that the term "hard disk drive" was used until at least a decade later. —Anomalocaris (talk) 07:48, 22 February 2016 (UTC)

Your recollection matches mine, and furthermore the mainframe and minicomputer industry didn't start calling them "hard drives" until long after the personal computer industry had. In that universe they were simply "disk drives" (vs. "floppy drives"); since hard drives came first there was no confusion among the buyers. However the sentence you're objecting to can be fixed with four words at the beginning: "What are now called ..." It's not as if it's a major error in the article. Jeh (talk) 08:21, 22 February 2016 (UTC)
Jeh: I agree that it's not a major error in the article. However, I think the article should say when the terms "Hard disk", "Hard Disk Drive", and "HDD" began to be used, and what these things were called before that, and avoid the impression that the term "hard disk drive" was in use in 1956. —Anomalocaris (talk) 08:56, 22 February 2016 (UTC)
It will likely be difficult to find a ref for exactly when the HDD name became commonly used throughout the industry, but it shouldn't be too hard to document "disk drive" and similar from advertisements, archived manuals, and the like. Terminology evolution is an interesting part of many WP articles, esp (in my perception, anyway) tech-oriented ones. Jeh (talk) 09:05, 22 February 2016 (UTC)
Don't confuse the thing with its name. It's true that the term hard disk drive was coined in order to distinguish hard disk drives from floppy disk drives, but that term is applicable to devices that precede it, e.g., IBM 350. That said, I concur with the suggestion to explicitly state that the term is new. Shmuel (Seymour J.) Metz Username:Chatul (talk) 18:05, 22 February 2016 (UTC)