Talk:Hard disk drive/Archive 14

Latest comment: 10 years ago by 71.128.35.13 in topic Stability of slope

Highlights In History Section

There are five HDD parameters whose improvements over the years are highlighted at the bottom of the History Section of this article. Most go down over time and one goes up so a ratio is used as the one consistent measurement of the improvement over time. An IP recently added to only one of the parameters two additional measurements, the CAGR and the inverse CAGR, the latter being megabytes/$, a somewhat uncommon expression. They have been moved to a footnote which I hope will not reappear in the section.

It is not foolish to expect that consistency across these five parameters will be useful to the reader. So I first think even the footnote should go. If we want to have a second and totally redundant measurement then I suppose CAGR could be added to all five, but what information does it add? I do object strongly to the inverse, megabytes/$, either as a third measurement on Price/MB or as a sixth parameter with its measurement. As a third measurement, who needs three, and I suspect a reader will find a decreasing parameter having an increasing growth rate somewhat confusing. Since it is the inverse of the more common Price/MB it would be totally redundant as a sixth parameter and furthermore has the disadvantage of not being very global.

It is pretty clear the IP added it because he wants to show it approximates a Moore's Law growth rate. This is true but this is not the place for it. To the IP: IMO this is better placed in the Moore Law article, or the Storage Density Article (which needs a lot of work). I suppose it could even go into a new section in this article, but that would have to be a far more expansive coverage than just the two points in this summary part of the history section.

In summary, my recommendation is one consistent measurement, ratio, for the five parameters and kill the footnote. Tom94022 (talk) 07:35, 11 August 2014 (UTC)

Thank you for commenting on this! How about turning the bulleted list into a table, with "Old", "New" and "Improvement" columns? That would be more compact and might be more clear to anyone who's looking for the trends in HDD industry. — Dsimic (talk | contribs) 07:46, 11 August 2014 (UTC)
A table would work. The "old" parameter values are RAMAC, circa 1956. The dates associated with the new parameter values are current but not necessarily consistent, so maybe 1956 and Current for the column heads. I believe all the Current parameter values are referenced for anyone needing more detail. Tom94022 (talk) 08:01, 11 August 2014 (UTC)
Ok, went ahead and implemented the table. Looking good? — Dsimic (talk | contribs) 08:26, 11 August 2014 (UTC)
Well done, thanks and good night (I'm in California) Tom94022 (talk) 09:01, 11 August 2014 (UTC)
Thanks. :) Hopefully other editors will also find this compaction to be an improvement to the article. — Dsimic (talk | contribs) 09:06, 11 August 2014 (UTC)

This compaction is an improvement, however as Dsimic put it: I think having more info in form of a note can't hurt. I restored the footnote to highlight the congruence between price and areal density.

Speaking of areal density, why isn't it on this summary chart? It's mentioned 10 times elsewhere in the article, as frequently as cost and price. To the Tom94022 editor: IMO areal density deserves a place in this table because areal density and price per byte are congruent, while Moore's law is tangential. By the way, that Memory_storage_density Article does need a lot of work, as you will see in my comment on its talk page in response to your request for citation. I did manage to find an historical HDD price reference to support the claimed HDD price over there.

To correct the areal density oversight here in this HDD article summary table, I've added areal density. Hopefully, other editors will find areal density to be as relevant to HDD performance characterization as volume (ft^3), mass (lbs) and access time (ms). 71.128.35.13 (talk) 20:37, 11 August 2014 (UTC)

Including areal density into the comparison table is fine with me. However, I'd leave comments regarding the footnote to Jeh and Tom94022 as I'm no longer either supporting or opposing it... From now on, when it comes to that footnote I'm a flat line. :) — Dsimic (talk | contribs) 21:14, 11 August 2014 (UTC)
I am inclined to remove the footnote. One, there is a point where more information is just clutter. Two, expressing something that has undergone periodic great change interleaved with periods of relative stagnation as an "annualized percentage change" is misleading. At least the IP deigned to show up here while simultaneously edit-warring his change back into the article. Jeh (talk) 01:11, 12 August 2014 (UTC)
I agree the footnote should be removed and am doing so.
Speaking of Areal Density, it like many other highly technical parameters was not in the table because it is not meaningful or visible to the average consumer as are the other parameters. So I would like to remove it but I can live with it if other editors agree, if nothing else I will put it at the bottom of the table. Tom94022 (talk) 06:20, 12 August 2014 (UTC)
First, Jeh, the claim that storage price has undergone periodic great change interleaved with periods of relative stagnation surely sounds reasonable, but the data don't support it. Retail prices are shown on IBM Almaden Research slide 5 http://stratos.seas.harvard.edu/files/stratos/files/db2_blu_academia.pdf and also at http://www.jcmit.com/disk2014.htm Three decades show buttery-smooth storage price progress that is in fact a straight line, well characterized by an APR. Therefore, APR is useful and its stability far exceeds expectations.
Second, branding APR as a "misleading" indicator of unstable trends stems from an unrealistic expectation that is contrary to the widespread use of APR in technology sector and in economic indicators. GDP growth over two centuries has been routinely expressed in terms of annual rate. We know perfectly well that GDP growth wasn't constant through recessions, wars, financial panics, and depressions. APR, however imperfect, is a useful real-world descriptor of labor productivity per hour, the technical progress of microprocessors (Moore's law) and areal density. It's just the average slope of a not-perfectly-straight line, no more or less misleading than other summary descriptors.
This disputed deletion was hustled through in the literal dead of night, at 08:26 11 August 2014 (UTC) which is 1:26 am in California where one editor claims to be located and 4:26 am on the east coast. It's always faster for the experienced editor to act at night, without allowing the newbies to comment. I'd be interested in your reply, should you deign to do so. 71.128.35.13 (talk) 20:24, 12 August 2014 (UTC)
It is a bit disingenuous to call oneself a newbie when since Dec 2012 u have made 165 edits amounting to about 25,000 words. U have asked that I assume WP:good faith on your part - "hustled through in the literal dead of night" doesn't read like you are willing to reciprocate. I edit when I have the time, sometimes late at night. Be that as it may, I think it is fair to say that HDD pricing ($/MB and $/box) had punctuated stability into the late 1980s at which point both declined at varying rates, until maybe the recent times (your "elephant"). APR may make for pretty illustrations but it can be misleading when there are no underlying linear processes. We don't know if the Almaden data are reliable since the sources are not given and given the kinks and hiccups in the data points it is likely that any AGR covering a long period of time could be misleading. Tom94022 (talk) 00:32, 13 August 2014 (UTC)

Tom94022, you misrepresent me and you mislead us all. First, your mislabeled "clean up" edit summary misrepresented new data that you added. Then today you only admit to an error in rounding off significant digits, a math mistake. But you do not admit to the deceptive, by omission, edit summary mislabeling.

Second, your detective work is actually an exposé on my IP address that should strike fear into the heart of any privacy-respecting editor. It sounds like you have dug deep to uncover a believable, and likely accurate, summary of IP activity from what sounds like my very own address. In fact, I don't know anything of what happened on this IP address before the spring of 2014, and I know only a portion of what happened since.

My lack of knowledge is understood easily, because you have confused me with my IP address. I am in fact a newbie, contrary to your inaccurate accusation of several years and hundreds of posts. My first post was in the spring of 2014. Because I respect both other editors' privacy and my own, I use the same IP address as a multitude of other individuals none of whom I know personally. I have no idea what they do or have done on wikipedia. I take responsibility for my own edits, and I do not deliberately misrepresent edit summaries or leave them blank out of laziness. I've made a lot of edits and contributed/written thousands of factual, unbiased, technically sound, well-referenced, high-quality words since the spring of 2014 from this IP address, alongside those other individuals whom I mentioned earlier.

It takes a long time to learn the ropes, and appreciate how business on wikipedia is really done by experienced and forensic editors like you, sir or madam. I am thankful that I've taken basic measures to protect my privacy from the threat of inquisition that is now revealed for all to see and is posed against any editor in disagreement.

Third, I think it is fair to say that HDD prices declined at a steady rate for three decades. Here's the reference that supports this claim, slide 5: http://stratos.seas.harvard.edu/files/stratos/files/db2_blu_academia.pdf The technical name of the underlying process described by the APR parameter is exponential growth. The effects of exponential growth are seen everywhere in the HDD industry and particularly in the first table of this WP article under the title “Improvement of HDD characteristics over time,” as well as in the Whyte graphic and in Moore's law. It is disingenuous or perhaps just mathematically dysfunctional to trumpet exponential areal density improvement, while denying the validity of annual percentage change (APR) or the historical existence of an underlying linear process of exponential growth. If these price data are unreliable and their provenance is wholly uncertain, as you would have us believe, you might share your concern with IBM Almaden Research in San Jose, California (and perhaps provide more authoritative data sources to them and to wikipedia as well). Guy Lohman is the name of the Research Manager who cites this price data as recently as the spring of 2014. I think these retail price data deserve their proper due, much more so than an empty cautionary warning unsupported by any substantive reference or price data.

Fourth, if you dig into the data forensically and deeply, instead of into other editors' IP address histories, you will find that the underlying price data (model numbers, dates, retail stores and advertisers) are located on the website of a Canadian former professor of information sciences. The price data are not at Almaden Research. This is contrary to your totally wrong and deceptive assertion that “We don't know if the Almaden data are reliable since the sources are not given.” The source of the data and a detailed accounting of each of the prices are indeed given. You may recall that I showed this reference to you a month or two ago, here on this very talk page.

Fifth, drawing upon valid price data would improve the quality of this wikipedia article. This particular article wholly lacks long-term and comprehensive magnetic storage price data, and the editors' actions in the last two days have even removed the one footnote detailing one example of the rate of storage price APR improvement over the span of five decades. The footnote was labeled as “clutter.” The above jcmit/IBM price reference, which you poo-poo and decry so as to sweep it under the rug while IBM Almaden Research cites it, can fill the chasm of emptiness in this article. Areal density is not the whole story, as this article misleads its readers into believing. Price improvement, a vital concern to users, has gone hand in hand with areal density growth. This article's price blindness is not the result of happenstance; the blindness is engineered deliberately by the preconceptions and agendas of those who wrote this article. As for me, I have no affiliation with the magnetic storage industry.

I support restoring the storage price footnote to this article, improving the long-standing (going back many years according to historical records) editor disharmony by writing better quality unbiased edits, stopping the deceptive edit summaries and tendentious and largely irrelevant to article quality talk-page discussions, halting false claims that a source of un-liked but accurate data is unreliable, and respecting the privacy of WP contributors and their IP address histories. 71.128.35.13 (talk) 03:17, 13 August 2014 (UTC)

Just as a note, there's nothing wrong with (re)viewing other editors' contributions, and that requires no voodoo. Everything that's submitted to Wikipedia becomes public, as does the history of edits; please see Help:User contributions for more information. — Dsimic (talk | contribs) 03:58, 13 August 2014 (UTC)
OK, copy that. For the performance improvement table, would it not be more accurate to use characteristics for just one of the $0.05/Gbyte 3.5 inch consumer drives across the board (weight, volume, etc.) rather than mixing and matching the best characteristics from various different products that could never be combined into one real product (low weight, small volume, lowest price, highest areal density, etc.)? The baseline 1956 product characteristics were not mixed and matched: they were all from just one real product.
No editor response as of yet to my request to restore the storage price improvement rate (APR) footnote to this article? 71.128.35.13 (talk) 21:17, 13 August 2014 (UTC)
Hm, regarding what to use for comparisons in the overview table... It all depends on one's point of view, and to me all technology advancements in the HDD industry are equally important. Thus, it might be more suitable to "cherry pick" highlights from multiple current products, as that shows better where the HDDs currently are. For example, there are 1.8-inch HDDs that aren't the cheapest, largest or fastest, but they're lightweight and have small volume; then, there's also a variety of large, relatively cheap and quite fast 3.5-inch drives, etc. These examples show two pretty much disparate product categories, but they both "show off" areas of the progress achieved so far. At the same time, IBM 350 disk storage unit (a component of the IBM 305 RAMAC computer system) is the first HDD, thus there isn't much to "cherry pick" from on that side of the comparisons. That's just my opinion, of course.
By the way, we might consider microdrives to be used in the overview table, as they have much smaller volume and weight than 1.8-inch HDDs. — Dsimic (talk | contribs) 08:49, 14 August 2014 (UTC)
FWIW, I think "cherry picking" from among the current drives is most appropriate for this section where we are trying to give the reader an overview of the progress from RAMAC to date. Thus AD is likely a 2.5-inch, Price/Megabyte is likely a 3.5-inch desktop, etc. Tom94022 (talk) 00:35, 15 August 2014 (UTC)
With regard to price, there are all sorts of things wrong with the IP's cite, I just don't have time to respond in detail, but I would note that in this section end points are sufficient to give the reader an overview, so that trend line slope even if well established would be overkill as indeed is two point CAGR. Tom94022 (talk) 00:35, 15 August 2014 (UTC)
This nitpicking brings to mind Voltaire: the perfect is the enemy of good. This reference is better than leaving the price vacuum. Editors should at least be concrete when disputing citations, not pontificate darkly: “there are all sorts of things wrong with the IP's cite, I just don't have time to respond in detail”. IBM Almaden Research also uses this cite: it's IBM's cite too. I think the retail prices deserve their proper due, much more so than unsupported warnings (FUD).
Yes, this table depends on one's point of view. Looking at rounding precision leaves important questions hanging. Whether technical benchmarks ought to be considered in isolation or in an integrated HDD hinges on a fundamental issue: what's the subject of this article? Does it examine the progress of HDDs (always a happy story), the HDD as an integrated system, or the HDD as an assemblage of parts? If it's about the system, then à la carte benchmarks are bait, and real users inevitably switch to an integrated system. Marketers respond to today's reality by saying “we've come a long way,” “we're making good progress” in reference to the rate of improvement or APR, and promise great things on the “roadmap.” This HDD article has a bit of each.
Flash memory rendered microdrives obsolete, though used examples might still be found on ebay. Any narrow focus on technical capability will eventually be blindsided by lack of demand. The tide of technical change has turned, and certain benchmarks have already receded from their high water marks. While progress (areal density, perhaps) may march forward, those roads on the map aren't yet built, and no particular technology is assured of advancement or even of survival. Extinction is the natural result of creative destruction (schöpferische Zerstörung), an idea that carries vestiges of Marxism. Were it not for a lack of demand impeding growth, we could still build faster Concordes, better horse buggies, and smaller microdrives.
A Model T automobile in one column shouldn't be compared with the price of a Yugo, the speed of a Ferrari, and the efficiency of a solar-powered car in the adjacent column. A WP article on the subject of automobiles ought to compare the Model T with a Toyota Corolla or VW Golf. Henry Ford and IBM RAMAC would surely fare better in this comparison had they optimized separately for each benchmark.
This table should give a 30,000 foot view. To keep things simple, it should list a consumer product because enterprise HDDs are a small part of the market and transaction prices are not transparent. High water marks belong in the “History of HDDs” article, and/or farther down in the technical subsections of this HDD article, and/or broken out as separate headings like “1956 system,” “Current HDD,” “Improvement” ratios for a current system, and “Bests” or “Historic bests” for the smallest microdrive, cheapest (per byte) HDD, and biggest capacity HDD. 71.128.35.13 (talk) 19:05, 15 August 2014 (UTC)
Well, at least horse buggies are still around, right? — Dsimic (talk | contribs) 02:48, 16 August 2014 (UTC)

Around yes: but not so important to GDP, while HDDs retain importance. 71.128.35.13 (talk) 21:46, 16 August 2014 (UTC)

To IP 71.128.35.13

I am posting this here because the IP apparently does not access his talk page. A self-professed newbie who is fairly sophisticated in Wiki usage, prolific, argumentative and using a shared IP address is a suspicious editor. From Wikipedia:Sock puppetry: "Wikipedia editors are generally expected to edit using only one (preferably registered) account. Using a single account maintains editing continuity, improves accountability, and increases community trust, which helps to build long-term stability for the encyclopedia." (emphasis added)

May I suggest u at least register and perhaps over time the suspicion will lessen. You could also try to shorten your responses so as to not filibuster a topic. Tom94022 (talk) 22:50, 15 August 2014 (UTC)

Yup, I edit from 5.13&5.14(once) and I've no conflict of interest. BTW, this is the page for discussing improvements to the article, not for ad hominem puppet stuff. 71.128.35.13 (talk) 22:34, 16 August 2014 (UTC)
If you really want to "take responsibility for all of your edits" you should create an account. Jeh (talk) 23:36, 16 August 2014 (UTC)

Why Whyte Cannot Be Used To Precisely Calculate a Trendline

The following shows the errors in the first seven data points introduced by the IP when he tries to reverse engineer data out of a jpg graph.

IP Date   Nearest Product   Actual Year   IP AD      Actual AD   AD Error
1956.8    IBM 350           1956.7        2.06E-06   2.00E-06    -3%
1962.6    IBM 1311          1963          5.12E-05   5.10E-05    0%
1964.5    IBM 2311          1965.3        9.86E-05   1.02E-04    3%
1965.6    IBM 2314          1966          2.16E-04   2.20E-04    2%
1970.6    IBM 3330          1971          8.04E-04   7.80E-04    -3%
1973.6    IBM 3340          1973          1.55E-03   1.69E-03    8%
1975.5    IBM 3350          1976          3.19E-03   3.07E-03    -4%

The source for the actual data is the IBM 1981 JRD article, "a Quarter Century of Disk File Innovation" except for the 2311 which is not in the article but its date is the FCS of the first System/360 and its AD is exactly 2 x the 1311. The AD errors are not systematic and can be as large as 8%. The JRD article only gave year of shipment and 5 out of the 7 data points have the wrong year; for the two products with known FCS months, one is off by almost a year. There are other errors or omissions in Whyte such as not correctly identifying the 2006 state-of-the-art, leaving out the double density 350 and I suspect others.

Don't get me wrong, I think Whyte did a great job and is a reliable source as it stands in graphic form, but it is not usable for establishing a trend line to one decimal digit for purposes of comparing to 41.4%.

We don't know whether Whyte was in error or the errors were introduced by the IP's process or some combination of both, but this sample of 7 of the 42 data points, along with other errors or omissions demonstrates why calculating a trend line to one decimal digit in this manner is a case of garbage in and garbage out. Tom94022 (talk) 06:48, 19 August 2014 (UTC)
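
For anyone who wants to check the arithmetic in the table above, here is a minimal Python sketch. It assumes the "AD Error" column is the signed difference between the actual and IP-derived areal density, taken relative to the actual value and rounded to a whole percent; that convention is inferred from the numbers, not stated in the comment. The names `rows` and `ad_error_percent` are illustrative only.

```python
# Rows copied from the table above: (product, IP-derived AD, actual AD).
rows = [
    ("IBM 350",  2.06e-06, 2.00e-06),
    ("IBM 1311", 5.12e-05, 5.10e-05),
    ("IBM 2311", 9.86e-05, 1.02e-04),
    ("IBM 2314", 2.16e-04, 2.20e-04),
    ("IBM 3330", 8.04e-04, 7.80e-04),
    ("IBM 3340", 1.55e-03, 1.69e-03),
    ("IBM 3350", 3.19e-03, 3.07e-03),
]


def ad_error_percent(ip_ad, actual_ad):
    """Signed error in percent, relative to the actual value (assumed convention)."""
    return (actual_ad - ip_ad) / actual_ad * 100


# Round to whole percent, as the table does.
errors = [round(ad_error_percent(ip_ad, actual_ad)) for _, ip_ad, actual_ad in rows]

for (product, _, _), err in zip(rows, errors):
    print(f"{product:<9} {err:+d}%")
```

Under that assumed convention the rounded values match the table's error column, including the 8% outlier for the IBM 3340.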

Stability of slope

Slope calculations are moot, because WP:NOR disallows editor analysis. Updating seven points (-3%, 0%, 3%, 2%, -3%, 8%, and -4%) would not change the slope significantly. With 7 updated and 35 more recent data points, slope would increase from 40.9% to 41.2% per year, still similar to Moore's law. 71.128.35.13 (talk) 22:42, 19 August 2014 (UTC)
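
As a rough illustration of what "slope" means in this thread, the sketch below fits a least-squares line to ln(areal density) against year and converts the slope to a yearly growth rate. It uses only the seven table rows from the previous section (1956 to 1976), so it cannot reproduce the 42-point figures of 40.9% and 41.2% quoted here; it only shows how much the fitted rate moves when the IP-derived points are swapped for the actual ones. The function names are hypothetical, and reading the 41.4% figure as "doubling every two years" is an assumption.

```python
import math

# (year, areal density) pairs taken from the seven-row table in the previous section.
ip_points = [
    (1956.8, 2.06e-06), (1962.6, 5.12e-05), (1964.5, 9.86e-05), (1965.6, 2.16e-04),
    (1970.6, 8.04e-04), (1973.6, 1.55e-03), (1975.5, 3.19e-03),
]
actual_points = [
    (1956.7, 2.00e-06), (1963.0, 5.10e-05), (1965.3, 1.02e-04), (1966.0, 2.20e-04),
    (1971.0, 7.80e-04), (1973.0, 1.69e-03), (1976.0, 3.07e-03),
]


def annual_growth(points):
    """Least-squares slope of ln(AD) versus year, returned as growth per year."""
    n = len(points)
    xs = [year for year, _ in points]
    ys = [math.log(ad) for _, ad in points]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return math.exp(sxy / sxx) - 1


def two_point_cagr(points):
    """Compound annual growth rate using only the first and last point."""
    (year0, ad0), (year1, ad1) = points[0], points[-1]
    return (ad1 / ad0) ** (1 / (year1 - year0)) - 1


ip_rate = annual_growth(ip_points)
actual_rate = annual_growth(actual_points)

# Assumption: 41.4% per year corresponds to doubling every two years.
doubling_every_two_years = 2 ** 0.5 - 1

print(f"fitted rate, IP points:     {ip_rate:.1%}")
print(f"fitted rate, actual points: {actual_rate:.1%}")
print(f"two-point CAGR, actual:     {two_point_cagr(actual_points):.1%}")
print(f"doubling every two years:   {doubling_every_two_years:.1%}")
```

A fit over all points and a two-point CAGR answer different questions: the first averages over every observation, the second depends entirely on the two endpoints chosen.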

Since the policy of no original research does not apply to talk pages, updating 7 out of 42 points is perfectly permissible. Given six out of the first seven points u use are incorrect as to AD or date or both, there can be no doubt that most if not all of the remaining 35 points are in error. This means that any slope calculated to one decimal digit from this inaccurate data is meaningless (garbage in garbage out). You could fix all the points, including adding the missing ones (e.g. IBM 350-3, Toshiba, etc.) but that would be original research and if you did it I suspect you would discover a trend line greater than 41.4. But the trend line is still meaningless when we are talking about actual data points which may be above or below a trend line, meaning the actual growth at that point is above or below the trend. Tom94022 (talk) 18:48, 24 August 2014 (UTC)
There's no need to speculate endlessly. Just update the remaining 35 data points and recalculate the 42-point slope. The slope, regardless of data revisions, has been shown to be stable. The slope was 40.9% per year or about 41% and after updating 7 of the 42 points the slope is still 41.2% or about 41%.
The slope calculation is moot, because WP:NOR relies on authoritative sources not calculations by editors. Reliable sources (Plumer, Tuomi, Marchon) indicate the density improvement rate was similar to, a little more or a little less than, Moore's law over five decades. 71.128.35.13 (talk) 21:19, 24 August 2014 (UTC)
Do we agree that your calculation of slope from Whyte is moot? Tom94022 (talk) 00:43, 25 August 2014 (UTC)
No, that smells Faustian. The calculation based on all 42 points, is much more accurate and reliable than your two-point calculation of slope. The straw man here is the issue of measuring point by point, which just diverts attention from the actual stability of (42 point) slope. WP:NOR states that editor calculations must give way to reliable sources. No reliable reference has said that Moore's law lagged areal density improvement over five decades. In fact, Plumer, Tuomi and Marchon indicate that density growth was similar to Moore's law. Albeit moot, the slope of all valid data points is stable (41% per year with 7 updated points and 35 original points) and entails less editor manipulation and obfuscation than the moot and unreliable slope of just two points. 71.128.35.13 (talk) 00:03, 26 August 2014 (UTC)
I agree that "The slope calculation is moot, because WP:NOR relies on authoritative sources not calculations by editors." They are your 42 points, have been shown to be wrong and it is your calculation. What more is there to be said? Tom94022 (talk) 17:56, 26 August 2014 (UTC)
No reliable reference supports the wholly unfounded claim that Moore's law lagged areal density improvement over five decades. 71.128.35.13 (talk) 01:15, 27 August 2014 (UTC)