Talk:Hard disk drive/Archive 15

Latest comment: 10 years ago by 71.128.35.13 in topic HDD Areal Density vs Moore's Law

HDD Areal Density vs Moore's Law

Split Talk Section "2016 desktop capacity revised forecast" here. See original comments in 2016 desktop capacity revised forecast by 71.128.35.13 (talk) 23:20, 23 May 2014 (UTC)

Regarding comparison to Moore's Law (ML), the long term (that is, from the invocation of Moore's law circa 1965) shows magnetic areal density (AD) growing at a slightly higher rate than ML and during the 90s and into the 00s at a much higher rate. Both AD and ML appear to be slowing down in this decade due to fundamental physical limits so the original statement is accurate while the reverted change is perhaps misleading by equating ML and AD into this century. Therefore I reverted the edit. It would be possible to fix the edit by removing the reference to ML but since most people know of Moore's law I think the comparison is useful and should remain as is without getting into in which decade AD did or did not exceed ML. Tom94022 (talk) 17:56, 24 May 2014 (UTC)

The Moore's law reference does not document HDD AD long term; however the Coughlin reference does document 1990-2010 specifically. Could you provide a reference for longer term HDD? (It's certainly in the 25-100% ballpark for 1962-2014, I would imagine). The point is that the recent HDD AD trend has slowed, as Coughlin(2012) indicated. 71.128.35.13 (talk) 18:16, 24 May 2014 (UTC)

You really shouldn't revert without a basis other than u just don't like it, but since u asked, a Google search of "HDD Areal Density Trend" images turns up a number of reliable sources, one of which I picked. You should also note that the Moore's Law page states ML is also slowing down to about 30%/year, again not too far off from HDD AD. Tom94022 (talk) 00:13, 25 May 2014 (UTC)
This is in response to the entry you titled expansively,
 "Fair and accurate comparison to Moore's law wikipedia entry". 
First, I'll discuss the accuracy deficit in detail (and in a sense "authoritatively"), then the fairness issue, and finally propose an improvement to the article.

Thanks for pointing to Whyte (2009) areal density data, and to the Moore's law wikipedia entry. I had already come across that wiki entry, and this did not happen by chance.

I wrote that entry. You may be surprised to learn, that although I'm just a new Wikipedia user with no tie to the storage industry, I recently authored that particular new section. I'm glad to see that my work has found an audience. Wikipedia articles can have unexpected consequences.

As I wrote, MPU prices improved about 30% per year (halving every two years) before and after the thoroughly-researched late-1990s surge of technical advancement. Therefore, I can certainly explain for “u” Sir/Madam what that “30%” means, and how you appear to have misunderstood it.

You should be careful to distinguish between growth (a performance increase, with a certain doubling time) as opposed to decline (price improvement). The article says MPU prices halved every two years. The -30% annual decline rate is equivalent to a performance increase of

 (1 / (100% – 30% / year) - 1) = +43% CAGR. 

The halving time offers a way to double-check the calculation,

 exp( ln(0.5) / 2 years) – 100% = -29% per year; 

so halving time would really be closer to 1.94 years, or

 exp( ln(0.5) / 1.94 years) – 100% = -30% per year.
So, you “really should” use the +43% MPU performance CAGR increase, not the -30% / year price decline, to compare against areal density growth. The disk areal density slopes +40% per year in the reference you kindly provided (Whyte, 2009 of IBM).
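For readers checking the arithmetic, the conversion between a price decline and the equivalent growth rate can be reproduced with a minimal Python sketch. The function names are illustrative and do not come from the discussion; the inputs (a 30% annual decline, halving times of 2 and 1.94 years) are the ones quoted above.

```python
import math


def growth_from_decline(annual_decline: float) -> float:
    """Growth in capacity per dollar implied by a fractional annual price decline."""
    return 1.0 / (1.0 - annual_decline) - 1.0


def annual_change_from_halving_time(years: float) -> float:
    """Annual fractional change of a quantity that halves every `years` years."""
    return math.exp(math.log(0.5) / years) - 1.0


# A 30% per year price decline corresponds to roughly 43% per year growth.
print(f"growth for a 30% decline: {growth_from_decline(0.30):+.1%}")
# Halving every 2 years is roughly a 29% annual decline; 1.94 years gives about 30%.
print(f"halving every 2.00 years: {annual_change_from_halving_time(2.0):+.1%}")
print(f"halving every 1.94 years: {annual_change_from_halving_time(1.94):+.1%}")
```

The two functions are inverses in spirit: a decline in price per unit is the reciprocal of a growth in units per price, which is why -30% pairs with about +43% rather than +30%.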

https://www.ibm.com/developerworks/mydeveloperworks/blogs/storagevirtualization/resource/BLOGS_UPLOADED_IMAGES/areal_2.jpg

Whyte's silicon density slope of +34% CAGR almost parallels the aforementioned disk areal density slope. As Whyte(2009) explains explicitly these two have tracked together, “Another interesting side note can be seen when you add the areal density of silicon, which to this day has tracked almost scarily to Moores Law.”

Given the similarity between these trend slopes, judging which slope is “somewhat higher” would require one to perform a statistical test against some confidence value (p-value). You haven't provided any statistical reference that would support a conclusion as to which is higher. I suggest you return to the original “not substantially/substantively different” phrasing, instead of minting the new and unsupported claim of “somewhat higher.”

The slowing of Moore's law that you pointed to is neither here nor there. The 30% figure is way, way off base because it confuses performance growth with price decline as I have shown, and because it refers to MPUs some time ago. It does not refer to SSDs recently.

What really matters for HDDs is the recent SSD (relative) price trend. These data are available. From 2010 to 2013 SSD prices trended at the usual Moore's law rate and halved every two years, a capacity increase of around 40-45% per year. For reference, the following sources give recent SSD price trends: http://techreport.com/review/23149/ssd-prices-in-steady-substantial-decline http://www.storagenewsletter.com/rubriques/market-reportsresearch/when-will-ssd-have-same-price-as-hdd-priceg2/

In conclusion, during the 2010-2013 time period, the gap in terms of price per unit storage of information between SSD (40-45% growth per year of density) and HDD (10-25% growth) has narrowed by a factor of two or three. This is the headwind HDDs face.

I admonish you to look at the rules at the top of this page. “Be polite, and welcoming to new users” (like me!). “Assume good faith.” Accusing me of reverting “without a basis other than u just don't like it” does not assume good faith. “Avoid personal attacks.” Now as for me here, I'm just writing this in self defense. I see now that veteran industry marketing professionals employ a bare-knuckles, frank and direct style to get a commercial message out on heavily-trafficked and respected internet sites, to suppress inconvenient information and to stand guard over the community's content zealously as if it were owned privately. Even so, I would counsel you to renounce the commercialism that, at least from my own point of view, has been allowed to pervade the content of this HDD article.

I direct your attention to an example of commercial bias found in this article's “future development” section. Current version: “New magnetic storage technologies are being developed to support higher areal density growth, address the superparamagnetic limit, and maintain the competitiveness of HDDs with potentially competitive products such as flash memory-based solid-state drives (SSDs).” Obviously SSDs do compete, and are mischaracterized here as merely "potentially competitive." A less slanted approach would indicate that HDDs and flash memory are often complementary rather than exclusive. Proposed revision: “HDDs store most of the information in the world, and this is expected to continue because of their low cost and long retention times. HDDs face competition for some information storage applications from flash memory. Hierarchical storage management combines several storage types to improve overall cost and performance: faster but costly technologies such as SSDs and/or DRAM are joined with slower but less costly HDDs or tape. For example, the Fusion Drive is one of many commercial products that combine a small SSD and large HDD. New magnetic storage technologies are being developed to support higher areal density growth and address the superparamagnetic limit, as follows:”

71.128.35.13 (talk) 23:23, 25 May 2014 (UTC)

Sorry for assuming your lack of justification of a revert was because u "just didn't like it". I should have just said you shouldn't revert without explanation and have so amended my edit.
May I suggest u indent your responses so that a flow can be followed (perhaps as a newbie u were not aware of this practice).
May I also suggest u also “Assume good faith” and "Avoid personal attacks” - yr "self defense" is both and for the most part wrong.
Moore's Law relates to transistor density (doubles every two years) and is apposite to magnetic areal density. The section of the Moore's Law article I referenced predates your contribution and clearly relates to density improvements and to neither price decline nor performance improvements as u assert in your statement above. If you examine carefully the Whyte (2009) areal density graph u will see that HDD AD has a somewhat higher slope than the therein depicted 40% Moore's law slope (presumably 43%) - about 1 order of magnitude better over 54 years, albeit tracking "almost scarily." So I think the current statement accurately and fairly describes Whyte. If u dispute this I will be happy to Photoshop the graph to prove the point. On the other hand I can live with "not substantively different" particularly if one uses 43% per year instead of 40% per year.
Competition is already covered in the lede so all that is necessary is removing the word "potentially" which I have done. Tom94022 (talk)
Made some corrections above after more carefully examining Whyte. Tom94022 (talk) 17:06, 27 May 2014 (UTC)
Are you comparing Moore's law against Whyte(2009) disk areal density 1956-2010 CAGR? For "precision," Moore's law should be defined as “doubles every two years,” as you have indicated. Not 43%. (sqrt(2) – 1) = +41% CAGR = "doubles every two years". I see that I erred in claiming mainstream adoption has been reached for CPP_GMR.
71.128.35.13 (talk) 20:51, 27 May 2014 (UTC)
Actually Whyte compares Moore's law to HDD AD growth in his second graph, which BTW, I have asked for his permission to reproduce in Wikipedia. I agree that ML is a CAGR of 41% and have so changed the article (not sure how I got 43%). FWIW, if, for example, u connect the [[History_of_IBM_magnetic_disk_drives#IBM.27s_first_HDD_versus_its_last_HDDs|end points of IBM's magnetic disk drive products]] you will get a CAGR of 47% (if I did the math right), somewhat more than Moore's Law and note that in 2002 IBM did not have the highest AD. Tom94022 (talk) 22:15, 27 May 2014 (UTC)
You directed me to Whyte(2009) as follows: “If you examine carefully the Whyte (2009) areal density graph..." The slope of Whyte(2009) is 42% CAGR calculated as follows:
Starting date for all comparisons = 1956 Interval (years) = 0 Density = 0.002
Date = 2010 Interval (years) = 54 Density = 400000 CAGR = 42% = ((exp(ln(400000 / 0.002) / 54)) - 1)

IBM (2002) offers a variety of numbers from which to cherrypick.
Date = 2002 Interval (years) = 46 Density = 26263 CAGR = 43% = ((exp(ln(26263 / 0.002) / 46)) - 1)
Date = 2002 Interval (years) = 46 Density = 46300 CAGR = 45% = ((exp(ln(46300 / 0.002) / 46)) - 1)
Date = 2002 Interval (years) = 46 Density = 70000 CAGR = 46% = ((exp(ln(70000 / 0.002) / 46)) - 1)

If you Photoshop the Whyte(2009) chart you will confirm 42% CAGR. Going back to IBM(2002) still doesn't move the CAGR very far from Whyte(2009). As originally defined over 1956-2010 (without moving the goalposts and truncating to 1956-2002), a fair and accurate comparison is as follows: Whyte(2009) areal density 42% CAGR is not substantively different from Moore's law 41% (doubling every two years).
IBM(2002) is found at History_of_IBM_magnetic_disk_drives#IBM.27s_first_HDD_versus_its_last_HDDs
Whyte(2009) is found at https://www.ibm.com/developerworks/mydeveloperworks/blogs/storagevirtualization/resource/BLOGS_UPLOADED_IMAGES/areal_2.jpg 71.128.35.13 (talk) 20:19, 28 May 2014 (UTC)
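The endpoint calculations quoted above can be re-run with a short Python sketch. The helper name `cagr` is illustrative, and the densities are assumed to be in Mbit/in² (so 0.002 is the 1956 RAMAC figure and 400000 is 400 Gbit/in²); the comment itself does not state the unit.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two endpoint values."""
    return (end / start) ** (1.0 / years) - 1.0


START_DENSITY = 0.002  # 1956 value used in the comment (unit assumed: Mbit/in^2)

# (label, end density, interval in years), as listed in the comment
cases = [
    ("Whyte(2009), 1956-2010", 400000, 54),
    ("IBM(2002), 26263", 26263, 46),
    ("IBM(2002), 46300", 46300, 46),
    ("IBM(2002), 70000", 70000, 46),
]

for label, end_density, years in cases:
    print(f"{label}: CAGR = {cagr(START_DENSITY, end_density, years):.1%}")
```

The form `(end / start) ** (1 / years) - 1` is algebraically the same as the `exp(ln(end / start) / years) - 1` expression written out in the comment.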
My eyeball estimate from Whyte was that after 54 years HDD AD was about one order of magnitude higher than ML growth. Yr estimate of 42% turns out to be 46% higher than ML growth - 1% better over 54 years does matter. Also I expect u estimated the AD in 1956 and 2010; in fact the maximum AD in 2010 was 635 Gb/insq yielding a 44% CAGR which over 54 years would be 3 times what would have been achieved with a ML CAGR. Furthermore connecting the endpoints tends to understate the CAGR that would be achieved by a best fit straight line. Thus, I think the evidence supports characterizing long term HDD CAGR as somewhat or slightly higher than the Moore's Law CAGR. Tom94022 (talk) 01:36, 29 May 2014 (UTC)
Without careful accounting, the goalposts on the Santa Teresa hills could become “unmoored.”
Moore's law as defined originally: “doubling every two years” or 41% CAGR
Goalpost moved: Whyte(2009) Moore's law line has 34% CAGR
This won't work: Moore's law was and is "doubling every two years” by definition.

Comparison as defined originally: CAGR (% per year)
Goalpost moved: Ratio of the areal densities, compounded over 54 years. (1 + 42%, which is 1% more than Moore's law)^54 / (1 + 41%)^54 - 1 = 46% greater than the base case (a unit-less ratio)
It would be unfair and inaccurate to move to density ratio instead of CAGR, like looking at distance rather than speed. The two cannot be compared numerically, because they do not share the same units: CAGR has units of inverse time (% per year), but the ratio is unit-less (no units at all). It's gibberish, and magnifies a small or non-existent difference in slope into a huge difference in areal density over "just" half a century. Could one compare the slope in angular degrees/radians of El Capitan to the height in meters of Mauna Kea, or Al Shugart's bar tab in dollars to Larry Ellison's yacht length in feet? (Still, I've heard Shugart's was bigger.)
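The distinction between a difference in annual rate and the compounded ratio it produces can be illustrated with a small Python sketch. The function name is hypothetical; the rates (42% and 44% versus 41%) and the 54-year span are the figures used in this thread.

```python
def compounded_ratio(rate_a: float, rate_b: float, years: float) -> float:
    """Ratio of two quantities that start equal and grow at rate_a and rate_b."""
    return ((1.0 + rate_a) / (1.0 + rate_b)) ** years


# One percentage point of annual growth, compounded over 54 years.
ratio_42 = compounded_ratio(0.42, 0.41, 54)
print(f"42% vs 41% over 54 years: {ratio_42 - 1.0:.0%} greater")

# Three percentage points over the same span.
ratio_44 = compounded_ratio(0.44, 0.41, 54)
print(f"44% vs 41% over 54 years: {ratio_44:.1f}x")
```

The first result is a unit-less ratio of densities, while the inputs are rates per year, which is the units mismatch the comment above objects to.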

Comparison as defined originally: Whyte(2009), 400 Gb/insq
Goalpost moved: Unearthed new 2010 density data from that same IBM Almaden research facility: 2009 areal density of 520 Gbit/in2 and 2010 density of 635 Gbit/in². This would boost CAGR slightly, (635 / 400)^(1 / 54) – 1 = 0.9% per year. Less than one percent, but every point counts if the customer buys into compounding over half a century.
No, foraging for new data is not acceptable. There may be tasty data exceeding 42% CAGR to be picked on the Almaden foothills overlooking the cherry orchards on Cottle Road that gave birth to the hard disk drive. Jim Porter likely was present at the delivery. Regardless, the fair comparison as defined originally is Whyte(2009) 400 Gbit/in2.
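As a rough check of how much the choice of 2010 endpoint matters, the sketch below recomputes the boost quoted above and the resulting endpoint CAGR. The variable names are illustrative, and the densities are assumed to be in Mbit/in² as in the earlier endpoint figures.

```python
YEARS = 54
START = 0.002          # 1956 density (unit assumed: Mbit/in^2)
WHYTE_2010 = 400000    # 400 Gbit/in^2, read from Whyte(2009)
IBM_2010 = 635000      # 635 Gbit/in^2, the newer IBM figure

# Multiplicative boost to the annual growth factor from the higher endpoint.
boost = (IBM_2010 / WHYTE_2010) ** (1.0 / YEARS) - 1.0
print(f"boost to annual growth factor: {boost:.2%}")

cagr_whyte = (WHYTE_2010 / START) ** (1.0 / YEARS) - 1.0
cagr_ibm = (IBM_2010 / START) ** (1.0 / YEARS) - 1.0
print(f"endpoint CAGR with 400 Gbit/in^2: {cagr_whyte:.1%}")
print(f"endpoint CAGR with 635 Gbit/in^2: {cagr_ibm:.1%}")
```

Note that the boost applies to the growth factor (1 + CAGR), so the CAGR itself moves by a little more than the boost figure.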

Comparison as defined originally: Use two endpoints, or fit to all the data points by least-squares instead? Actually, this wasn't really nailed down in the first place. But, the two 42% CAGR endpoints aren't tasty; these cherries might not be ripe.
Goalpost moved: Fit with least squares instead, because just looking at the two endpoints could understate CAGR. Extracting the Whyte(2009) data would surely be tedious, so this fallback strategy should stay hidden among the Blossom Valley trees, never put to a test.
Regardless, I'd rise to the challenge and look at a least-squares fit. Could you use Photoshop to extract the Whyte(2009) data points which lie between the two fixed endpoints (year 1956, density 0.002; and year 2010, density = 400)? I'd extract by hand, fit CAGR with least squares, and compare fairly and accurately Whyte(2009) versus Moore's law “doubles every two years.”

The long-standing (prior to 25 May) phrase was "not substantively different"; (over-) extended on 00:15, 25 May 2014 to "somewhat higher". It would save me work and you Photoshopping, if we meet in the middle with “not substantially/substantively different” or just “similar to”.71.128.35.13 (talk) 18:54, 29 May 2014 (UTC)

We agree that Whyte's graph depicts an HDD long term AD CAGR of 42% which is 2.4% greater than 41%.
We agree that if one uses actual endpoint data for the same period the HDD LT AD CAGR is 44% which is 7.3% greater than 41%
We agree that using actual IBM data for a slightly shorter period the HDD LT AD CAGR is 46% which is 12% greater than 41%
I have seen no numbers that suggest the HDD LT AD CAGR is less than or equal to 41%. Before all this analysis I was willing to return to the original “not substantially/substantively different” but you reverted that language insisting upon evidence. After this analysis the evidence does suggest to me that "somewhat/slightly higher" is more accurate so I am reluctant to return to the less accurate original language. Frankly the new language does place the technical achievement of the HDD industry in an interesting light - exceeding Moore's Law for a longer period of time than the semiconductor industry has tracked Moore's Law is notable. Tom94022 (talk) 05:17, 30 May 2014 (UTC)
One more small point, Whyte's article is dated 2009 so his last data point is 2009 not 2010 giving a CAGR of 43.4% again somewhat/slightly higher than 41%. I suppose I can change the date range in the article to 1956-2009 Tom94022 (talk) 05:58, 30 May 2014 (UTC)
Here, you will see numbers which do indeed show that the HDD long term areal density CAGR is, in fact, less than or equal to Moore's law doubling every two years. You sought to fit longer term data by least squares, and you cited 1956-2010 IBM(2010) Almaden data from Fontana Jr., Decad and Hetzler. The very same researchers, Decad, Fontana and Hetzler (IBM(2013)), have released data through year end 2012. IBM(2013) is found here: http://www.digitalpreservation.gov/meetings/documents/storage13/GaryDecad_Technology.pdf
Year end 2012 = 750 Gbit/in2
Year end 2011 = 750 Gbit/in²
Year end 2010 = 635 Gbit/in2
Year end 2010 is identical to IBM(2010), which also reports 635 Gbit/in2
Year end 2009 = 530 Gbit/in2
Year end 2008 = 380 Gbit/in2

Longer term 1956-2012, looking at just the two endpoints,
Date=1956 Interval (years) = 0 Density = 0.002
Date=2012 Interval (years) = 56 Density = 750000
Just the two endpoints gives CAGR = 42.3%

All the IBM data (including the earlier IBM data from Whyte(2009) and http://media.bestofmicro.com/,7-V-303547-3.jpg) are shown below. Year code 1956.68 is September 4, 1956 when RAMAC was introduced. Year code 2013 is the end of year 2012.

Year     Gbit/in²    Year  Gbit/in²    Year  Gbit/in²
1956.68  2.0E-06
1957     2.3E-06     1977  3.9E-03     1997  1.9E+00
1958     4.7E-06     1978  5.1E-03     1998  3.6E+00
1959     9.8E-06     1979  6.3E-03     1999  6.9E+00
1960     2.0E-05     1980  7.7E-03     2000  1.5E+01
1961     3.5E-05     1981  9.5E-03     2001  3.2E+01
1962     4.7E-05     1982  1.2E-02     2002  6.8E+01
1963     6.3E-05     1983  1.4E-02     2003  8.1E+01
1964     8.4E-05     1984  1.8E-02     2004  9.8E+01
1965     1.1E-04     1985  2.2E-02     2005  1.3E+02
1966     1.5E-04     1986  2.6E-02     2006  1.8E+02
1967     2.0E-04     1987  3.2E-02     2007  2.3E+02
1968     2.7E-04     1988  4.0E-02     2008  3.1E+02
1969     3.7E-04     1989  4.9E-02     2009  3.8E+02
1970     4.9E-04     1990  6.0E-02     2010  5.3E+02
1971     6.6E-04     1991  7.7E-02     2011  6.35E+02
1972     8.9E-04     1992  1.3E-01     2012  7.5E+02
1973     1.2E-03     1993  2.1E-01     2013  7.5E+02
1974     1.6E-03     1994  3.5E-01
1975     2.1E-03     1995  5.9E-01
1976     2.9E-03     1996  1.0E+00

Now, fitting to all these data points by least squares would be more fair and more accurate than just fitting to the endpoints, and this has been done:
Areal density least squares fit using all 58 data points, CAGR = 40.66%
Recall that Moore's law is defined as (sqrt(2) – 1) = 41.42% per year CAGR

In conclusion, HDD slope is 40.66% per year and Moore's is defined as 41.42% per year. Nonetheless, I'd say they are similar and statistically indistinguishable.
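A least-squares fit of this kind can be sketched in plain Python as an ordinary linear regression of ln(density) on the year code. The data are transcribed from the table in the comment above; the Gbit/in² unit is an assumption (it does not affect the slope), and this sketch is not guaranteed to reproduce the reported 40.66% exactly, since the original fitting procedure is not described.

```python
import math

# Densities for year codes 1957..2013, transcribed from the table (unit assumed: Gbit/in^2).
DENSITIES_1957_2013 = [
    2.3e-06, 4.7e-06, 9.8e-06, 2.0e-05, 3.5e-05, 4.7e-05, 6.3e-05, 8.4e-05,
    1.1e-04, 1.5e-04, 2.0e-04, 2.7e-04, 3.7e-04, 4.9e-04, 6.6e-04, 8.9e-04,
    1.2e-03, 1.6e-03, 2.1e-03, 2.9e-03, 3.9e-03, 5.1e-03, 6.3e-03, 7.7e-03,
    9.5e-03, 1.2e-02, 1.4e-02, 1.8e-02, 2.2e-02, 2.6e-02, 3.2e-02, 4.0e-02,
    4.9e-02, 6.0e-02, 7.7e-02, 1.3e-01, 2.1e-01, 3.5e-01, 5.9e-01, 1.0e+00,
    1.9e+00, 3.6e+00, 6.9e+00, 1.5e+01, 3.2e+01, 6.8e+01, 8.1e+01, 9.8e+01,
    1.3e+02, 1.8e+02, 2.3e+02, 3.1e+02, 3.8e+02, 5.3e+02, 6.35e+02, 7.5e+02,
    7.5e+02,
]

# First point is the RAMAC introduction (year code 1956.68).
DATA = [(1956.68, 2.0e-06)] + list(zip(range(1957, 2014), DENSITIES_1957_2013))


def fit_cagr(points):
    """CAGR from an ordinary least-squares line through (year, ln(density))."""
    xs = [float(year) for year, _ in points]
    ys = [math.log(density) for _, density in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # change in ln(density) per year
    return math.exp(slope) - 1.0


moore_cagr = math.sqrt(2.0) - 1.0  # doubling every two years
print(f"points used: {len(DATA)}")
print(f"least-squares CAGR: {fit_cagr(DATA):.2%}")
print(f"Moore's law CAGR:   {moore_cagr:.2%}")
```

Passing a subset of `DATA` to `fit_cagr` (for example, dropping interpolated early years) shows how sensitive the fitted slope is to which points are included.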

By the way, at least two of the recent IBM data from 2010 and 2012 are just laboratory demonstrations, not shipping products. IBM RAMAC in 1956 was a product. It would not be fair to compare a product like RAMAC with a lab demo. This lab_demo versus real_product confusion is widespread. According to, http://www.storagenewsletter.com/rubriques/market-reportsresearch/ihs-isuppli-storage-space/
“in 2010, the highest areal density that could be achieved for a platter amounted to 550Gb per square inch.” This compares to 635 Gbit/in2 from IBM(2013). The lab_demo/product ratio is 635/550 = 1.15
According to Seagate's press release in 2012 http://www.seagate.com/about/newsroom/press-releases/terabit-milestone-storage-seagate-master-pr/
early 2012 saw 620 Gbit/in2 products. This compares to 750 Gbit/in2 given by IBM(2013). The lab_demo/product ratio is 750/635 = 1.18
A demo/product correction factor of 1.16 will be used. This reduces the slope a bit (less than one-half of one percent), just looking at the two endpoints as follows:
Date=2012 Interval (years) = 56 Density = 750000/1.16 CAGR = 41.9%
Date=1956 Interval (years) = 0 Density = 0.002
Date=2012 Interval (years) = 56 Density = 750000 CAGR = 42.3%
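The effect of the demo/product correction on the endpoint CAGR can be checked with a brief Python sketch. The names are illustrative; the 1.16 factor and the densities are the values stated in the comment above, with the unit assumed to be Mbit/in².

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two endpoint values."""
    return (end / start) ** (1.0 / years) - 1.0


START_1956 = 0.002        # RAMAC density (unit assumed: Mbit/in^2)
LAB_DEMO_2012 = 750000    # IBM(2013) year-end 2012 figure
CORRECTION = 1.16         # demo/product factor chosen in the comment
YEARS = 56

uncorrected = cagr(START_1956, LAB_DEMO_2012, YEARS)
corrected = cagr(START_1956, LAB_DEMO_2012 / CORRECTION, YEARS)

print(f"lab-demo endpoint:     {uncorrected:.1%}")
print(f"product-adjusted:      {corrected:.1%}")
print(f"difference (in points): {(uncorrected - corrected) * 100:.2f}")
```

Because the correction is spread over 56 years of compounding, a 16% change in the endpoint moves the annual rate by well under half a percentage point.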

Restatement of conclusion: Areal density CAGR during 1956-2012, looking at real products and excluding laboratory demonstrations, is slightly less than 40.66% per year, not substantially/substantively different than Moore's law 41.42% per year (doubling every two years). No statistical evidence has been presented to support the claim that the HDD as compared to Moore's slopes have a "statistically significant" margin of difference.71.128.35.13 (talk) 19:46, 30 May 2014 (UTC)

May I suggest you are not only unmooring the goalpost when you extend the interval beyond 2010, you are also venturing into original research when you concatenate different data sources, extract data points from graphs, adjust data points and then attempt to draw conclusions therefrom. First point is that we know that since about 2005 HDD AD CAGR has been less than 41%/year so each year you extend beyond Whyte will reduce the LT CAGR - why not go all the way to May 30, 2014, to make a point? More importantly your ability to extract accurate data from these low resolution graphs is limited so one can only conclude that your analysis fails to prove your hypothesis that the LT HDD AD CAGR is less than Moore's law. The endpoint data on the other hand are accurate. If you want to get the actual data and run it thru 2010 (or 2009) then you might have a point. Whyte data show a LT AD CAGR exceeding ML from 1956 to 2010 and the endpoint data confirm it to 2009 and 2010. Other accurate endpoint tests also confirm LT AD CAGR exceeding ML from 1956 to 2003. We have a confirmed reliable source showing that the LT HDD AD CAGR (1956-2010) exceeds ML; if u can find a reliable source that says otherwise then you might have a point. Tom94022 (talk) 02:40, 31 May 2014 (UTC)
One last point about yr data - apparently u have added points from other trend lines, e.g., 1957 to 1961 - after RAMAC in 1956 I think the next data point might be 1962 or so. This is statistically invalid as only the actual data points should be used to fit a straight line. Tom94022 (talk) 02:52, 31 May 2014 (UTC)
You point to an “hypothesis that the LT HDD AD CAGR is less than Moore's law.” But, this never had to be proven, and the evidence for this is no better than the evidence for the reverse. Both ways, it's too fuzzy to call.

Only the following is disputed: firstly, the original HDD article before 25 May had “not substantively different from Moore's Law.” Secondly, the new version after 25 May claims “somewhat higher”. The issue isn't the value, positive or negative, of the difference. It's simply that there is no statistical justification to distinguish between the two, one way or the other. Though I'm still open to looking at your data, as you wrote, “If u dispute this I will be happy to Photoshop the graph to prove the point.”

Your deconstruction of the data shows that the slopes have a lot of slop. Moore's law is exact by definition only, not in reality. As Gordon Moore (1995) wrote of Moore's law, “I did not expect much precision in this estimate.” I'd put Kryder's law (1990-2010 or even 1956-2013) at 25-100% per year. Not 40.66% per year. Around 40% per year historically would even be fine by me. The author of the original version of the article set the bar, long before 25 May, pretty low. Where it should be. It's easy to demonstrate insufficient support for a claim, but harder to prove it. Given the weak support for the aggressive claim of distinguishability, it would be prudent to return to the long-standing uncontroversial formulation.71.128.35.13 (talk) 18:19, 31 May 2014 (UTC)

Actually it is your hypothesis that the LT HDD AD CAGR is less than or equal to Moore's law that has to be proven since the evidence from a reliable source, Whyte, is that over the time period 1956 - 2009 it exceeded ML. So far you have only confirmed Whyte. I have a longer response in mind but I am pretty busy this week so I probably won't post it until next weekend. Tom94022 (talk) 16:14, 2 June 2014 (UTC)
BTW, I take it you agree that your analysis of "All the IBM data ..." is statistically flawed; if so may I suggest u delete it or at least strike it? Tom94022 (talk) 16:20, 2 June 2014 (UTC)
I'd delete those (few) early years with missing data, and fit by least squares again. The results wouldn't change drastically, I expect. I'll try later, when I've time.
Your claim is that the two slopes are significantly, in the statistical sense, different. (“somewhat higher”) Mine is that there is insufficient evidence to show that long term Kryder's law and Moore's law (both fuzzy in the real world) slopes are different. (“not substantively different”) That's the text before 25 May.

I oppose the change of 25 May, because I've no good evidence to support a “significant” difference or “somewhat higher” slope. The difference in slope (pos or neg) could go either way.71.128.35.13 (talk) 03:49, 3 June 2014 (UTC)

Whyte has sparse data points from 1956 thru 1992, everything else should be eliminated. Then from 1992 thru 2004 many of his data points overlap which will make determining their values difficult. Good luck - I expect when u are done the best fit straight line will have a slope greater than ML - all u will do is confirm what is visually presented. Tom94022 (talk) 05:43, 3 June 2014 (UTC)
You point to sparseness of Whyte(2009) data for the early decades, and by extension the sparseness of IBM, Grochowski and most of the available density data. Grochowski (2003)
http://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/grochowski.pdf
shows rapid density acceleration in the 1990s and very steep acceleration before the mid-1960s when the industry was just starting. This is the same density story as Whyte(2009).

I do take areal density with a grain of salt, and not just because of the lab_demo versus shipping_product confusion. The data sources are few, because IBM/HGST produced much of the storage and have disseminated their version of the story widely for decades. Therefore density is difficult to corroborate independently. Density is a technical parameter, a step removed from real prices. Importantly, prices have more economic relevance and reality to users, buyers, the producer price index, labor productivity and the national GDP. Density is less generalizable than price: it's harder to measure the areal density of a ferrite core or a flipflop or a DRAM. Density could be another rabbit hole (or more commonly a squirrel hole) atop the Santa Teresa hills near the green Almaden Research oasis. Prices, by the same token, would be more realistic if they were adjusted for quality. But this is what we've got.

No worries. Since there are quite enough data, I won't need a lot of luck. McCallum(2014) has independent retail-level (or list price in the early years) prices that go back decades.
http://www.jcmit.com/disk2014.htm
http://www.jcmit.com/diskprice.htm

The magnetic storage price trend post 1980 slope parallels semiconductor Moore's law (semiconductor is flash since around 2007, preceded by DRAM, preceded by older forms of rapid-access-main_memory devices like flipflops and ferrite core) price slope. During the 1980s and again now, HDD prices were one order of magnitude better than the closest DRAM/flash semiconductor alternative. These two slopes are parallel. The non-magnetic price slope holds all the way back to the 1950s, around -38%/year (1957-2014). HDD prices improved very rapidly by -55% per year (not quite -60% to -100% per year) from the early-1990s to early-2000s. Prices continued to improve at the very strong pace of -47% per year during 2000-2009. HDD prices since January 2010 (two years before the floods) to now (two and a half years after the floods) have improved by only 10% per year.

Conclusion: during the last three decades, the HDD price (not density) slope has not been substantively different from the semiconductor price slope. Prior to the mid-1970s, magnetic storage price improvement lagged. Particularly since 2010, the HDD price slope around -10% per year has been very much slower than flash memory (Moore's law) slope around -36% per year. In the last year or so HDD prices improved (dropped) by slightly better than 20%. Long term over five decades, I've still no good evidence to support a “significant” difference or “somewhat higher” slope for HDD price progress over Moore's price progress. The difference (pos or neg) could go either way.

I dispute the new "somewhat higher" slope claim of 25 May.71.128.35.13 (talk) 20:41, 3 June 2014 (UTC)

WP:3 has been implemented for dispute resolution.71.128.35.13 (talk) 19:33, 6 June 2014 (UTC)

Even Plumer supports an AD CAGR greater than Moore's law from 1955 thru 2005 - let me spell it out for you:

  • "Compound Annual Growth Rates (CAGR) of about 40% in the first 35 years."
  • "The growth in CAGR from 40% to 60% to 100% which began in the mid 1990s and spanned the following several years (Fig 1)" Fig 1 depicts 100%/year from 1998-2002.

Whyte shows the 60-100% CAGR extending from early 1992 to 2004. If u combine 38 years of 40% with 12 years of 60-100% you will get about 49% per year which well exceeds Moore's Law. My quick and dirty calculation says that it takes only two years of 80% per year to raise 40%/year to 41.4% over a 50 year span. So given the periods of very high growth rates it seems obvious that HDD AD CAGR had to exceed ML (at least until very recently). You seem to prefer Plumer over Kryder - in 2005 Plumer was an engineer at Seagate while Kryder was Seagate's Chief Technical Officer and Senior Vice President, Research and University Professor of Electrical and Computer Engineering, Carnegie Mellon University. I think Kryder quoted is at least as reliable a source as a Plumer paper. Finally Plumer's approximately 40% only covers the first 35 years and not 1955-2005 so it doesn't contradict any statement about such a longer period, particularly since he agrees that for many years thereafter the rate substantially exceeded a Moore's Law rate. Tom94022 (talk) 09:15, 25 July 2014 (UTC)
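The blended-rate arithmetic in the comment above can be reproduced with a short Python sketch that takes a time-weighted geometric mean of the growth factors. The function name is hypothetical, and using 80% as the single rate for the "60-100%" period is an assumption (the midpoint of the stated range, and the rate used in the comment's second calculation).

```python
import math


def blended_cagr(segments):
    """Overall CAGR for consecutive periods given as (years, annual_rate) pairs."""
    total_years = sum(years for years, _ in segments)
    total_log_growth = sum(years * math.log1p(rate) for years, rate in segments)
    return math.exp(total_log_growth / total_years) - 1.0


# 38 years at 40% followed by 12 years at an assumed 80%.
long_surge = blended_cagr([(38, 0.40), (12, 0.80)])
print(f"38y @ 40% + 12y @ 80%: {long_surge:.1%}")

# 48 years at 40% followed by only 2 years at 80%.
short_surge = blended_cagr([(48, 0.40), (2, 0.80)])
print(f"48y @ 40% + 2y @ 80%:  {short_surge:.1%}")

# Bounds for the 12-year period using the ends of the stated 60-100% range.
print(f"12y @ 60%:  {blended_cagr([(38, 0.40), (12, 0.60)]):.1%}")
print(f"12y @ 100%: {blended_cagr([(38, 0.40), (12, 1.00)]):.1%}")
```

The result depends entirely on the assumed rate and duration of each period, so it illustrates the arithmetic rather than settling which figures the sources support.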

You have cited Walter (2005), but this remains unverified. You say Kryder is an expert. While correct, this is irrelevant. Kryder did not write the headline nor the Walter (2005) article. The 5 decade interval is 1956–2010; no reason to switch to 2005. You cry Kryder, Kryder, Kryder, but what does Kryder say in the Walter (2005) citation with respect to the five decades? Nothing, specifically. Where is the “Kryder quoted” to which you refer? Nowhere, exactly. So the question remains: which part of Walter (2005) supports the five decade claim?

Your parsing of Plumer (2011) represents original research WP:OR. I need not spell it out for you, instead Plumer will: “In order to achieve the approximate 40% compound areal density growth rate that the HDD industry has delivered over the past 50 years, several key technology innovations have been employed." Here, Plumer says 40% and 50 years. You now characterize Plumer as a low-level engineer, and move from an appeal to Kryder's authority to an ad hominem attack. Regardless of organizational rank, Plumer, a magnetic storage engineer in 2011, is a more reliable source for 1956–2010 technology assessment than Walter, a writer in 2005.

Plumer did not say 49%; you do, using WP:OR methods. But in view of the high degree of statistical uncertainty, even your WP:OR fabricated 49% is similar to the Moore's law rate. Not “higher than;” not “well exceeds.” Plumer's “approximate 40%” has one significant digit, and does not disagree with 49%. It's not precisely 40.00%. Moore's law is so fuzzy that some quote a 24-month doubling time and others 18 months.

“Once more unto the breach, dear friends, once more,” to quote some writer. Moore's law progress was similar to areal density growth during 1956–2010. It stands once more, twice more, a dozen times more, at least for a few years until data for 1956–2016 are available. The Wikipedia article was correct prior to your 25 May edit: “not substantially different than the 40% per year Moore's Law growth.”

“Or close the wall up” (the very next line from that same passage) with a reference that can stand up to verification, not with job titles from a Rolodex. 71.128.35.13 (talk) 20:14, 26 July 2014 (UTC)

  1. An interview by Walter of Mark Kryder published in Scientific American is a reliable source and does not require verification. Nor is it necessary to have specific quotes; Walter's paraphrasing of Kryder is sufficient. Unless u can find something by Kryder disavowing Walter there is no reason to exclude it nor not attribute it to Kryder.
  2. Routine calculations do not count as original research, provided there is consensus among editors that the result of the calculation is obvious, correct, and a meaningful reflection of the sources. You will probably not agree that my calculations are routine, etc. even though you have used such calculations in your several failed attempts to show a CAGR <41%. Perhaps some other editors will join the discussion.
  3. Apparently you agree that in some contexts 49% is approximately Moore's law, so that u must agree that approximately 40% can NOT disprove an hypothesis that the CAGR of areal density was greater than Moore's law thru at least 2006. In which case u should stop reverting or changing on the basis of Plumer.
  4. Moore's law is not fuzzy - doubling every 24 months = 41.4% CAGR. The observed data trend is either greater than or less than that number. Whyte has about 60 data points - unfortunately we don't have the underlying data but we must presume good faith. AD is known with great precision and accuracy, shipping date is less accurate, but for you to say there is no statistical difference between the Whyte trend and ML is sophistry.
  5. The reason to cut back the observation from Whyte to 2005 or 2006 is that that is where the latest clear kink in the curve is; but note even extended to 2010 Whyte AD is above ML. The point is that since the mid 2000s HDD AD has not been progressing as fast as it had in the past. I think this is the message the reader needs to hear.
Actually this whole thing started when u, without explanation, removed all reference to Moore's Law and then refused to accept it until confronted with Whyte among others. I think I actually wrote the original phrase, "not substantially different" but only after u forced a reference did I find the evidence that HDD AD has actually out-performed ML over a long period of time, at least until recently. Perhaps other editors, now seeing the posted graphic from Whyte, will see this also.
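The conversion between a doubling time and an annual rate, which both sides rely on, can be sketched as follows. The helper name `cagr_from_doubling_months` is an assumption made for this illustration.

```python
def cagr_from_doubling_months(months):
    """Annual growth rate implied by doubling once every `months` months."""
    return 2 ** (12 / months) - 1

for months in (18, 24):
    print(f"doubling every {months} months = {cagr_from_doubling_months(months):.1%} per year")
```

A 24-month doubling gives the 41.4% figure used throughout this thread; an 18-month doubling gives a noticeably higher rate, so the choice of doubling time matters for any comparison.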

 

BTW, please don't argue that my annotation of Whyte is original research - it's graphically what u did in one failed attempt to rebut the hypothesis that HDD AD CAGR > Moore's Law thru at least 2006. Tom94022 (talk) 22:38, 26 July 2014 (UTC)


On the other hand, the ability of the magnetic disk people to continue to increase the density is flabbergasting--that has moved at least as fast as the semiconductor complexity.

Gordon Moore, PC Magazine March 25, 1997

Given that the HDD AD continued to double annually for 5 or more years after this quote, it does lend qualitative support to an HDD AD CAGR > Moore's Law thru the mid-2000s. Tom94022 (talk) 00:54, 27 July 2014 (UTC)
BTW, the policy of no original research does not apply to talk pages; I offered my off the cuff calculations in the hope they might lead to a consensus, not for inclusion in the article. Tom94022 (talk) 04:53, 27 July 2014 (UTC)
In conclusion, putting together Moore, Kryder and Whyte gives reliable sources for:

During its first fifty years (1956 – 2006) HDD areal density increased at a flabbergastingly rapid rate, likely exceeding the 41% compound annual growth rate (CAGR) of Moore’s Law but the growth rate decreased substantially thereafter and most recently the CAGR has been in the range of 8-12%.

Proposed lede for Future development section

Tom94022 (talk) 17:06, 27 July 2014 (UTC)
The proposed lede is wrong, very wrong. The areal density on the Whyte graphic is “similar to” not “likely exceeding” Moore's law (41.4% per year). There is no “flabbergastingly rapid” rate, if one considers areal density in the context of Moore's law. This breathless prose, this hype, violates WP:NPOV. The areal density slowdown began in the early 2000s, not the late 2000s. Let's look under the covers of a few pieces of the sophistry below.

Ilkka Tuomi, who was the Chief Scientist at a large company, has shown that Moore's law has very large error bars. http://firstmonday.org/ojs/index.php/fm/article/view/1000/921

The graphic brings 42 observable data points that begin in 1956 and end in 2009, and has a green line for 41% per year Moore's law slope. The caption states this interval is 53 years. You are leading us to believe that the blue areal density trend is “somewhat higher than” the slope of the green line. But is it really higher?

Let's see what this graphic actually says. An image processing routine has found the value of each data point, and they are listed below.

Year, Areal density (Gb/in.sq); read down each column pair in turn
Year    Density    Year    Density    Year    Density
1956.8  2.06E-6    1992.9  2.75E-1    2001.4  2.54E+1
1962.6  5.12E-5    1993.5  3.82E-1    2001.8  2.71E+1
1964.5  9.86E-5    1994.5  5.30E-1    2001.5  3.53E+1
1965.6  2.16E-4    1995.0  6.46E-1    2002.9  4.59E+1
1970.6  8.04E-4    1995.9  8.40E-1    2003.0  5.96E+1
1973.6  1.55E-3    1996.5  1.42E+0    2003.1  5.23E+1
1975.5  3.19E-3    1997.2  1.51E+0    2003.9  8.83E+1
1979.9  7.98E-3    1997.4  2.56E+0    2005.1  1.15E+2
1982.3  1.26E-2    1997.8  3.12E+0    2006.3  1.31E+2
1985.2  2.28E-2    1998.1  3.79E+0    2007.9  1.59E+2
1987.9  3.85E-2    1998.3  4.33E+0    2008.5  2.44E+2
1990.0  6.51E-2    1998.8  5.27E+0    2009.8  3.07E+2
1991.5  1.03E-1    1999.0  6.00E+0
1991.8  1.43E-1    1999.8  1.08E+1
1992.7  1.63E-1    2000.5  1.83E+1

While both the graphic and simple arithmetic support “at times greatly exceeded Moore's law growth” this claim is vague and has no upper limit. Fitting routinely by least squares indicates 86% per year for 1995–2000, and the graphic confirms that the rate of progress reached 60–100% then.

The areal density slope fitted routinely by least squares is 40.9% per year, for all 42 observed data points, as charted here: http://postimg.org/image/8micd1wyl/ The slope is 40.5% per year for the 1956–2006 subset.

We are led to conclude that density growth during 1956–2009 (and 1956–2006) was similar to the Moore's law rate of 41.4% per year. The data show the slowdown began in the early 2000s, not the late 2000s. 71.128.35.13 (talk) 21:04, 27 July 2014 (UTC)
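The least-squares fit described above can be reproduced approximately with the sketch below. It assumes an ordinary least-squares line through the natural log of density versus year, using the 42 values as listed in this thread; the original poster's exact routine is not known, and the names `POINTS` and `fit_cagr` are made up for this illustration.

```python
import math

# (year, Gb/in.sq) values as listed above, read off the Whyte graphic.
POINTS = [
    (1956.8, 2.06e-6), (1962.6, 5.12e-5), (1964.5, 9.86e-5),
    (1965.6, 2.16e-4), (1970.6, 8.04e-4), (1973.6, 1.55e-3),
    (1975.5, 3.19e-3), (1979.9, 7.98e-3), (1982.3, 1.26e-2),
    (1985.2, 2.28e-2), (1987.9, 3.85e-2), (1990.0, 6.51e-2),
    (1991.5, 1.03e-1), (1991.8, 1.43e-1), (1992.7, 1.63e-1),
    (1992.9, 2.75e-1), (1993.5, 3.82e-1), (1994.5, 5.30e-1),
    (1995.0, 6.46e-1), (1995.9, 8.40e-1), (1996.5, 1.42e0),
    (1997.2, 1.51e0), (1997.4, 2.56e0), (1997.8, 3.12e0),
    (1998.1, 3.79e0), (1998.3, 4.33e0), (1998.8, 5.27e0),
    (1999.0, 6.00e0), (1999.8, 1.08e1), (2000.5, 1.83e1),
    (2001.4, 2.54e1), (2001.8, 2.71e1), (2001.5, 3.53e1),
    (2002.9, 4.59e1), (2003.0, 5.96e1), (2003.1, 5.23e1),
    (2003.9, 8.83e1), (2005.1, 1.15e2), (2006.3, 1.31e2),
    (2007.9, 1.59e2), (2008.5, 2.44e2), (2009.8, 3.07e2),
]

def fit_cagr(points):
    """Annual growth rate from an OLS fit of ln(density) against year."""
    n = len(points)
    xs = [year for year, _ in points]
    ys = [math.log(density) for _, density in points]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # change in ln(density) per year
    return math.exp(slope) - 1

print(f"all {len(POINTS)} points: {fit_cagr(POINTS):.1%} per year")
through_2006 = [(year, d) for year, d in POINTS if year <= 2006.3]
print(f"through 2006: {fit_cagr(through_2006):.1%} per year")
```

The result depends on the digitized values and on fitting in log space, so small differences from the figures quoted in the thread would not be surprising.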

You continue to confuse a trend line with the actual data points; any given data point may be above or below a trend line. It is indisputable that the actual growth rate for AD from 2005-2009 when measured from the shipment of the prototype RAMAC to that of the state-of-the-art AD exceeds an annual compound rate of 41.4% for each of the years 2005-2009 as shown in Whyte. It is true thereafter using the data points from other sources. That is shown by the graphic and is confirmed by your data.
It is not reasonable to draw your precise conclusions from your trend line analysis of Whyte since the data points are likely to be imprecise due to if nothing else error introduced by translation of these points into an image and compressing it into a jpg. For example, the RAMAC point:
Source           Date      Areal density (Gb/in.sq)
Ramac Actual     1956.70   2.00E-06
Whyte 2 by me    1956.82   2.02E-06
Whyte 1 by me    1956.30   1.99E-06
Whyte by IP      1956.80   2.06E-06
Each of your derived data points suffers from such errors so that any trend line is at best approximate. It is also original research in that there is no consensus that this is a reasonable transformation. Regardless, any trend line is not data and so irrelevant to this question.
Wikipedia is not a copying machine nor do we editors necessarily have to precisely describe a reference. 2005 or 2006 is clearly a breaking point and therefore it is acceptable even desirable to reflect that date in the section lede. The only reason I can see to break at 2009 or 2010 per the end of Whyte is to extend time giving a slightly lower CAGR from the beginning (slightly lower but still >ML CAGR), which while true is not particularly helpful to the reader. Dividing the times discussed into 1956-2006 and then 2006 to present makes good sense in the context of Whyte and many other reliable sources.
Although there is a reliable source for "flabbergastingly" it is not necessary so I will move it to a footnote but you have yet again failed to provide any evidence beyond your point of view and your original research so I will again change the lede along these lines. Tom94022 (talk) 19:21, 5 August 2014 (UTC)
Ignorance of least squares regression is not an option if one is to make a convincing quantitative assessment of dissimilarity. The technique is well described in Wikipedia. Rhetoric and endpoints won't do. Because each point has measurement error, least squares regression includes all the data points, not just the endpoints. Even if one were to (incorrectly, sloppily, perhaps even deceptively) cherry-pick the endpoints (1956-2009), this would still be 43% per year: similar to Moore's law. It is not reasonable to exclude those 40 data points between the endpoints. The slope of all the data points (40.9% per year) is similar to Moore's law.

At the same time, Tuomi showed that Moore's law has very large error bars. His article is almost novella-length at 14,000 words, but education isn't free: often it takes a substantial investment of time. Tuomi published in a peer-reviewed journal, not a blog or company-sponsored marketing slides.
TUOMI, Ilkka. The Lives and Death of Moore's Law. First Monday, [S.l.], nov. 2002. ISSN 13960466. Available at: <http://firstmonday.org/ojs/index.php/fm/article/view/1000/921>. Date accessed: 05 Aug. 2014. doi:10.5210/fm.v7i11.1000. The abstract follows:

Moore's law is fuzzy indeed, and areal density has grown at a rate similar to Moore's law over the span of more than five decades as noted explicitly by Plumer (2011).
71.128.35.13 (talk) 22:40, 5 August 2014 (UTC)
I'm not sure of the relevance of your lecture on least squares regression and your citation to Tuomi. You continue to misuse trend lines instead of the actual data points. BTW I can't find any error bars in Tuomi; he shows that the data points have very large deviations (errors) from the trend line, not the other way around. A compound growth rate on semi log paper is a straight line. A trend line might have a bound if one entered the data points with their error bars and ran some sort of Monte Carlo analysis to establish confidence intervals. But this would still be irrelevant to the simple mathematical statement that 43 > 41.4.
You agree that the AD CAGR from RAMAC to 2009 is 43%. I am sure u will agree that it is greater than 43% for each of the years 2005 thru 2007. Surely you cannot deny that 43 >< 41.4. There is very little error in the two points, mainly in the dates so the difference between 41.3 and >43 is significant. Since we appear to be in agreement, that should end this discussion. Tom94022 (talk) 03:26, 6 August 2014 (UTC)

No agreement could be apparent to any rational observer. Clearly, this is disputed material. While there is a desire to terminate the discussion unilaterally, the desire is unrealistic and termination would be shortsighted. To paraphrase wikipedia guidelines, no agreement is required; consensus does not mean unanimity which is not always achievable; nor is it the result of a vote. Further discussion, however unwelcome on the part of certain editor(s), may provide future editors with insight into the two alternatives: density growth over five decades that was “similar to” or “somewhat exceeding” Moore's law.
I can and do deny that "43 < 41.4"; because, mathematically this is wrong. In fact given the uncertainties that limit our confidence in this comparison, 43 is approximately equal to 41.4, and the figure of 41.4 should be seen as 40-ish not as precisely 41.421356237. Furthermore, 43 just relies on two endpoints: linear regression using all the data points gives 40.9 which is 41-ish or 40-ish.
In the first place, the slope of Moore's law itself is fuzzy according to Tuomi. It's not 41.4214% per year. The Lives and Death of Moore's Law – By Ilkka Tuomi
Secondly, Plumer (2011) found that areal density grew about 40% per year over the past 50 years: “In order to achieve the approximate 40% compound areal density growth rate that the HDD industry has delivered over the past 50 years, several key technology innovations have been employed." [1]
Thirdly, the linear regression slope of areal density for all 42 data points from the graphic (Whyte) is 40.9% per year, similar to Moore's law.
Fourth and finally, Marchon (2013) indicates that areal density grew at “the historical, Moore's law equivalent of ~40%/annum.”[2] To repeat: “the historical, Moore's law equivalent of ~40%/annum.” With emphasis added to make this even more apparent: “Moore's law EQUIVALENT” according to Marchon of HGST and co-authors Pitchford of Seagate and Hsia of Western Digital.
Therefore according to multiple credible sources, namely Tuomi, Plumer(2011), the Whyte graphic, and very particularly and specifically Marchon(2013), areal density grew at a rate similar to (“equivalent” according to Marchon et al.) Moore's law over the interval of more than five decades. 71.128.35.14 (talk) 18:08, 6 August 2014 (UTC)

I think 43 < > 41.4 is obvious from the data underlying 43 but since you assert there are uncertainties please state what u think are the uncertainties in your 43% and I will do a worst case analysis to show the lowest possible bound. Addressing your points by number:
  1. You misstate the implications of Tuomi. The slope of Moore's Law is not fuzzy; the performance of the semiconductor industry has deviated from a "Moore's Law" and Moore's Law has not always been doubling in 24 months. He has nothing relevant about the CAGR of HDD AD from RAMAC to any specific date.
  2. Plumer's approximately 40% could be an actual number as high as 44.5% which again has no relevance to the actual number.
  3. You continue to confuse a trend line with the actual data. Furthermore, your trend line derived from Whyte has inherent noise you introduced that does not allow any meaningful comparison to 41.4....
  4. Marchon's “the historical, Moore's law equivalent of ~40%/annum” is not qualified as to time, just an ambiguous historical. The only graphic goes only to 1990. Since the period he is referencing is unknown it is your unsupported assertion that applies it back to RAMAC.
Accordingly, none of your multiple credible sources, namely Tuomi, Plumer(2011), the Whyte graphic, and Marchon(2013) deny that the AD CAGR from RAMAC to 2006 was according to you 43%. While it is likely true that a trend line thru the leading edge AD data points to 2006 has a CAGR of about 41.4%, it is also possible that such a trend line may exceed 41.4%. You have not identified such a trend line, your trend line derived from Whyte is fatally flawed for this analysis by your methodology and most important it is not clear that a trend line is any more relevant than a two point analysis, particularly since any two points of Whyte can be determined with a high degree of precision and none of the points are outliers. Tom94022 (talk) 02:42, 7 August 2014 (UTC)

A factor of 100 error in price improvement (30 billion versus 300 million) was today passed off as fact, and inserted into the article. Note that these errors are never random: they invariably lead in the direction of overoptimism and industry booster-ism.
Logic and mathematics are debased across the board here in the service of sophistry. Reality plays no part in this circus of denial. But numbers really do mean something. The initial claim of “Surely you cannot deny that 43 < 41.4” was refuted plainly “I can and do deny that '43 < 41.4'; because, mathematically this is wrong.” Sticking to your guns while keeping your head buried in the sand, you now claim anew that “I think 43 < 41.4 is obvious from the data ...”
No error, no matter how glaring, is ever admitted or corrected. Are we to believe that 4300000 < 41? How about 2 < 1?
71.128.35.13 (talk) 19:39, 7 August 2014 (UTC)
Thanks for correcting my math error. I seem to recall you reminded me of Wikipedia's policy of WP:CIV civility; please act as you preach. When I make a mistake I correct it as I would have done on the price issue and have done on the "<". Do you deny that 43.0 is greater than 41.4? You really don't respond to anything, just continue to produce more evidence that never supports your position which btw has always been to minimize the areal density growth. But the good news is your edit to the price section seems to be an admission that a two point analysis is meaningful. In the continuing absence of your response I shall shortly post a worst case two point analysis that the CAGR between RAMAC and 2006 is significantly greater than 41.4. Tom94022 (talk) 20:55, 7 August 2014 (UTC)
You may also recall my several earlier responses to this issue. Linear regression must calculate slope from all available valid data points, not just the endpoints while excluding all the data points in-between. This means all 42 points should be included for Whyte, and both of the points for the 300-million-fold calculation. By the way, two point slopes are subsumed by and fully incorporated in the mathematical technique of linear regression.
Two-point linear regressions are only meaningful if only two points are present. Otherwise, the linear regression should include all available valid data points. Once again, you appear to have overlooked repeatedly my several earlier responses in a continuing misunderstanding of the mathematical requirements of linear regression. As you may recall this calculation has already been performed in a meaningful fashion (not the less-meaningful two-point version), and the slope of all 42 data points in the Whyte graphic is 40.9% per year.71.128.35.13 (talk) 21:29, 7 August 2014 (UTC)
May I repeat your admonishment "to look at the rules at the top of this page ... assume good faith ..." I fully understand linear regression, having used it many times in my career, usually to predict the future. I have repeatedly responded that I question first linear regression's relevance at all to this discussion (see e.g., the assumptions of linear regression) and secondly the usability of your data points derived from Whyte (also an issue in the Duke paper). Your data from Whyte shows that the actual growth from RAMAC to each of the years 2005-2009 exceeded a 41.4% CAGR as does the graphic now posted. You have yet to answer why u think the uncertainties of a two point analysis are such that one cannot draw any conclusion as to its relationship to 41.4. Let's leave Moore out of this, why isn't your 43% statistically a number larger than 41.4? I will shortly post a proof that in the worst case the CAGR of AD from RAMAC to 2006 exceeds 41.4 by a significant amount. Tom94022 (talk) 22:20, 7 August 2014 (UTC)
You wrote that the growth rate for the 300-million-fold improvement “btw should have been 40% from 1956.42 to 2014.00”. Two decimal points might be overstating the precision of dating here. On 5 August you wrote that Ramac Actual was actually 1956.70. The 1956.42 date does not show up anywhere else, and no source ever dates RAMAC in the first half of 1956. It is always reported in the second half of the calendar year. So based on Ramac Actual in the second half of 1956, the growth rate should be 41% not 40% per year.

One cannot leave Moore out of this for the sake of expediency. Moore's law is very fuzzy, as Tuomi has shown in 14,000 words of detailed explanation.

Rather than answering the rather negative and inconvenient question of why not calculate slope from just two endpoints, I prefer to look at the positive side and will present instead “A Modest Proposal” for why one would use just two endpoints. Two-point forecasts are a well known tool of storage industry marketing professionals, consultants, and sales people. The technique has enormous advantages when applied to the Whyte graphic, because linear regression based on all data points not just the two endpoints restricts one's freedom to manipulate data and rig results.
Just by selecting the “right” endpoints and excluding all of the data points from the middle, the two-point method can deliver almost any slope the user wants. Let's start by considering the Whyte endpoints, 1956–2009. In order to maximize slope over many decades, it is best to select the 1956 start point because density was really quite dismal in the beginning. Renaming the starting point for marketing purposes, to enshrine it in the mind of the reader and set it in stone: RAMAC_Actual–2009 gives slope of 43% per year. Switching to various different endpoints in the 2006–2009 time frame shows that slope holds steady near 43% per year, so one can demonstrate an apparent and false pseudo-stability. Now let's see what would happen if the interval were reduced by six years: with 1956–2003.9 the slope jumps to 45%. Reducing the interval by lopping off six years from the start and looking at 1962.6 (the RAMAC 2 plus?)–2009 would reduce the slope to around 39%.
Linear regression with all the data points gives a single statistical best (meaning smallest margin of error), but not best for marketing, slope estimate of 40.9% per year, a far cry from the flexibility and multiplicity of different slopes offered by two-point forecasts. Experienced, wily and cunning practitioners of the art consider many alternative endpoint scenarios to maximize the slope, looking for unusual upward spikes near the end or dips at the beginning like RAMAC Actual 1956 in order to cherry-pick a maximum slope.
Two-point forecasting certainly is an essential tool of any storage industry marketing and sales professional who seeks to tailor the data to fit the message, not the reverse. It works just as well on the rapidly growing flash memory market as it does on the stagnant magnetic storage market.
Obviously these mathematical techniques are not as important as setting strategic goals, targeting the right market, crafting a message that resonates with the audience and deploying that message across appropriate media to the customer audience. From the very first days of wikipedia, editors have frequently discussed the issue of being one of the media that can be employed, very cost effectively one might add, to deliver marketing messages. Social media is the hottest trend in advertising. Wikipedia remains relevant today in this context because it should properly be seen as a popular, respected and trusted form of social media.
On a literary note, as indicated above, the satirical hyperbole found in “A Modest Proposal” by Jonathan Swift (1729) relates directly to my intent here.
— Preceding unsigned comment added by 71.128.35.13 (talk) 19:35, 8 August 2014 (UTC)
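The endpoint sensitivity described in the comment above can be checked with a short sketch. It assumes the digitized Whyte values quoted earlier in this thread, and the helper name `two_point_cagr` is made up for this illustration.

```python
def two_point_cagr(year_start, density_start, year_end, density_end):
    """Compound annual growth rate implied by two (year, density) points."""
    years = year_end - year_start
    return (density_end / density_start) ** (1 / years) - 1

# Endpoint pairs taken from the digitized values listed in this thread (Gb/in.sq).
cases = {
    "1956.8 to 2009.8": (1956.8, 2.06e-6, 2009.8, 3.07e2),
    "1956.8 to 2003.9": (1956.8, 2.06e-6, 2003.9, 8.83e1),
    "1962.6 to 2009.8": (1962.6, 5.12e-5, 2009.8, 3.07e2),
}
for label, args in cases.items():
    print(f"{label}: {two_point_cagr(*args):.1%} per year")
```

With these inputs the three cases land near 43%, 45% and 39% per year, consistent with the figures quoted in the comment, which shows how far a two-point rate can move with the choice of endpoints.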
If you bothered to [re]search you might discover that there are four possible dates for RAMAC, June 1956 when a prototype shipped to Zellerbach, Sept 4, 1956 when the RAMAC was announced within IBM, Sept 14, 1956, when the RAMAC was officially announced (prototypes previously installed) and finally "mid-1957" when production units were to be available. I prefer to use the June 14, 1956 date although there is an argument that the production date is July 1, 1957 +/- 45 or so days. Two decimal digits are appropriate when we have a specific date.
Correct me if I am wrong, but didn't u use a two point measurement in calculating a 40% CAGR of bytes/$ here in spite of having well over 300 data points at the reference you cited, Cost of Hard Drive Storage Space? I looked without success for 40% CAGR at the reference. Tom94022 (talk) 08:57, 11 August 2014 (UTC)


(un-indented)Don't look for any reference regarding 40% versus 41%. This observation is just two points in 2H1956 and YE2013 with a ratio of 300 million, not 30 billion, to one. The math is routine and verifiable. It is both foolish and astoundingly tendentious to pursue 40% instead of 41%, because this difference is not significant like that factor of one hundred pricing error, 300 million versus 30 billion.

Furthermore, we agree that dates in the second half of 1956 are supported solidly, though one date in the first half of 1956 may exist. Even if we were to take into consideration all four (4) date candidates that you now propose, the median date would still be in the second half of 1956. You are one of the select few IBM-specialist magnetic-storage historians, quite distinct from most readers of this article, who would be interested to hear that a conflicting date of 13 September instead of 14 September appears even today in wikipedia: “The IBM 350 disk storage unit, the first disk drive, was announced by IBM as a component of the IBM 305 RAMAC computer system on September 13, 1956.” All these citations are found in the second half of 1956.[3][4][5][6]

I wonder: does the one day difference in date, September 13 versus 14, result from time zone or international date line difficulties? Should this be dated in New York at the corporate headquarters and news media center, or in San Jose CA where they did the work before most readers were even born? Because these minutiae have no bearing on the quality of this article and serve only to distract and bother the editors, it would be counterproductive to over-[re]search the dates to two decimal places as you would have us do.

Would u mind explaining why your original two point calculation of 40% was an acceptable alternative to a trend line derived from the 300+ data points in your reference when you made this edit?
Since we are calculating various CAGRs to one decimal place for purpose of comparison, it is a good idea to use at least two on dates for such calculations. Excel doesn't care and the rounding off to one digit is free. Such precision has nothing to do with what dates appear in Wikipedia articles.
One day doesn't matter in calculations using 1/100th of a year (~+/- 2 days) but [re]search on your part might have discovered the 305/650 announcement was officially released on Sept 14, 1956 but it had likely been distributed to the press at a demonstration earlier than Sept 14 resulting in at least one newspaper printing an announcement on Sept 13. The press conference was on Sept 14. So it is not surprising that there is some confusion in the various sources that carries through to Wikipedia articles. Personally I think the press conference on Sept 14, 1956 is the most reliable date for RAMAC announcement in Wikipedia articles and the nominal date for CAGR calculations.
All you had to do to find a June reference was click on the link provided. There are many such references all it takes is a little [re]searching on your part, e.g. try searching on "RAMAC Zellerbach". Tom94022 (talk) 18:01, 16 August 2014 (UTC)
Why original calculation was two points and 40%: Sure, this is done to keep the paragraph consistent. If this article showed a graphic of 300 data points and its least squares slope, then this paragraph should state that slope and show that graphic. But, the paragraph lists only two endpoints for each of the following parameters between 2H1956 and 2014: capacity per HDD, physical volume of HDD, weight, price, and average access time. The two point slope is obtained from those endpoints. Price decreased from about US$15,000 per megabyte to less than $0.00006 per megabyte, a greater than 250-million-to-1 decrease. 250E6 ^ (1 / 57.3) – 1 = +40% CAGR — Preceding unsigned comment added by 71.128.35.13 (talk) 21:55, 16 August 2014 (UTC)
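The price arithmetic in the comment above can be checked with a minimal sketch. It uses only the figures quoted there (US$15,000 and $0.00006 per megabyte over 57.3 years); the function name `improvement_cagr` is made up for this illustration.

```python
def improvement_cagr(ratio, years):
    """Annual rate that compounds to the given improvement ratio over `years`."""
    return ratio ** (1 / years) - 1

price_ratio = 15_000 / 0.00006  # dollars per megabyte, then versus now
print(f"improvement ratio: {price_ratio:.3g}")
print(f"250-million-fold over 57.3 years: {improvement_cagr(price_ratio, 57.3):.1%} per year")
print(f"300-million-fold over 57.3 years: {improvement_cagr(300e6, 57.3):.1%} per year")
```

The two ratios discussed in the thread give rates that differ by about half a percentage point, which is why one rounds to 40% and the other to 41%.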
I hope the IP will agree that areal density points and trends thereof are plotted with and calculated from production units, not test units or laboratory demonstrations. It seems all of us then have been using the wrong date associated with the RAMAC 350 AD since, based upon reliable sources, it did not ship until at the earliest January 1, 1958, even later than the announced date of mid 1957, see edit here. This further shows that the IP's trend line based upon Whyte is not reliable for comparing to a ML rate and that all two point analyses based upon very hard data will show that in the worst case for any year from at least 2004 to 2014 the HDD AD annualized growth significantly exceeded 41.4%. Since the IP in other cases lacking reliable data was willing to use a two point analysis I hope this hard data will put an end to this endless debate. Tom94022 (talk) 18:56, 18 August 2014 (UTC)
On the contrary, the first shipment to a customer was in 1956, not 1958. "The first delivery to a customer site occurred in June 1956, to the Zellerbach Paper Company, in San Francisco, CA." [7] 71.128.35.13 (talk) 23:08, 27 August 2014 (UTC)
No, I do not agree. This table has two points (today and RAMAC), while the Whyte graphic has 42 data points. However, the table does not discuss growth rate because you removed the footnote after endless debate. I do agree that the footnote should be restored to this table.
The Whyte graphic has 42 data points with slope of 40.9% per year which is similar to Moore's law. I do agree that Barry Whyte of IBM put those data points on his blog graphic, so you should ask Barry Whyte to correct the RAMAC data point on his graphic. Wikipedia is not permitted to revise a reference or photoshop corrections. I oppose fiddling with the graphic until Whyte himself corrects the data point, and publishes the corrected graphic in an openly accessible location. If he were to “correct” the RAMAC data point all the way out to early 1959 (not early 1958), then the “corrected” slope of the 42 points would match Moore's law (doubling every two years or 41.42% per year). 71.128.35.13 (talk) 21:13, 18 August 2014 (UTC)
Please do not move the goalposts again; this talk section is about AD vs ML in Future development section wherein you insist that AD CAGR has been "similar to" ML without any evidence other than your interpretation of Whyte. There are many reasons why your interpretation of Whyte is useless for calculating the slope to one decimal, including but not limited to the errors introduced by converting a jpg to data points, but we now know the first point is wrong and BTW the 2006 point is wrong. So it is not a reliable source for your calculation of a trend line to one-tenth of a percent. FWIW, I don't think its errors are sufficient to preclude its use as a graphic. On the other hand hard reliable data establish that from any RAMAC date, but particularly from the production date, to almost any date this century the AD CAGR significantly exceeded 41.4. You have no reliable source that says otherwise. Tom94022 (talk) 22:25, 18 August 2014 (UTC)

Before anything else, I apologize for jumping into your discussion without providing any actual contributions. In a few words, while (to me, FWIW) it's really delightful to see that there are still people who have the energy required for going into such details and for doing that over an extended period of time, it's also sad to see all that energy (please pardon my choice of words) wasted. Why wasted? Well, please keep in mind that very few of the article readers care about such fine details and involved calculation methods. Just wanted to point that out; please don't get me wrong, I'm not suggesting that either of you should give up and go away from this discussion. :) — Dsimic (talk | contribs) 21:30, 18 August 2014 (UTC)

It's not so much a discussion as a dialog of the deaf; you can help by taking a position - the IP seems to be willing to stop when two or more editors disagree with his position.
Using hard reliable data it can be shown that over its first fifty or so years, from the 1956 RAMAC[a] to the 2006 Toshiba MK2035GSS[b], the areal density of HDDs increased at an annualized rate of at least 44.0%; the nominal rate was 44.2% per year, with an uncertainty of only a few tenths of a percent.[c] This is significant in that, had the areal density increased at a Moore’s law rate of 41.4%, the Toshiba drive would have been introduced in 2008 or later rather than 2006. If the earliest production RAMAC date is January 1, 1958, the worst case (lowest) rate is 45.7%. There is little uncertainty in these numbers, a few tenths of a percent. Isn't this proof enough that over the long term HDD AD CAGR somewhat exceeded a ML growth rate?
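The calculation described above can be reproduced with a short sketch. It uses only the densities and dates listed in notes [a] and [b]; the function name `cagr`, the case labels, and the 365.25-day year are assumptions made for illustration, so the last decimal may differ slightly from the figures quoted.

```python
from datetime import date


def cagr(start_value, end_value, start_date, end_date):
    """Compound annual growth rate between two dated values, as a fraction."""
    years = (end_date - start_date).days / 365.25  # assumed average year length
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Areal densities as listed in notes [a] and [b]; only their ratio matters here.
RAMAC_AD = 2.000e-6
TOSHIBA_AD = 1.788e2

# (start, end) date pairs built from the dates in notes [a] and [b].
CASES = {
    "nominal dates": (date(1956, 9, 14), date(2006, 8, 15)),
    "earliest RAMAC, latest Toshiba": (date(1956, 6, 1), date(2006, 8, 31)),
    "early production RAMAC, latest Toshiba": (date(1958, 1, 1), date(2006, 8, 31)),
}

MOORE_RATE = 2 ** 0.5 - 1  # doubling every two years, about 41.4% per year

rates = {
    label: cagr(RAMAC_AD, TOSHIBA_AD, start, end)
    for label, (start, end) in CASES.items()
}
for label, rate in rates.items():
    print(f"{label}: {rate:.1%} per year (Moore's law: {MOORE_RATE:.1%})")
```

Because the density ratio is fixed, a longer interval always gives a lower rate, which is why the earliest start date paired with the latest end date is the worst case.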
No, these are just two data points and the dates are fudged. What happened to the original dates of 2009 and 1956? One cherry-picked Toshiba data point, and here the date has been moved/adjusted for no good reason from 2009 to 2006, and one heavily "date corrected" and massaged/adjusted RAMAC point (conveniently moved again, from 2H1956 to 1H1958) don't substantiate the claim that density grew faster than Moore's law. Actually, their growth was similar over five decades (2H1956-2009). It seems one editor will never stop the debate about whether growth was similar to or greater than, regardless of the facts and references that contradict his or her position. Some of those citations are as follows:
1. For this comparison, the slope of Moore's law (whether it doubles every two years, 18 months, 27 months, etc.; whether it measures transistors, linewidths, or components, etc.) is fuzzy according to Tuomi. [The Lives and Death of Moore's Law – By Ilkka Tuomi]
This is reference number one. It is 14 thousand words long.
2. Secondly, Plumer (2011) found that areal density grew about 40% per year over the past 50 years: “In order to achieve the approximate 40% compound areal density growth rate that the HDD industry has delivered over the past 50 years, several key technology innovations have been employed." [8] This is reference number two.
3. Thirdly, the linear regression slope of areal density for all 42 data points from the graphic (Whyte) is 40.9% per year, similar to Moore's law. Playing/fudging/manipulating-shamelessly with the RAMAC date (1956 to 1958) won't jack this slope up much higher than 41%. This is reference number three, and this graphic was kindly added here by the Tom94022 editor.
4. Fourth and finally, Marchon (2013) indicates that areal density grew at “the historical, Moore's law equivalent of ~40%/annum.”[2] To repeat: “the historical, Moore's law equivalent of ~40%/annum.” With emphasis added to make this even more apparent: “Moore's law EQUIVALENT” according to Marchon of HGST and co-authors Pitchford of Seagate and Hsia of Western Digital. This is reference number four.
Therefore according to multiple credible sources, namely Tuomi, Plumer(2011), the Whyte graphic, and very particularly and specifically Marchon(2013), areal density grew at a rate similar to (“equivalent” according to Marchon et al.) Moore's law over more than five decades. WP:NOR demands this kind of solid support from real references, not any analysis or synthesis of published material that serves to reach or imply a conclusion not stated by the sources. To demonstrate that you, Tom94022, are not adding OR, you must be able to cite reliable, published sources that are directly related to the topic of the article, and directly support the material being presented. This is the WP:NOR requirement, and Tom94022 violates this rule egregiously. 71.128.35.13 (talk) 00:07, 19 August 2014 (UTC)
Your four points are old arguments. I won't waste anyone's time again stating why they are not relevant to the question, they are fully answered above. Please stop repeating yourself.
The Toshiba citation above is a 2006 hard data point from a reliable source. I started with, preferred and argued for a measurement to 2006, which is about the current breaking point in the curve. But whatever data points are chosen, from RAMAC to 2004-2010 and beyond, the HDD AD CAGR is always significantly greater than 41.4%.
Calculating a CAGR is a routine calculation which may be used in an article, as you have done in the past, so I don't think you can seriously dispute all mathematically correct calculations. Original research is allowed on talk pages to help achieve consensus, so your original research objection has no merit.
Rather than, as u say, "cherry picking" dates, I have found reliable published sources for each of my data points, including all uncertainties in the dates, and then used the uncertainties to calculate worst case (i.e. lowest) CAGRs. Since ALL the worst case CAGRs exceed 41.4%, it is reasonable to say that the long term growth rate has slightly exceeded a ML rate.
BTW, you have raised the strawman argument of "uncertainties", so it is a bit disingenuous for you to then object when I go as far as finding reliable sources to quantify the uncertainties.
If you have nothing new to say, perhaps it is time to let some other editors in? Tom94022 (talk) 01:33, 19 August 2014 (UTC)
The argument won't change because the facts haven't changed, and neither have the WP:NOR rules. Repetition of the WP:NOR mantra is no vice, and an editor's analysis or synthesis of the data is no virtue (phrasing inspired by Barry Goldwater). Ask not what you can say about the data; ask what the sources themselves say to directly support the material being presented (WP:NOR and JFK).
No reliable published source has said that Moore's law lagged areal density growth historically over five decades. However, Tuomi says that Moore's law is fuzzy (whether it doubles every two years, 18 months, 27 months, etc.; whether it measures transistors, linewidths, or components, etc.) from the start;[The Lives and Death of Moore's Law] Plumer says areal density grew about 40% per year over the past 50 years;[9] and Marchon describes the CAGR of current storage areal density on a disk surface as “the historical, Moore's law equivalent of ~40%/annum.”[2] 71.128.35.13 (talk) 22:52, 19 August 2014 (UTC)


You apparently fail to understand WP:NOR rules. In this talk page you raised a strawman argument of uncertainty in dates and areal densities, for which I have found reliable sources as to RAMAC areal density and dates and as to the 2006 state-of-the-art areal density and dates. I then performed a routine calculation for both the nominal CAGR and the worst case (lowest) CAGR, which is always allowed on talk pages and can be used in an article with consent. You have routinely performed such calculations and posted them to articles without obtaining consent, so I don't think you can now object to the calculation per se, either here or in the article. These calculations show that in the worst case the 1956-2006 AD CAGR exceeds 41.4%, one of the several and most common expressions of Moore's Law. I could do similar calculations for any year from 2004 to at least 2009, starting in 1956 or 1958. Nothing you have said rebuts or in any way contradicts this.
If you have nothing new to say, perhaps it is time to let some other editors in? Tom94022 (talk) 19:06, 24 August 2014 (UTC)
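The comparison with a 41.4% rate can also be run in the other direction, by asking how long a fixed annual rate would take to cover the same density ratio. The sketch below is illustrative only: `years_to_grow` is a hypothetical helper, the densities and the nominal RAMAC date are the ones listed in notes [a] and [b], and a 365.25-day year is assumed.

```python
import math
from datetime import date, timedelta


def years_to_grow(start_value, end_value, annual_rate):
    """Years needed to grow from start_value to end_value at a fixed annual rate."""
    return math.log(end_value / start_value) / math.log(1.0 + annual_rate)


RAMAC_AD = 2.000e-6    # note [a]
TOSHIBA_AD = 1.788e2   # note [b]
MOORE_RATE = 0.414     # the 41.4% per year figure used in this discussion

years_needed = years_to_grow(RAMAC_AD, TOSHIBA_AD, MOORE_RATE)

# Project forward from the nominal RAMAC date in note [a].
arrival = date(1956, 9, 14) + timedelta(days=round(years_needed * 365.25))

print(f"Years needed at {MOORE_RATE:.1%} per year: {years_needed:.1f}")
print(f"Projected arrival from the nominal RAMAC date: {arrival.isoformat()}")
```

Starting from a later RAMAC date pushes the projected arrival later still, so the choice of start date within the documented range does not change which side of 2006 the projection falls on.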

(un-indenting) Here, I'm not citing an editor's calculation of slope. Instead I rely on authoritative references (Marchon, Tuomi and Plumer) who indicate that the rate of density improvement over five decades was similar to Moore's law. 71.128.35.13 (talk) 21:32, 24 August 2014 (UTC)


  1. ^ RAMAC 2.000 E-06 AD, earliest date June 1, 1956, nominal date Sep 14, 1956, early production date Jan 1, 1958
  2. ^ Toshiba 1.788 E+02 AD, late date Aug 31, 2006, nominal date Aug 15, 2006
  3. ^ The only significant uncertainty is in the date of shipment
  1. ^ Plumer, Martin L.; et al. (March 2011). "New Paradigms in Magnetic Recording" (PDF). Physics in Canada. 67 (1): 28. Retrieved 18 July 2014. approximate 40% compound areal density growth rate that the HDD industry has delivered over the past 50 years … growth in CAGR from 40% to 60% to 100% which began in the mid 1990s and spanned the following several years
  2. ^ a b c Marchon, Bruno; Pitchford, Thomas; Hsia, Yiao-Tee; Gangopadhyay, Sunita (2013). "The Head-Disk Interface Roadmap to an Areal Density of Tbit/in2". Advances in Tribology. 2013: 1. doi:10.1155/2013/521086. "the compound annual growth rate has reduced considerably from ~100%/annum in the late 1990s to 20–30% today. This rate is now lower than the historical, Moore’s law equivalent of ~40%/annum"
  3. ^ "IBM Archives: IBM 350 disk storage unit". 03.ibm.com. Retrieved 2011-07-20.
  4. ^ "CHM HDD Events: IBM 350 RAMAC". Retrieved 2009-05-22.
  5. ^ "IBM Details Next Generation of Storage Innovation". 2006-09-06. Retrieved 2007-09-01.
  6. ^ Preimesberger, Chris (2006-09-08). "IBM Builds on 50 Years of Spinning Disk Storage". eWeek.com. Retrieved 2012-10-16.
  7. ^ Maleval, Jean-Jacques (2011-06-20). "History: First HDD at 55 From IBM at 100 Ramac 350: 4.4MB, $11,000 per megabyte". storagenewsletter.com. Ramac 350: 4.4MB, $11,000 per megabyte ... The first delivery to a customer site occurred in June 1956, to the Zellerbach Paper Company, in San Francisco, CA.
  8. ^ Plumer, Martin L.; et al. (March 2011). "New Paradigms in Magnetic Recording" (PDF). Physics in Canada. 67 (1): 28. Retrieved 18 July 2014. approximate 40% compound areal density growth rate that the HDD industry has delivered over the past 50 years … growth in CAGR from 40% to 60% to 100% which began in the mid 1990s and spanned the following several years
  9. ^ Plumer, Martin L.; et al. (March 2011). "New Paradigms in Magnetic Recording" (PDF). Physics in Canada. 67 (1): 28. Retrieved 18 July 2014. approximate 40% compound areal density growth rate that the HDD industry has delivered over the past 50 years