Data-centric computing

Data-centric computing is an emerging concept relevant to information architecture and data center design. It describes an information system in which data is stored independently of applications, so that applications can be upgraded without costly and complicated data migration. This is a radical shift in information systems that will be needed to address organizational needs for storing, retrieving, moving, and processing exponentially growing data sets.^[1]

Background

Traditional information system architectures are based on an application-centric mindset. Applications were installed, kept relatively static, updated infrequently, and used a fixed set of compute, storage, and networking elements to cope with a relatively small set of structured data.^[2]

This approach functioned well for decades, but over the past decade data growth, particularly of unstructured data, has put new pressure on organizations, information architectures, and data center infrastructure. Unstructured data accounts for 90% of new data and, according to a 2018 report, 59% of organizations manage more than 10 billion files and objects^[3] spread over large numbers of servers and storage nodes. Organizations are struggling to cope with exponential data growth while seeking better ways to extract insights from that data using services such as big data analytics and machine learning. However, existing architectures are not built to meet service requirements at petabyte scale and beyond without significant performance limits.^[4]

Traditional architectures cannot fully store, retrieve, move, and use that data because of limitations in hardware infrastructure as well as in application-centric systems design, development, and management.^[5]

Data-centric workloads

Data-centric computing aims to address two problems:

Organizations need to use all available data, but traditional applications are not sufficiently agile or flexible. The shift toward constant service innovation, supported by emerging approaches to service delivery (including microservices and containers), opens new possibilities that step away from the traditional application-centric mindset.
Existing limits of data center hardware also restrict the complete movement, management, and use of unstructured data sets. Conventional CPUs impede performance because they lack the specialized capabilities needed for storage, networking, and analysis.^[6] Slow storage accessed over the network, including hard drives and SAS/SATA solid-state drives, can reduce performance and limit data accessibility.^[7] New hardware capabilities are needed.

Data-centric computing

Data-centric computing is an approach that merges innovative hardware and software to treat data, not applications, as the permanent source of value.^[8] Data-centric computing aims to rethink both hardware and software to extract as much value as possible from existing and new data sources. It increases agility by prioritizing data transfer and data computation over static application performance and resilience.
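
The idea of data outliving the applications that use it can be illustrated with a minimal Python sketch. It is not drawn from any particular product: the shared store, the record fields, and the two report functions are hypothetical names chosen for illustration. The sketch assumes records are kept in a self-describing format (JSON here), so that a replacement application can read them directly without a migration step.

```python
import json

# Hypothetical shared store: self-describing records kept as JSON lines
# and owned by no single application (an assumption for illustration).
SHARED_STORE = [
    json.dumps({"id": 1, "kind": "sensor_reading", "celsius": 21.5}),
    json.dumps({"id": 2, "kind": "sensor_reading", "celsius": 23.0}),
]


def load_records(store):
    """Decode every record; any application may call this."""
    return [json.loads(line) for line in store]


def report_v1(store):
    """First application: mean temperature in degrees Celsius."""
    values = [record["celsius"] for record in load_records(store)]
    return sum(values) / len(values)


def report_v2(store):
    """Replacement application: mean temperature in degrees Fahrenheit.

    It reads the same records as report_v1; the stored data is not
    converted, copied, or migrated when the application changes.
    """
    values = [record["celsius"] for record in load_records(store)]
    return (sum(values) / len(values)) * 9 / 5 + 32


if __name__ == "__main__":
    print(report_v1(SHARED_STORE))
    print(report_v2(SHARED_STORE))
```

In this sketch the store is the long-lived asset, while either report function can be replaced or removed without touching the records.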

Data-centric hardware and software

To meet the goals of data-centric computing, data center hardware infrastructure will evolve to address massive scale, rapid growth, the need for very high-performance data movement, and extensive calculation requirements.

Distributed hardware infrastructures become the norm, with data and services spread across many compute and storage nodes, both in public clouds and on premises.
As Moore's law flattens,^[9] new processors are emerging that boost performance by taking over intensive tasks such as data movement, data protection, and data security, reducing the load on CPUs.^[10]
Storage technologies such as NVMe drives and networking technologies such as NVMe over Fabrics (NVMe-oF) will become standard components of data-centric computing architectures.^[11]

On the software side, data-centric computing accelerates the disappearance of traditional static applications.^[12] Applications become short-lived and are constantly added, updated, or removed as algorithms come and go. Software is redesigned to analyze all available data instead of subsets. Microservices visit the data, perform calculations, and report their results at speeds beyond those of conventional approaches.
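
The pattern of short-lived services visiting long-lived data can be sketched as follows. This is a simplified, single-process illustration under stated assumptions, not a description of any real microservice framework: `ServiceRegistry`, `total_bytes`, `max_latency`, and the sample records are all hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Service = Callable[[List[Record]], float]

# Hypothetical long-lived data set; it persists while services come and go.
DATA: List[Record] = [
    {"bytes": 120.0, "latency_ms": 4.0},
    {"bytes": 300.0, "latency_ms": 9.0},
    {"bytes": 80.0, "latency_ms": 2.0},
]


class ServiceRegistry:
    """Tracks which short-lived services are currently deployed."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def deploy(self, name: str, service: Service) -> None:
        """Add or update a service without touching the data."""
        self._services[name] = service

    def retire(self, name: str) -> None:
        """Remove a service; the data it visited is left in place."""
        self._services.pop(name, None)

    def run_all(self, data: List[Record]) -> Dict[str, float]:
        """Let every deployed service visit the full data set."""
        return {name: service(data) for name, service in self._services.items()}


def total_bytes(data: List[Record]) -> float:
    """Example service: sum of the 'bytes' field over all records."""
    return sum(record["bytes"] for record in data)


def max_latency(data: List[Record]) -> float:
    """Example service: largest 'latency_ms' value over all records."""
    return max(record["latency_ms"] for record in data)


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.deploy("total_bytes", total_bytes)
    registry.deploy("max_latency", max_latency)
    print(registry.run_all(DATA))
    registry.retire("total_bytes")
    print(registry.run_all(DATA))
```

Each service here processes every record rather than a sample, and retiring a service changes only the registry, which mirrors the claim that applications become transient while the data remains.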

References

[1] "The Data-Centric Revolution". TDAN.com. September 2015. Retrieved 2019-12-07.
[2] Bhageshpur, Kiran (2016-10-06). "The Emergence Of Data-Centric Computing". The Next Platform. Retrieved 2019-12-07.
[3] Bhageshpur, Kiran. "2018 State of Unstructured Data Management" (PDF). Igneous. Archived from the original (PDF) on 2020-07-18. Retrieved 2019-12-07.
[4] "Requirements for Unstructured Data at Petabyte Scale". StorageSwiss.com - The Home of Storage Switzerland. 2019-10-14. Retrieved 2019-12-07.
[5] Davidson, George S.; Boyack, Kevin W.; Zacharski, Ron A.; Helmreich, Stephen C.; Cowie, Jim R. (April 2006). "Data-Centric Computing with the Netezza Architecture" (PDF). sandia.gov. Retrieved 2019-12-07.
[6] "Why We Need Open, Data-Centric Computing Architectures". Retrieved 2019-12-07.
[7] "The Network is the New Storage Bottleneck". Datanami. 2016-11-10. Retrieved 2019-12-07.
[8] "Data-Centric Manifesto". datacentricmanifesto.org. Retrieved 2019-12-07.
[9] Simonite, Tom. "The foundation of the computing industry's innovation is faltering. What can replace it?". MIT Technology Review. Retrieved 2019-12-07.
[10] "DPU: Data Processing Unit Programmable Processor". Fungible. Archived from the original on 2020-08-05. Retrieved 2019-12-07.
[11] Kieran, Mike (2019-03-21). "When You're Implementing NVMe Over Fabrics, the Fabric Really Matters". NetApp Blog. Retrieved 2019-12-07.
[12] "Microservices Momentum Accelerates". DevOps.com. 2018-05-10. Retrieved 2019-12-07.
