Surrogate data, sometimes known as analogous data,[1] usually refers to time series data produced by well-defined (linear) models, such as ARMA processes, that reproduce selected statistical properties of a measured data set, such as its autocorrelation structure.[2] The resulting surrogate data can then be used, for example, to test for non-linear structure in the empirical data; this is called surrogate data testing.
Surrogate or analogous data also refers to data used to supplement available data from which a mathematical model is built. Under this definition, it may be generated (i.e., synthetic data) or transformed from another source.[1]
Uses
Surrogate data is used in environmental and laboratory settings, where study data from one source is used to estimate characteristics of another source.[3] For example, it has been used to model population trends in animal species.[4] It can also be used to model biodiversity, as it would be difficult to gather actual data on all species in a given area.[5]
Surrogate data may be used in forecasting. Data from similar series may be pooled to improve forecast accuracy.[6] Use of surrogate data may enable a model to account for patterns not seen in historical data.[7]
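As a rough illustration of pooling, the sketch below estimates a shared seasonal profile from analogous series and applies it to a target series that has not yet observed the rest of its cycle. The function names, the four-period cycle, and the sample numbers are all hypothetical; the assumption that the series share one multiplicative seasonal pattern is made only for this example and is not taken from the cited sources.

```python
import statistics


def seasonal_indices(series, period):
    """Average value at each cycle position, relative to the series mean.

    Assumes `series` covers a whole number of cycles.
    """
    level = statistics.fmean(series)
    return [statistics.fmean(series[pos::period]) / level for pos in range(period)]


def pooled_indices(analog_series, period):
    """Average the seasonal indices of several analogous (surrogate) series."""
    per_series = [seasonal_indices(s, period) for s in analog_series]
    return [statistics.fmean(column) for column in zip(*per_series)]


def forecast_rest_of_cycle(target, pooled):
    """Forecast the unobserved positions of the target's first cycle.

    The target level is estimated from its observed values after removing
    the pooled seasonal effect.
    """
    level = statistics.fmean([y / pooled[i] for i, y in enumerate(target)])
    return [level * pooled[i] for i in range(len(target), len(pooled))]


# Hypothetical data: two analogous series with full cycles, and a target
# series for which only the first two periods have been observed.
analogs = [
    [80, 100, 120, 100, 80, 100, 120, 100],
    [40, 50, 60, 50],
]
target = [8, 10]

pooled = pooled_indices(analogs, period=4)
forecast = forecast_rest_of_cycle(target, pooled)
```

Here the target series has never shown its late-cycle peak, so the forecast for those periods relies entirely on the pattern borrowed from the analogous series.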
Another use of surrogate data is to test models for non-linearity. The term surrogate data testing refers to algorithms used to analyze models in this way.[8] These tests typically involve generating data, whereas surrogate data in general can be produced or gathered in many ways.[1]
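A minimal sketch of such a test is given below, under several assumptions that are not taken from the cited sources: the null hypothesis is a fitted AR(1) process (a narrower null than a general linear Gaussian process), the discriminating statistic is a simple time-reversal asymmetry measure, and the p-value is rank-based. All function names are hypothetical.

```python
import random
import statistics


def time_reversal_asymmetry(series, lag=1):
    """Mean cubed increment; close to zero for a time-reversible series."""
    return statistics.fmean([(b - a) ** 3 for a, b in zip(series, series[lag:])])


def ar1_surrogate(series, rng):
    """Draw one realization of an AR(1) process fitted to `series`."""
    mean = statistics.fmean(series)
    centered = [x - mean for x in series]
    variance = statistics.fmean([c * c for c in centered])
    # Lag-1 autocorrelation estimate, used as the AR(1) coefficient.
    phi = sum(a * b for a, b in zip(centered, centered[1:])) / (variance * len(series))
    # Innovation scale chosen so the surrogate's stationary variance matches.
    noise_sd = (variance * (1.0 - phi * phi)) ** 0.5
    value = rng.gauss(0.0, variance ** 0.5)
    surrogate = [mean + value]
    for _ in range(len(series) - 1):
        value = phi * value + rng.gauss(0.0, noise_sd)
        surrogate.append(mean + value)
    return surrogate


def surrogate_data_test(series, n_surrogates=99, seed=0):
    """Return the observed statistic and a rank-based p-value."""
    rng = random.Random(seed)
    observed = abs(time_reversal_asymmetry(series))
    exceed = 0
    for _ in range(n_surrogates):
        if abs(time_reversal_asymmetry(ar1_surrogate(series, rng))) >= observed:
            exceed += 1
    # Count the observed series itself among the candidates.
    p_value = (exceed + 1) / (n_surrogates + 1)
    return observed, p_value


# Illustrative input: a noisy sawtooth (slow rise, sudden drop),
# which is not time-reversible and so should not look like AR(1) noise.
data_rng = random.Random(1)
sawtooth = [(t % 10) + data_rng.gauss(0.0, 0.1) for t in range(500)]
statistic, p_value = surrogate_data_test(sawtooth)
```

A small p-value indicates that the measured series has more time-reversal asymmetry than the linear surrogates can explain. With 99 surrogates the smallest attainable p-value is 0.01, so the number of surrogates sets the resolution of the test.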
Methods
One method of obtaining surrogate data is to find a source with similar conditions or parameters and use its data in modeling.[4] Another method is to focus on patterns of the underlying system and to search for a similar pattern in related data sources (for example, patterns in other related species or environmental areas).[5]
Rather than using existing data from a separate source, surrogate data may be generated through statistical processes,[2] which may involve random data generation[1] using constraints of the model or system.[8]
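One of the simplest forms of constrained random generation is a random shuffle of the measured values. The sketch below is an illustrative assumption rather than the method of any cited source: the constraint it enforces is that the surrogate keeps exactly the same set of values (and hence the same mean, variance, and histogram) while its temporal ordering is randomized. The function names and the sine-wave input are hypothetical.

```python
import math
import random
import statistics


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a sequence of numbers."""
    mean = statistics.fmean(series)
    centered = [x - mean for x in series]
    numerator = sum(a * b for a, b in zip(centered, centered[1:]))
    denominator = sum(c * c for c in centered)
    return numerator / denominator


def shuffle_surrogate(series, rng):
    """Return a randomly reordered copy of `series`.

    The value distribution is preserved exactly; temporal structure is not.
    """
    surrogate = list(series)
    rng.shuffle(surrogate)
    return surrogate


# Illustrative input: a slowly varying sine wave with strong serial correlation.
measured = [math.sin(2 * math.pi * t / 50) for t in range(500)]
surrogate = shuffle_surrogate(measured, random.Random(42))

r_measured = lag1_autocorrelation(measured)
r_surrogate = lag1_autocorrelation(surrogate)
```

Comparing `r_measured` with `r_surrogate` shows which properties the constraint keeps and which it discards. Surrogates meant to preserve the autocorrelation structure need a stronger constraint than this one.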
References
1. Kaefer, Paul E. (2015). Transforming Analogous Time Series Data to Improve Natural Gas Demand Forecast Accuracy (M.Sc. thesis). Marquette University. Archived from the original on 2016-03-12. Retrieved 2016-02-18.
2. Prichard; Theiler (1994). "Generating surrogate data for time series with several simultaneously measured variables" (PDF). Physical Review Letters. 73 (7): 951–954. arXiv:comp-gas/9405002. Bibcode:1994PhRvL..73..951P. doi:10.1103/physrevlett.73.951. PMID 10057582. S2CID 32748996.
3. "Surrogate Data Meaning". Columbia Analytical Services, Inc., now ALS Environmental. Archived from the original on February 16, 2017. Retrieved February 15, 2017. "What is Surrogate Data? Data from studies of test organisms or a test substance that are used to estimate the characteristics or effects on another organism or substance."
4. Hernández-Camacho, Claudia J.; Bakker, Victoria. J.; Aurioles-Gamboa, David; Laake, Jeff; Gerber, Leah R. (September 2015). Aaron W. Reed (ed.). "The Use of Surrogate Data in Demographic Population Viability Analysis: A Case Study of California Sea Lions". PLOS ONE. 10 (9): e0139158. Bibcode:2015PLoSO..1039158H. doi:10.1371/journal.pone.0139158. PMC 4587556. PMID 26413746.
5. Faith, D.P.; Walker, P.A. (1996). "Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas". Biodiversity and Conservation. 5 (4). Springer Nature: 399–415. Bibcode:1996BiCon...5..399F. doi:10.1007/BF00056387. S2CID 24066193.
6. Duncan, George T.; Gorr, Wilpen L.; Szczypula, Janusz (2001). "Forecasting Analogous Time Series". In J. Scott Armstrong (ed.). Principles of Forecasting: A Handbook for Researchers and Practitioners. Kluwer Academic Publishers. pp. 195–213. ISBN 0-7923-7930-6.
7. Kaefer, Paul E.; Ishola, Babatunde; Brown, Ronald H.; Corliss, George F. (2015). Using Surrogate Data to Mitigate the Risks of Natural Gas Forecasting on Unusual Days (PDF). International Institute of Forecasters: 35th International Symposium on Forecasting. forecasters.org/isf. Archived (PDF) from the original on 2021-05-17. Retrieved 2022-07-20.
8. Schreiber, Thomas; Schmitz, Andreas (1999). "Surrogate time series". Physica D. 142 (3–4): 346–382. arXiv:chao-dyn/9909037. Bibcode:2000PhyD..142..346S. CiteSeerX 10.1.1.46.3999. doi:10.1016/s0167-2789(00)00043-9. S2CID 13889229.
Further reading
- Schreiber, T.; Schmitz, A. (1996). "Improved Surrogate Data for Nonlinearity Tests". Physical Review Letters. 77 (4): 635–638. arXiv:chao-dyn/9909041. Bibcode:1996PhRvL..77..635S. doi:10.1103/PhysRevLett.77.635. PMID 10062864. S2CID 13193081.