THE METHODOLOGY FOR SELECTING AND INTEGRATING DATA SOURCES AND USING OFFICIAL STATISTICAL ENTERPRISE DATA, QUESTIONNAIRES, AND PROXY INDICATORS IN FORMING THE EMPIRICAL BASIS OF THE STUDY
DOI:
https://doi.org/10.5281/zenodo.18441828
Keywords:
empirical database; data sources; official statistics; firm-level data; surveys; data integration; record linkage; proxy indicators; data quality; metadata.
Abstract
This paper develops a methodology for constructing an empirical research database through the systematic
selection and integration of multiple data sources, including official statistics, firm-level data (accounting, managerial,
transactional, and registry records), survey data, and proxy indicators for hard-to-measure constructs. A practical set of
criteria for data source selection is proposed, covering representativeness, accuracy, coverage, temporal and spatial
comparability, data preparation costs, legal and ethical constraints, and reproducibility. The data integration workflow is
described step by step, encompassing identifier standardization and record linkage, harmonization of measurement units
and classifications, panel data construction, metadata management, and version control. In addition, the study outlines
key data quality checks, such as missing-data diagnostics, outlier detection, internal consistency analysis, and the
assessment of selection bias and linkage bias, alongside the documentation standards required for transparent and robust econometric research. The
Results section presents illustrative tables and conceptual figures, including a data source matrix, an integration map, a
proxy indicator catalogue, and a data quality scorecard. The paper concludes with practical recommendations for building
reliable, auditable, and replicable empirical databases in applied economic research.
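The identifier standardization and record linkage step named in the abstract can be illustrated with a minimal sketch. It assumes deterministic (exact-match) linkage on a single firm identifier and is not the paper's own implementation. The field names (`firm_id`, `sector`, `employees`), the helper names, and the sample records are hypothetical, and the cleaning rule (drop separators, uppercase) is an assumption about how identifiers might differ between sources.

```python
import re


def standardize_id(raw):
    """Normalize an identifier: drop separators and whitespace, then uppercase."""
    return re.sub(r"[^0-9A-Za-z]", "", str(raw)).upper()


def link_records(registry, survey, key="firm_id"):
    """Exact-match linkage of survey rows to registry rows on a standardized key.

    Returns the linked rows, the unmatched survey rows, and the share of
    survey rows that found a registry match.
    """
    # Assumption: registry identifiers are unique after standardization;
    # if they are not, the last duplicate silently wins in this sketch.
    index = {standardize_id(row[key]): row for row in registry}

    linked, unmatched = [], []
    for row in survey:
        clean = standardize_id(row[key])
        match = index.get(clean)
        if match is None:
            unmatched.append(row)
        else:
            # Registry fields first, survey fields second, standardized key last.
            linked.append({**match, **row, key: clean})

    rate = len(linked) / len(survey) if survey else 0.0
    return linked, unmatched, rate


# Hypothetical records from two sources with inconsistent identifier formats.
registry = [
    {"firm_id": "ab-001", "sector": "C10"},
    {"firm_id": "AB 002", "sector": "G47"},
]
survey = [
    {"firm_id": " AB001 ", "employees": 12},
    {"firm_id": "ab-003", "employees": 40},
]

linked, unmatched, rate = link_records(registry, survey)
```

Keeping the unmatched rows and the match rate alongside the linked data gives a starting point for the linkage bias check the abstract mentions: the unmatched records can be compared with the linked ones on observable characteristics.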