We live in the “data era”, in which statistical or data analysis software is unavoidable in any research field. The choice of the right software tool or platform is therefore a strategic issue for a research department. Nevertheless, decision makers often do not give enough attention to a comprehensive and appropriate evaluation of what the market offers. Instead, the choice still depends on a few factors such as a researcher's personal inclination, e.g., which software was used at the university or is already known. This is not wrong in principle, but in some cases it is not enough and may lead to a “dead end” situation, typically after months or years of investment in the wrong software 1).
Data is a set of values of qualitative or quantitative variables; restated, pieces of data are individual pieces of information.
Data in computing (or data processing) is represented in a structure that is often tabular (rows and columns), a tree (a set of nodes with parent-child relationships), or a graph (a set of connected nodes). Data is typically the result of measurements and can be visualized using graphs or images.
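The three structures mentioned above can be sketched in plain Python. This is only a minimal illustration: the records, node names, and helper functions (`column`, `count_nodes`, `degree`) are hypothetical and are not taken from any particular software package.

```python
# Tabular: rows and columns (hypothetical example records).
table = [
    {"patient_id": 1, "age": 54, "diagnosis": "glioma"},
    {"patient_id": 2, "age": 61, "diagnosis": "meningioma"},
]

# Tree: every node has one parent and any number of children.
tree = {
    "name": "brain",
    "children": [
        {"name": "cerebrum", "children": []},
        {"name": "cerebellum", "children": []},
    ],
}

# Graph: connected nodes stored as an adjacency list.
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}


def column(rows, name):
    """Return one column of a table as a list of values."""
    return [row[name] for row in rows]


def count_nodes(node):
    """Count a node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node["children"])


def degree(adjacency, node):
    """Number of nodes directly connected to the given node."""
    return len(adjacency[node])
```

For instance, `column(table, "age")` collects the ages from the table, `count_nodes(tree)` walks the parent-child links, and `degree(graph, "A")` counts the connections of node `A`.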
Data as an abstract concept can be viewed as the lowest level of abstraction, from which information and then knowledge are derived.
Data-driven clinical results are important to accurately assess the effectiveness of procedures and technologies. These outcome measures are largely collected through randomized, controlled clinical trials or retrospective chart reviews with cumbersome data collection methods.
Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data.
see Big data.
see Clinical data.
see Data presentation.
Kubben P. Data Sources. 2018 Dec 22. In: Kubben P, Dumontier M, Dekker A, editors. Fundamentals of Clinical Data Science [Internet]. Cham (CH): Springer; 2019. Chapter 1. Available from http://www.ncbi.nlm.nih.gov/books/NBK543531/ PubMed PMID: 31314248 2).
Time series data consist of observations of a variable at different points in time. They are usually collected at fixed intervals, such as daily, weekly, monthly, quarterly, or annually. Time series econometrics has applications in macroeconomics, but mainly in financial economics, where it is used for price analysis of stocks, derivatives, currencies, etc.
Cross-section data are collected at the same point in time for several individuals. Examples are opinion polls, income distribution, data on GNP per capita in all European countries, etc.
Pooled data is a mixture of time series data and cross-section data. One example is GNP per capita of all European countries over ten years.
Panel, longitudinal, or micropanel data are pooled data by nature. The difference is that the same cross-sectional units (individuals, households, firms, etc.) are measured repeatedly over time. This branch of econometrics is called microeconometrics.
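The relationship between these data types can be illustrated with a small sketch. It assumes observations stored as `(unit, year, value)` tuples. The figures are made up for illustration, and the function names are hypothetical rather than part of any library.

```python
# Pooled observations: (unit, year, value). Values are invented for illustration.
observations = [
    ("A", 2001, 10.0),
    ("A", 2002, 11.0),
    ("B", 2001, 20.0),
    ("B", 2002, 21.5),
]


def time_series(obs, unit):
    """One unit followed over time."""
    return [(year, value) for u, year, value in obs if u == unit]


def cross_section(obs, year):
    """Several units observed at the same point in time."""
    return {u: value for u, y, value in obs if y == year}


def is_balanced_panel(obs):
    """True when every unit is observed in every period."""
    units = {u for u, _, _ in obs}
    years = {y for _, y, _ in obs}
    seen = {(u, y) for u, y, _ in obs}
    return all((u, y) in seen for u in units for y in years)
```

Here `time_series(observations, "A")` extracts a time series, `cross_section(observations, 2001)` extracts a cross-section, and `is_balanced_panel(observations)` checks whether the same units appear in every period, which is the property that distinguishes panel data from a looser pooled sample.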