Related papers: A few statistical principles for data science
Data Science is a complex and evolving field, but most agree that it can be defined as a combination of expertise drawn from three broad areascomputer science and technology, math and statistics, and domain knowledge -- with the purpose of…
A population quantity of interest in statistical shape analysis is the location of landmarks, which are points that aid in reconstructing and representing shapes of objects. We provide an automated, model-based approach to inferring…
Decision-makers abhor uncertainty, and it is certainly true that the less there is of it the better. However, recognizing that uncertainty is part of the equation, particularly for deciding on environmental policy, is a prerequisite for…
What is Statistics? Opinions vary. In fact, there is a continuous spectrum of attitudes toward statistics ranging from pure theoreticians, proving asymptotic efficiency and searching for most powerful tests, to wild practitioners, blindly…
The field of data science currently enjoys a broad definition that includes a wide array of activities which borrow from many other established fields of study. Having such a vague characterization of a field in the early stages might be…
A critical step in data analysis for many different types of experiments is the identification of features with theoretically defined shapes in N-dimensional datasets; examples of this process include finding peaks in multi-dimensional…
Data science is an emerging interdisciplinary field that combines elements of mathematics, statistics, computer science, and knowledge in a particular application domain for the purpose of extracting meaningful information from the…
This article provides an overview on the statistical modeling of complex data as increasingly encountered in modern data analysis. It is argued that such data can often be described as elements of a metric space that satisfies certain…
In computational materials science, mechanical properties are typically extracted from simulations by means of analysis routines that seek to mimic their experimental counterparts. However, simulated data often exhibit uncertainties that…
Nowadays it is inevitable to use intelligent systems to improve the performance and optimization of different components of devices or factories. Furthermore, it's so essential to have appropriate predictions to make better decisions in…
Although data science builds on knowledge from computer science, mathematics, statistics, and other disciplines, data science is a unique field with many mysteries to unlock: challenging scientific questions and pressing questions of…
Data visualization is a core part of statistical practice and is ubiquitous in many fields. Although there are numerous books on data visualization, instructors in statistics and data science may be unsure how to teach data visualization,…
Spatial statistics is an area of study devoted to the statistical analysis of data that have a spatial label associated with them. Geographers often refer to the "location information" associated with the "attribute information," whose…
The gap in statistics between multi-variate and time-series analysis can be bridged by using entropy statistics and recent developments in multi-dimensional scaling. For explaining the evolution of the sciences as non-linear dynamics, the…
Advances in architectural design, data availability, and compute have driven remarkable progress in semantic segmentation. Yet, these models often rely on relaxed Bayesian assumptions, omitting critical uncertainty information needed for…
Causal inference from observational data is the goal of many data analyses in the health and social sciences. However, academic statistics has often frowned upon data analyses with a causal objective. The introduction of the term "data…
Measuring science is based on comparing articles to similar others. However, keyword-based groups of thematically similar articles are dominantly small. These small sizes keep the statistical errors of comparisons high. With the growing…
While the methodological rigor of computing research has improved considerably in the past two decades, quantitative software engineering research is hampered by immature measures and inattention to theory. Measurement-the principled…
Economies are fundamentally complex and becoming more so, but the new discipline of data science-which combines programming, statistics, and domain knowledge-can help cut through that complexity, potentially with productivity benefits to…
Statistics is sometimes described as the science of reasoning under uncertainty. Statistical models provide one view of this uncertainty, but what is frequently neglected is the 'invisible' portion of uncertainty: that assumed not to exist…