Related papers: Technical Report: Partial Dependence through Strat…
In practical applications of machine learning, it is necessary to look beyond standard metrics such as test accuracy in order to validate various qualitative properties of a model. Partial dependence plots (PDP), including instance-specific…
Many existing interpretation methods are based on Partial Dependence (PD) functions that, for a pre-trained machine learning model, capture how a subset of the features affects the predictions by averaging over the remaining features.…
Scientists and practitioners increasingly rely on machine learning to model data and draw conclusions. Compared to statistical modeling approaches, machine learning makes fewer explicit assumptions about data structures, such as linearity.…
Learning processes by exploiting restricted domain knowledge is an important task across a plethora of scientific areas, with more and more hybrid training methods additively combining data-driven and model-based approaches. Although the…
Despite the large research effort devoted to learning dependencies between time series, the state of the art still faces a major limitation: existing methods learn partial correlations but fail to discriminate across distinct frequency…
The advent of big data has raised significant challenges in analysing high-dimensional datasets across various domains such as medicine, ecology, and economics. Functional Data Analysis (FDA) has proven to be a robust framework for…
Predictive models often degrade in performance due to evolving data distributions, a phenomenon known as data drift. Among its forms, concept drift, where the relationship between explanatory variables and the response variable changes, is…
This study introduces a novel forecasting strategy that leverages the power of fractional differencing (FD) to capture both short- and long-term dependencies in time series data. Unlike traditional integer differencing methods, FD preserves…
In this paper, we revisit the notion of partial copula, originally introduced to test conditional independence, highlighting its capability to represent the dependence between two random variables after removing their dependence with a…
Explaining artificial intelligence or machine learning models is increasingly important. To use such data-driven systems wisely we must understand how they interact with the world, including how they depend causally on data inputs. In this…
This paper studies linear reconstruction of partially observed functional data which are recorded on a discrete grid. We propose a novel estimation approach based on approximate factor models with increasing rank taking into account…
One of the most popular approaches to understanding feature effects of modern black box machine learning models are partial dependence plots (PDP). These plots are easy to understand but only able to visualize low order dependencies. The…
This study develops a numerical scheme for path-dependent FBSDEs and PDEs. We introduce a Picard iteration method for solving path-dependent FBSDEs, prove its convergence to the true solution, and establish its rate of convergence. A key…
Fractional calculus provides a rigorous mathematical framework to describe anomalous stochastic processes by generalizing the notion of classical differential equations to their fractional-order counterparts. By introducing the fractional…
We study semiparametric varying-coefficient partially linear models when some linear covariates are not observed, but ancillary variables are available. Semiparametric profile least-square based estimation procedures are developed for…
The regression discontinuity design (RDD) is a quasi-experimental design that can be used to identify and estimate the causal effect of a treatment using observational data. In an RDD, a pre-specified rule is used for treatment assignment,…
Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find if any…
A challenge for data imputation is the lack of knowledge. In this paper, we attempt to address this challenge by involving extra knowledge from web. To achieve high-performance web-based imputation, we use the dependency, i.e.FDs and CFDs,…
Approximate functional dependencies (AFDs) are functional dependencies (FDs) that "almost" hold in a relation. While various measures have been proposed to quantify the level to which an FD holds approximately, they are difficult to compare…
Multiple testing is a fundamental problem in high-dimensional statistical inference. Although many methods have been proposed to control false discoveries, it is still a challenging task when the tests are correlated to each other. To…