
Variable selection in functional data classification: a maxima-hunting proposal

Methodology 2016-08-09 v3

Abstract

Variable selection is considered in the setting of supervised binary classification with functional data $\{X(t),\ t\in[0,1]\}$. By "variable selection" we mean any dimension-reduction method that replaces the whole trajectory $\{X(t),\ t\in[0,1]\}$ with a low-dimensional vector $(X(t_1),\ldots,X(t_k))$ while keeping a similar classification error. Our proposal for variable selection is based on the idea of selecting the local maxima $(t_1,\ldots,t_k)$ of the function ${\mathcal V}_X^2(t)={\mathcal V}^2(X(t),Y)$, where ${\mathcal V}$ denotes the "distance covariance" association measure for random variables due to Székely, Rizzo and Bakirov (2007). This method provides a simple, natural way to deal with the relevance vs. redundancy trade-off which typically appears in variable selection. This paper includes (a) Some theoretical motivation: a consistency result for the estimation of the maxima of ${\mathcal V}_X^2$ is shown. We also present different theoretical models for the underlying process $X(t)$ under which the relevant information is concentrated in the maxima of ${\mathcal V}_X^2$. (b) An extensive empirical study, including about 400 simulated models and real data examples, aimed at comparing our variable selection method with other standard proposals for dimension reduction.
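
The selection idea described in the abstract can be illustrated with a minimal sketch, assuming the trajectories are observed on a common discrete grid. It estimates ${\mathcal V}^2(X(t_j),Y)$ at each grid point with the standard double-centering sample version of the squared distance covariance and keeps the highest local maxima. The function names `dcov2` and `maxima_hunting`, the immediate-neighbor rule for detecting local maxima, and the ranking of maxima by height are illustrative assumptions, not necessarily the exact procedure of the paper.

```python
import numpy as np


def dcov2(x, y):
    """Sample squared distance covariance of two 1-D samples (double centering)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise distance matrices within each sample.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    return float((A * B).mean())


def maxima_hunting(X, y, k):
    """Return up to k grid indices at local maxima of the estimated V^2 curve.

    X : (n, p) array, each row a trajectory observed at p grid points.
    y : (n,) array of binary class labels.
    Assumption: a local maximum is an interior grid point whose value is
    strictly larger than both immediate neighbors; maxima are ranked by height.
    """
    X = np.asarray(X, dtype=float)
    v = np.array([dcov2(X[:, j], y) for j in range(X.shape[1])])
    interior = np.arange(1, len(v) - 1)
    is_peak = (v[1:-1] > v[:-2]) & (v[1:-1] > v[2:])
    peaks = interior[is_peak]
    ranked = peaks[np.argsort(v[peaks])[::-1]]
    return ranked[:k], v
```

With noisy data, comparing each point only with its immediate neighbors can flag many spurious maxima; a wider comparison window or a smoothed curve is a natural variation on this sketch.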

Cite

@article{arxiv.1309.6697,
  title  = {Variable selection in functional data classification: a maxima-hunting proposal},
  author = {José R. Berrendero and Antonio Cuevas and José L. Torrecilla},
  journal= {arXiv preprint arXiv:1309.6697},
  year   = {2016}
}