Dictionary-Learning-Based Data Pruning for System Identification

Tingna Wang; Sikai Zhang; Mingming Song; Limin Sun

doi:10.3390/app15179368

Subjects: Machine Learning; Systems and Control. Date: 2025-09-05 (v2)

Abstract

System identification normally involves augmenting time series data by time shifting and nonlinearisation (e.g., with a polynomial basis), both of which introduce redundancy in features and samples. Much research focuses on reducing feature-wise redundancy, while sample-wise redundancy has received less attention. This paper proposes a novel data pruning method, called mini-batch FastCan, that reduces sample-wise redundancy based on dictionary learning. The time series data are represented by a small set of representative samples, called atoms, obtained via dictionary learning, and the useful samples are selected based on their correlation with the atoms. The method is tested on one simulated dataset and two benchmark datasets. Pruning performance is evaluated by the R-squared between the coefficients of models trained on the full datasets and the coefficients of models trained on the pruned datasets. The proposed method is found to significantly outperform random pruning.
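The pipeline described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' mini-batch FastCan: spherical k-means stands in for dictionary learning, cosine similarity stands in for the correlation measure, and the simulated system, the function names (`lagged_poly_features`, `learn_atoms`, `select_samples`, `coef_r2`), and all parameter values are assumptions made for the example.

```python
import numpy as np


def lagged_poly_features(u, y, n_lags=2):
    """Time-shift u and y, then append all degree-2 products (polynomial basis)."""
    t = np.arange(n_lags, len(y))
    lin = np.column_stack(
        [y[t - k] for k in range(1, n_lags + 1)]
        + [u[t - k] for k in range(1, n_lags + 1)]
    )
    i, j = np.triu_indices(lin.shape[1])
    return np.hstack([lin, lin[:, i] * lin[:, j]]), y[n_lags:]


def unit_rows(A):
    """Scale every row to unit Euclidean norm."""
    return A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-12)


def learn_atoms(Z, n_atoms, n_iter=20, seed=0):
    """Assumed stand-in for dictionary learning: spherical k-means centroids."""
    rng = np.random.default_rng(seed)
    Zn = unit_rows(Z)
    atoms = Zn[rng.choice(len(Zn), size=n_atoms, replace=False)].copy()
    for _ in range(n_iter):
        labels = np.argmax(Zn @ atoms.T, axis=1)  # nearest atom by cosine
        for k in range(n_atoms):
            members = Zn[labels == k]
            if len(members):
                atoms[k] = unit_rows(members.sum(axis=0, keepdims=True))[0]
    return atoms


def select_samples(Z, atoms, n_keep):
    """Round-robin over atoms, taking each atom's most similar unused sample."""
    sim = np.abs(unit_rows(Z) @ atoms.T)   # (n_samples, n_atoms)
    order = np.argsort(-sim, axis=0)       # per-atom ranking of samples
    chosen, seen = [], set()
    for rank in range(order.shape[0]):
        for k in range(order.shape[1]):
            idx = int(order[rank, k])
            if idx not in seen:
                seen.add(idx)
                chosen.append(idx)
                if len(chosen) == n_keep:
                    return np.array(chosen)
    return np.array(chosen)


def fit(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]


def coef_r2(theta_full, theta_pruned):
    """R-squared of pruned-model coefficients against full-model coefficients."""
    ss_res = np.sum((theta_full - theta_pruned) ** 2)
    ss_tot = np.sum((theta_full - theta_full.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical simulated system, chosen only to give the sketch some data.
rng = np.random.default_rng(0)
n = 1000
u = rng.uniform(-1, 1, n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = (0.5 * y[t - 1] - 0.3 * y[t - 2] + 0.8 * u[t - 1]
            + 0.2 * u[t - 1] ** 2 + 0.01 * rng.standard_normal())

X, target = lagged_poly_features(u, y)
Z = np.column_stack([X, target])           # assumption: a sample is [features, target]
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)   # standardise columns before comparing rows

atoms = learn_atoms(Z, n_atoms=15)
keep = select_samples(Z, atoms, n_keep=100)
rand = rng.choice(len(X), size=100, replace=False)

theta_full = fit(X, target)
r2_pruned = coef_r2(theta_full, fit(X[keep], target[keep]))
r2_random = coef_r2(theta_full, fit(X[rand], target[rand]))
print("coefficient R-squared, atom-based pruning:", r2_pruned)
print("coefficient R-squared, random pruning:", r2_random)
```

The two printed scores depend on the random seed and on the stand-in choices above, so they should not be read as reproducing the comparison reported in the paper; the sketch only shows how the coefficient-based R-squared can be computed for any pruning rule.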

Keywords

time series classification; association rule mining; randomized algorithm

Cite

@article{arxiv.2502.11484,
  title  = {Dictionary-Learning-Based Data Pruning for System Identification},
  author = {Tingna Wang and Sikai Zhang and Mingming Song and Limin Sun},
  journal= {arXiv preprint arXiv:2502.11484},
  year   = {2025}
}
