Related papers: An Efficient Data Analysis Method for Big Data usi…

A Survey on Large-scale Machine Learning

Machine learning can provide deep insights into data, allowing machines to make high-quality predictions, and has been widely used in real-world applications such as text mining, visual classification, and recommender systems. However,…

Machine Learning · Computer Science 2020-08-11 Meng Wang, Weijie Fu, Xiangnan He, Shijie Hao, Xindong Wu

LSAR: Efficient Leverage Score Sampling Algorithm for the Analysis of Big Time Series Data

We apply methods from randomized numerical linear algebra (RandNLA) to develop improved algorithms for the analysis of large-scale time series data. We first develop a new fast algorithm to estimate the leverage scores of an autoregressive…

Methodology · Statistics 2021-11-02 Ali Eshragh, Fred Roosta, Asef Nazari, Michael W. Mahoney

Subdata selection for big data regression: an improved approach

In the big data era researchers face a series of problems. Even standard approaches/methodologies, like linear regression, can be difficult or problematic with huge volumes of data. Traditional approaches for regression in big datasets may…

Methodology · Statistics 2024-11-13 Vasilis Chasiotis, Dimitris Karlis

Application of generalized linear models in big data: a divide and recombine (D&R) approach

D&R is a statistical approach designed to handle large and complex datasets. It partitions the dataset into several manageable subsets and subsequently applies the analytic method to each subset independently to obtain results. Finally, the…

Methodology · Statistics 2024-12-12 Md. Mahadi Hassan Nayem, Soma Chowdhury Biswas

Score-Matching Representative Approach for Big Data Analysis with Generalized Linear Models

We propose a fast and efficient strategy, called the representative approach, for big data analysis with generalized linear models, especially for distributed data with localization requirements or limited network bandwidth. With a given…

Methodology · Statistics 2021-12-16 Keren Li, Jie Yang

Making a Case for MLNs for Data-Driven Analysis: Modeling, Efficiency, and Versatility

Datasets of real-world applications are characterized by entities of different types, which are defined by multiple features and connected via varied types of relationships. A critical challenge for these datasets is developing models and…

Social and Information Networks · Computer Science 2019-09-24 Abhishek Santra, Kanthi Sannappa Komar, Sanjukta Bhowmick, Sharma Chakravarthy

Introduction to Dynamic Linear Models for Time Series Analysis

Dynamic linear models (DLM) offer a very generic framework to analyse time series data. Many classical time series models can be formulated as DLMs, including ARMA models and standard multiple linear regression models. The models can be…

Methodology · Statistics 2019-08-20 Marko Laine

A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis

The growing volume of data creates an interesting challenge: the need for data analysis tools that discover regularities in these data. Data mining has emerged as a discipline that contributes tools for data analysis, discovery of…

Databases · Computer Science 2011-08-30 Abhishek Taneja, R. K. Chauhan

Random Bits Regression: a Strong General Predictor for Big Data

To improve accuracy and speed of regressions and classifications, we present a data-based prediction method, Random Bits Regression (RBR). This method first generates a large number of random binary intermediate/derived features based on…

Machine Learning · Statistics 2016-11-04 Yi Wang, Yi Li, Momiao Xiong, Li Jin

DPER: Efficient Parameter Estimation for Randomly Missing Data

The missing data problem has been broadly studied in the last few decades and has various applications in different areas such as statistics or bioinformatics. Even though many methods have been developed to tackle this challenge, most of…

Machine Learning · Statistics 2021-06-10 Thu Nguyen, Khoi Minh Nguyen-Duy, Duy Ho Minh Nguyen, Binh T. Nguyen, Bruce Alan Wade

Robust, scalable and fast bootstrap method for analyzing large scale data

In this paper we address the problem of performing statistical inference for large-scale data sets, i.e., Big Data. The volume and dimensionality of the data may be so high that it cannot be processed or stored in a single computing node. We…

Methodology · Statistics 2016-04-20 Shahab Basiri, Esa Ollila, Visa Koivunen

Prediction of multivariate responses with a select number of principal components

This paper proposes a new method and algorithm for predicting multivariate responses in a regression setting. Research into classification of High Dimension Low Sample Size (HDLSS) data, in particular microarray data, has made considerable…

Methodology · Statistics 2008-07-28 Inge Koch, Kanta Naito

Integrative data analysis where partial covariates have complex non-linear effects by using summary information from an external data

A full parametric and linear specification may be insufficient to capture complicated patterns in studies exploring complex features, such as those investigating age-related changes in brain functional abilities. Alternatively, a partially…

Methodology · Statistics 2024-02-07 Jia Liang, Shuo Chen, Peter Kochunov, L Elliot Hong, Chixiang Chen

Robust Estimation in Generalised Linear Models: The Density Power Divergence Approach

The generalised linear model (GLM) is a very important tool for analysing real data in biology, sociology, agriculture, engineering and many other application domains where the relationship between the response and explanatory variables may…

Methodology · Statistics 2016-07-04 Abhik Ghosh, Ayanendranath Basu

New reconstruction and data processing methods for regression and interpolation analysis of multidimensional big data

The problems of computational data processing involving regression, interpolation, reconstruction and imputation for multidimensional big datasets are becoming more important these days, because of the availability of data and their widely…

Methodology · Statistics 2017-03-22 Yuri K. Shestopaloff, Alexander Y. Shestopaloff

Estimation, Confidence Intervals, and Large-Scale Hypotheses Testing for High-Dimensional Mixed Linear Regression

This paper studies the high-dimensional mixed linear regression (MLR) where the output variable comes from one of the two linear regression models with an unknown mixing proportion and an unknown covariance structure of the random…

Methodology · Statistics 2020-11-10 Linjun Zhang, Rong Ma, T. Tony Cai, Hongzhe Li

Convex-area-wise Linear Regression and Algorithms for Data Analysis

This paper introduces a new type of regression methodology named Convex-Area-Wise Linear Regression (CALR), which separates given datasets by disjoint convex areas and fits different linear regression models for different areas. This…

Databases · Computer Science 2024-06-11 Bohan Lyu, Jianzhong Li

Compressed and Penalized Linear Regression

Modern applications require methods that are computationally feasible on large datasets but also preserve statistical efficiency. Frequently, these two concerns are seen as contradictory: approximation methods that enable computation are…

Methodology · Statistics 2021-06-11 Darren Homrighausen, Daniel J. McDonald

Are Large Language Models Useful for Time Series Data Analysis?

Time series data plays a critical role across diverse domains such as healthcare, energy, and finance, where tasks like classification, anomaly detection, and forecasting are essential for informed decision-making. Recently, large language…

Machine Learning · Computer Science 2024-12-18 Francis Tang, Ying Ding

Subsampling for Big Data Linear Models with Measurement Errors

Subsampling algorithms for various parametric regression models with massive data have been extensively investigated in recent years. However, all existing studies on subsampling heavily rely on clean massive data. In practical…

Statistics Theory · Mathematics 2025-06-11 Jiangshan Ju, Mingqiu Wang, Shengli Zhao