Related papers: An Efficient Data Analysis Method for Big Data usi…

200 papers

Machine learning can provide deep insights into data, allowing machines to make high-quality predictions, and has been widely used in real-world applications such as text mining, visual classification, and recommender systems. However,…

Machine Learning · Computer Science 2020-08-11 Meng Wang, Weijie Fu, Xiangnan He, Shijie Hao, Xindong Wu

We apply methods from randomized numerical linear algebra (RandNLA) to develop improved algorithms for the analysis of large-scale time series data. We first develop a new fast algorithm to estimate the leverage scores of an autoregressive…

Methodology · Statistics 2021-11-02 Ali Eshragh, Fred Roosta, Asef Nazari, Michael W. Mahoney

In the big data era, researchers face a series of problems. Even standard approaches and methodologies, such as linear regression, can be difficult or problematic with huge volumes of data. Traditional approaches for regression in big datasets may…

Methodology · Statistics 2024-11-13 Vasilis Chasiotis, Dimitris Karlis

D&R is a statistical approach designed to handle large and complex datasets. It partitions the dataset into several manageable subsets and subsequently applies the analytic method to each subset independently to obtain results. Finally, the…

Methodology · Statistics 2024-12-12 Md. Mahadi Hassan Nayem, Soma Chowdhury Biswas

We propose a fast and efficient strategy, called the representative approach, for big data analysis with generalized linear models, especially for distributed data with localization requirements or limited network bandwidth. With a given…

Methodology · Statistics 2021-12-16 Keren Li, Jie Yang

Datasets of real-world applications are characterized by entities of different types, which are defined by multiple features and connected via varied types of relationships. A critical challenge for these datasets is developing models and…

Social and Information Networks · Computer Science 2019-09-24 Abhishek Santra, Kanthi Sannappa Komar, Sanjukta Bhowmick, Sharma Chakravarthy

Dynamic linear models (DLM) offer a very generic framework to analyse time series data. Many classical time series models can be formulated as DLMs, including ARMA models and standard multiple linear regression models. The models can be…

Methodology · Statistics 2019-08-20 Marko Laine

The growing volume of data creates a need for data analysis tools that discover regularities in these data. Data mining has emerged as a discipline that contributes tools for data analysis, discovery of…

Databases · Computer Science 2011-08-30 Abhishek Taneja, R. K. Chauhan

To improve the accuracy and speed of regression and classification, we present a data-based prediction method, Random Bits Regression (RBR). This method first generates a large number of random binary intermediate/derived features based on…

Machine Learning · Statistics 2016-11-04 Yi Wang, Yi Li, Momiao Xiong, Li Jin

The missing data problem has been broadly studied in the last few decades and has various applications in different areas such as statistics or bioinformatics. Even though many methods have been developed to tackle this challenge, most of…

Machine Learning · Statistics 2021-06-10 Thu Nguyen, Khoi Minh Nguyen-Duy, Duy Ho Minh Nguyen, Binh T. Nguyen, Bruce Alan Wade

In this paper, we address the problem of performing statistical inference for large-scale data sets, i.e., Big Data. The volume and dimensionality of the data may be so high that it cannot be processed or stored in a single computing node. We…

Methodology · Statistics 2016-04-20 Shahab Basiri, Esa Ollila, Visa Koivunen

This paper proposes a new method and algorithm for predicting multivariate responses in a regression setting. Research into classification of High Dimension Low Sample Size (HDLSS) data, in particular microarray data, has made considerable…

Methodology · Statistics 2008-07-28 Inge Koch, Kanta Naito

A full parametric and linear specification may be insufficient to capture complicated patterns in studies exploring complex features, such as those investigating age-related changes in brain functional abilities. Alternatively, a partially…

Methodology · Statistics 2024-02-07 Jia Liang, Shuo Chen, Peter Kochunov, L Elliot Hong, Chixiang Chen

The generalised linear model (GLM) is a very important tool for analysing real data in biology, sociology, agriculture, engineering and many other application domains where the relationship between the response and explanatory variables may…

Methodology · Statistics 2016-07-04 Abhik Ghosh, Ayanendranath Basu

The problems of computational data processing involving regression, interpolation, reconstruction and imputation for multidimensional big datasets are becoming more important these days, because of the availability of data and their widely…

Methodology · Statistics 2017-03-22 Yuri K. Shestopaloff, Alexander Y. Shestopaloff

This paper studies the high-dimensional mixed linear regression (MLR) where the output variable comes from one of the two linear regression models with an unknown mixing proportion and an unknown covariance structure of the random…

Methodology · Statistics 2020-11-10 Linjun Zhang, Rong Ma, T. Tony Cai, Hongzhe Li

This paper introduces a new type of regression methodology named Convex-Area-Wise Linear Regression (CALR), which separates given datasets by disjoint convex areas and fits different linear regression models for different areas. This…

Databases · Computer Science 2024-06-11 Bohan Lyu, Jianzhong Li

Modern applications require methods that are computationally feasible on large datasets but also preserve statistical efficiency. Frequently, these two concerns are seen as contradictory: approximation methods that enable computation are…

Methodology · Statistics 2021-06-11 Darren Homrighausen, Daniel J. McDonald

Time series data plays a critical role across diverse domains such as healthcare, energy, and finance, where tasks like classification, anomaly detection, and forecasting are essential for informed decision-making. Recently, large language…

Machine Learning · Computer Science 2024-12-18 Francis Tang, Ying Ding

Subsampling algorithms for various parametric regression models with massive data have been extensively investigated in recent years. However, all existing studies on subsampling heavily rely on clean massive data. In practical…

Statistics Theory · Mathematics 2025-06-11 Jiangshan Ju, Mingqiu Wang, Shengli Zhao