
Related papers: Oseba: Optimization for Selective Bulk Analysis in…

200 papers

Management of disk scheduling is a very important aspect of operating systems. The performance of disk scheduling depends entirely on how efficiently the scheduling algorithm allocates service to requests. Many…

Operating Systems · Computer Science 2014-03-04 Sourav Kumar Bhoi, Sanjaya Kumar Panda, Imran Hossain Faruk

The dramatic growth of big datasets presents a new challenge to data storage and analysis. Data reduction, or subsampling, that extracts useful information from datasets is a crucial step in big data analysis. We propose an orthogonal…

Methodology · Statistics 2021-06-01 Lin Wang, Jake Elmstedt, Weng Kee Wong, Hongquan Xu

Big-data applications often involve a vast number of observations and features, creating new challenges for variable selection and parameter estimation. This paper presents a novel technique called "slow kill," which utilizes nonconvex…

Machine Learning · Statistics 2023-05-04 Yiyuan She, Jianhui Shen, Adrian Barbu

In the big data era, researchers face a series of problems. Even standard approaches, like linear regression, can be difficult or problematic to apply to huge volumes of data. Traditional approaches for regression in big datasets may…

Methodology · Statistics 2024-11-13 Vasilis Chasiotis, Dimitris Karlis

In the current data-intensive era, big data has become a significant asset for Artificial Intelligence (AI), serving as a foundation for developing data-driven models and providing insight into various unknown fields. This study navigates…

Machine Learning · Computer Science 2024-07-04 Daniel Menges, Adil Rasheed

Data processing frameworks such as Apache Beam and Apache Spark are used for a wide range of applications, from log analysis to data preparation for DNN training. It is thus unsurprising that there has been a large amount of work on…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-07 Ubaid Ullah Hafeez, Martin Maas, Mustafa Uysal, Richard McDougall

Sampling is a basic operation in many inference-time algorithms of large language models (LLMs). To scale up inference efficiently with limited compute, it is crucial to find an optimal allocation for sample compute budgets: Which…

Computation and Language · Computer Science 2024-10-31 Kexun Zhang, Shang Zhou, Danqing Wang, William Yang Wang, Lei Li

With the explosive increase of big data in industry and academic fields, it is necessary to apply large-scale data processing systems to analyze Big Data. Arguably, Spark is the state of the art among large-scale data computing systems nowadays,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-17 Shanjiang Tang, Bingsheng He, Ce Yu, Yusen Li, Kun Li

This paper optimizes the configuration of large-scale data centers toward cost-effective, reliable and sustainable cloud supply chains. The problem involves placing incoming racks of servers within a data center to maximize demand coverage…

Optimization and Control · Mathematics 2026-01-19 Saumil Baxi, Kayla Cummings, Alexandre Jacquillat, Sean Lo, Rob McDonald, Konstantina Mellou, Ishai Menache, Marco Molinaro

We present data-oblivious algorithms in the external-memory model for compaction, selection, and sorting. Motivation for such problems comes from clients who use outsourced data storage services and wish to mask their data access patterns.…

Data Structures and Algorithms · Computer Science 2011-03-29 Michael T. Goodrich

The increasing capabilities of machine learning models, such as vision-language and multimodal language models, are placing growing demands on data in automotive systems engineering, making the quality and relevance of collected data…

Systems and Control · Electrical Eng. & Systems 2026-04-01 Philipp Reis, Jacqueline Henle, Stefan Otten, Eric Sax

Selecting appropriate computational resources for data processing jobs on large clusters is difficult, even for expert users like data engineers. Inadequate choices can result in vastly increased costs, without significantly improving…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-06 Jonathan Will, Lauritz Thamsen, Jonathan Bader, Dominik Scheinert, Odej Kao

As data volumes grow across applications, analyzing large amounts of data is becoming increasingly important. Big data processing frameworks such as Apache Hadoop, Apache AsterixDB, and Apache Spark have been built to meet this demand. A…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-15 Avinash Kumar

Cluster analysis plays an important role in the decision-making process for many knowledge-based systems. There exists a wide variety of approaches for clustering applications, including heuristic techniques, probabilistic models,…

Artificial Intelligence · Computer Science 2017-03-09 Kayvan Bijari, Hadi Zare, Hadi Veisi, Hossein Bobarshad

Big array analytics is becoming indispensable in answering important scientific and business questions. Most analysis tasks consist of multiple steps, each making one or multiple passes over the arrays to be analyzed and generating…

Databases · Computer Science 2012-04-30 Yi Zhang, Jun Yang

In the big data era, information integration often requires abundant data extracted from massive data sources. Due to the large number of data sources, data source selection plays a crucial role in information integration, since it is costly and…

Databases · Computer Science 2016-11-01 Yiming Lin, Hongzhi Wang, Jianzhong Li, Hong Gao

The amount of data in our society has been exploding in the era of big data today. In this paper, we address several open challenges of big data stream classification, including high volume, high velocity, high dimensionality, high…

Machine Learning · Computer Science 2015-07-28 Dayong Wang, Pengcheng Wu, Peilin Zhao, Steven C. H. Hoi

Sparse decision trees are one of the most common forms of interpretable models. While recent advances have produced algorithms that fully optimize sparse decision trees for prediction, that work does not address policy design, because the…

Machine Learning · Computer Science 2022-10-27 Ali Behrouz, Mathias Lecuyer, Cynthia Rudin, Margo Seltzer

Big data applications have fast-arriving data that must be quickly ingested. At the same time, they have specific needs to preprocess and transform the data before it can be put to use. The current practice is to do these preparatory…

Databases · Computer Science 2017-01-24 Alekh Jindal, Jorge-Arnulfo Quiane-Ruiz, Samuel Madden

In this paper, we address the problem of performing statistical inference for large-scale data sets, i.e., Big Data. The volume and dimensionality of the data may be so high that it cannot be processed or stored in a single computing node. We…

Methodology · Statistics 2016-04-20 Shahab Basiri, Esa Ollila, Visa Koivunen