Related papers: Continuously Updated Data Analysis Systems

CDAS: A Crowdsourcing Data Analytics System

Some complex problems, such as image tagging and natural language processing, are very challenging for computers, where even state-of-the-art technology is not yet able to provide satisfactory accuracy. Therefore, rather than relying solely on…

Databases · Computer Science 2012-07-03 Xuan Liu, Meiyu Lu, Beng Chin Ooi, Yanyan Shen, Sai Wu, Meihui Zhang

DFS: A Dataset File System for Data Discovering Users

Many research questions can be answered quickly and efficiently using data already collected for previous research. This practice is called secondary data analysis (SDA), and has gained popularity due to lower costs and improved research…

Digital Libraries · Computer Science 2020-04-07 Yasith Jayawardana, Sampath Jayarathna

CuTS: Customizable Tabular Synthetic Data Generation

Privacy, data quality, and data sharing concerns pose a key limitation for tabular data applications. While generating synthetic data resembling the original distribution addresses some of these issues, most applications would benefit from…

Machine Learning · Computer Science 2024-06-04 Mark Vero, Mislav Balunović, Martin Vechev

Augmented Data Science: Towards Industrialization and Democratization of Data Science

Conversion of raw data into insights and knowledge requires substantial amounts of effort from data scientists. Despite breathtaking advances in Machine Learning (ML) and Artificial Intelligence (AI), data scientists still spend the…

Artificial Intelligence · Computer Science 2019-09-13 Huseyin Uzunalioglu, Jin Cao, Chitra Phadke, Gerald Lehmann, Ahmet Akyamac, Ran He, Jeongran Lee, Maria Able

Uncovering Data Across Continua: An Introduction to Functional Data Analysis

In a world increasingly awash with data, the need to extract meaningful insights from data has never been more crucial. Functional Data Analysis (FDA) goes beyond traditional data points, treating data as dynamic, continuous functions,…

Statistics Theory · Mathematics 2024-04-26 Sophie Dabo-Niang, Camille Frévent

Continuous Analysis: Evolution of Software Engineering and Reproducibility for Science

Reproducibility in research remains hindered by complex systems involving data, models, tools, and algorithms. Studies highlight a reproducibility crisis due to a lack of standardized reporting, code and data sharing, and rigorous…

Software Engineering · Computer Science 2024-11-05 Venkat S. Malladi, Maria Yazykova, Olesya Melnichenko, Yulia Dubinina

ICABiDAS: Intuition Centred Architecture for Big Data Analysis and Synthesis

Humans are experts at handling the amount of sensory data they deal with each moment. The human brain not only analyses these data but also starts synthesizing new information from the existing data. The current age Big-data systems are needed not just…

Artificial Intelligence · Computer Science 2022-08-26 Amit Kumar Mishra

In Defense of Synthetic Data

Synthetic datasets have long been thought of as second-rate, to be used only when "real" data collected directly from the real world is unavailable. But this perspective assumes that raw data is clean, unbiased, and trustworthy, which it…

Databases · Computer Science 2019-05-07 Luke Rodriguez, Bill Howe

AutoDS: Towards Human-Centered Automation of Data Science

Data science (DS) projects often follow a lifecycle that consists of laborious tasks for data scientists and domain experts (e.g., data exploration, model training, etc.). Only recently have machine learning (ML) researchers developed…

Human-Computer Interaction · Computer Science 2021-01-15 Dakuo Wang, Josh Andres, Justin Weisz, Erick Oduor, Casey Dugan

ForcingDAS: Unified and Robust Data Assimilation via Diffusion Forcing

Data assimilation (DA) estimates the state of an evolving dynamical system from noisy, partial observations, and is widely used in scientific simulation as well as weather and climate science. In practice, filtering methods rely on…

Image and Video Processing · Electrical Eng. & Systems 2026-05-15 Yixuan Jia, Siyi Chen, Yida Pan, Xiao Li, Lianghe Shi, Chanyong Jung, Haijie Yuan, Ismail Alkhouri, Yue Cynthia Wu, Saiprasad Ravishankar, Jeffrey A Fessler, Qing Qu

Forming IDEAS Interactive Data Exploration & Analysis System

Modern cyber security operations collect an enormous amount of logging and alerting data. While analysts have the ability to query and compute simple statistics and plots from their data, current analytical tools are too simple to admit…

Cryptography and Security · Computer Science 2018-06-22 Robert A. Bridges, Maria A. Vincent, Kelly M. T. Huffer, John R. Goodall, Jessie D. Jamieson, Zachary Burch

Towards an Integrated Platform for Big Data Analysis

The amount of data in the world is expanding rapidly. Every day, huge amounts of data are created by scientific experiments, companies, and end users' activities. These large data sets have been labeled as "Big Data", and their storage,…

Databases · Computer Science 2020-04-29 Mahdi Bohlouli, Frank Schulz, Lefteris Angelis, David Pahor, Ivona Brandic, David Atlan, Rosemary Tate

The CAVES Project - Exploring Virtual Data Concepts for Data Analysis

The Collaborative Analysis Versioning Environment System (CAVES) project concentrates on the interactions between users performing data and/or computing intensive analyses on large data sets, as encountered in many contemporary scientific…

Data Analysis, Statistics and Probability · Physics 2007-05-23 Dimitri Bourilkov

RuDaS: Synthetic Datasets for Rule Learning and Evaluation Tools

Logical rules are a popular knowledge representation language in many domains, representing background knowledge and encoding information that can be derived from given facts in a compact form. However, rule formulation is a complex process…

Artificial Intelligence · Computer Science 2020-02-13 Cristina Cornelio, Veronika Thost

How can AI Automate End-to-End Data Science?

Data science is labor-intensive and human experts are scarce but heavily involved in every aspect of it. This makes data science time consuming and restricted to experts with the resulting quality heavily dependent on their experience and…

Artificial Intelligence · Computer Science 2019-11-01 Charu Aggarwal, Djallel Bouneffouf, Horst Samulowitz, Beat Buesser, Thanh Hoang, Udayan Khurana, Sijia Liu, Tejaswini Pedapati, Parikshit Ram, Ambrish Rawat, Martin Wistuba, Alexander Gray

Overview of the COMPETE Program

Nowadays, scientific databases have become the bread-and-butter of particle physicists. These databases must be maintained and checked repeatedly to ensure the accuracy of their content. The COMPETE collaboration aims at motivating data…

High Energy Physics - Phenomenology · Physics 2007-05-23 V. V. Ezhela, J. R. Cudell, P. Gauron, K. Kang, S. K. Kang, Yu. V. Kuyanov, A. Lengyel, K. S. Lugovsky, S. B. Lugovsky, V. S. Lugovsky, E. Martynov, B. Nicolescu, E. A. Razuvaev, M. Yu. Sapunov, O. Selyugin, N. P. Tkachenko, M. R. Whalley, O. V. Zenin

Final Report for CHESS: Cloud, High-Performance Computing, and Edge for Science and Security

Automating the theory-experiment cycle requires effective distributed workflows that utilize a computing continuum spanning lab instruments, edge sensors, computing resources at multiple facilities, data sets distributed across multiple…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-22 Nathan Tallent, Jan Strube, Luanzheng Guo, Hyungro Lee, Jesun Firoz, Sayan Ghosh, Bo Fang, Oceane Bel, Steven Spurgeon, Sarah Akers, Christina Doty, Erol Cromwell

DATeS: A Highly-Extensible Data Assimilation Testing Suite v1.0

A flexible and highly-extensible data assimilation testing suite, named DATeS, is described in this paper. DATeS aims to offer a unified testing environment that allows researchers to compare different data assimilation methodologies and…

Mathematical Software · Computer Science 2018-07-03 Ahmed Attia, Adrian Sandu

Continual-Learning-as-a-Service (CLaaS): On-Demand Efficient Adaptation of Predictive Models

Predictive machine learning models nowadays are often updated in a stateless and expensive way. The two main future trends for companies that want to build machine learning-based applications and systems are real-time inference and…

Machine Learning · Computer Science 2022-07-22 Rudy Semola, Vincenzo Lomonaco, Davide Bacciu

An Intelligent Innovation Dataset on Scientific Research Outcomes

Various stakeholders, such as researchers, government agencies, businesses, and research laboratories require a large volume of reliable scientific research outcomes including research articles and patent data to support their work. These…

Databases · Computer Science 2024-10-01 Xinran Wu, Hui Zou, Yidan Xing, Jingjing Qu, Qiongxiu Li, Renxia Xue, Xiaoming Fu