Databases - Scifaro

Utility Mining Across Multi-Sequences with Individualized Thresholds

Utility-oriented pattern mining has become an emerging topic since it can reveal high-utility patterns (e.g., itemsets, rules, sequences) from different types of data, which provides more information than the traditional…

Databases · Computer Science · 2021-04-01 · Wensheng Gan, Jerry Chun-Wei Lin, Jiexiong Zhang, Philip S. Yu

Utility-Driven Mining of Trend Information for Intelligent System

Useful knowledge, embedded in a database, is likely to change over time. Identifying recent changes in temporal databases can provide valuable up-to-date information to decision-makers. Nevertheless, techniques for mining high-utility…

Databases · Computer Science · 2021-04-01 · Wensheng Gan, Jerry Chun-Wei Lin, Han-Chieh Chao, Philippe Fournier-Viger, Xuan Wang, Philip S. Yu

ProUM: Projection-based Utility Mining on Sequence Data

Utility is an important concept in economics. A variety of applications consider utility in real-life situations, which has led to the emergence of utility-oriented mining (also called utility mining) in the recent decade. Utility mining…

Databases · Computer Science · 2021-04-01 · Wensheng Gan, Jerry Chun-Wei Lin, Jiexiong Zhang, Han-Chieh Chao, Hamido Fujita, Philip S. Yu

Utility-driven Data Analytics on Uncertain Data

Modern Internet of Things (IoT) applications generate massive amounts of data, much of it in the form of objects/items of readings, events, and log entries. Specifically, most of the objects in these IoT data contain rich embedded…

Databases · Computer Science · 2021-04-01 · Wensheng Gan, Jerry Chun-Wei Lin, Han-Chieh Chao, Athanasios V. Vasilakos, Philip S. Yu

Beyond Frequency: Utility Mining with Varied Item-Specific Minimum Utility

Utility-oriented mining, which integrates utility theory and data mining, is a useful tool for understanding economic consumer behavior. Traditional algorithms for mining high-utility patterns (HUPs) apply a single/uniform minimum…

Databases · Computer Science · 2021-04-01 · Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, Philip S. Yu
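The excerpt above contrasts a single, uniform minimum utility threshold with item-specific thresholds. What follows is a minimal sketch under stated assumptions, not the algorithm from the paper. It uses the common utility-mining definitions: an item's utility in a transaction is its quantity times its unit profit, and an itemset's utility is summed over every transaction that contains the whole itemset. The toy transactions, profits, and threshold values are hypothetical. The function names `itemset_utility` and `high_utility_itemsets` are illustrative only. The rule that an itemset's threshold is the smallest threshold among its items is an assumption, borrowed from the multiple-minimum-support convention.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity (internal utility).
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
]
# Hypothetical unit profit per item (external utility).
PROFIT = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset, transactions, profit):
    """Sum quantity * unit profit of the itemset's items over transactions containing all of them."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * profit[i] for i in itemset)
    return total


def high_utility_itemsets(transactions, profit, threshold_of):
    """Brute-force enumeration: keep every itemset X whose utility is >= threshold_of(X)."""
    items = sorted(profit)
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = itemset_utility(itemset, transactions, profit)
            if u >= threshold_of(itemset):
                result[itemset] = u
    return result


# Uniform threshold, as in traditional HUP mining.
uniform = high_utility_itemsets(TRANSACTIONS, PROFIT, lambda X: 15)
print(uniform)
# {('a',): 20, ('a', 'b'): 16, ('a', 'c'): 19}

# Hypothetical item-specific thresholds; an itemset's threshold is ASSUMED to be
# the minimum threshold among its items.
MU = {"a": 18, "b": 8, "c": 4}
per_item = high_utility_itemsets(TRANSACTIONS, PROFIT, lambda X: min(MU[i] for i in X))
print(per_item)
# {('a',): 20, ('c',): 4, ('a', 'b'): 16, ('a', 'c'): 19, ('b', 'c'): 5, ('a', 'b', 'c'): 10}
```

With the uniform threshold, the low-profit item `c` never qualifies on its own. Giving it its own lower threshold keeps it, while `b` still falls below its threshold. The exhaustive enumeration here is only for readability. Practical HUP miners avoid it by pruning with utility upper bounds such as transaction-weighted utilization.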

HUOPM: High Utility Occupancy Pattern Mining

Mining useful patterns from varied types of databases is an important research topic, which has many real-life applications. Most studies have considered frequency as the sole interestingness measure for identifying high-quality patterns.…

Databases · Computer Science · 2021-04-01 · Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, Philip S. Yu

A Survey of Parallel Sequential Pattern Mining

With the growing popularity of shared resources, large volumes of complex data of different types are collected automatically. Traditional data mining algorithms generally have problems and challenges including huge memory cost, low…

Databases · Computer Science · 2021-04-01 · Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, Philip S. Yu

A Survey of Utility-Oriented Pattern Mining

The main purpose of data mining and analytics is to find novel, potentially useful patterns that can be utilized in real-world applications to derive beneficial knowledge. For identifying and evaluating the usefulness of different kinds of…

Databases · Computer Science · 2021-04-01 · Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, Vincent S. Tseng, Philip S. Yu

Structural Generalizability: The Case of Similarity Search

Graph similarity search algorithms usually leverage the structural properties of a database. Hence, these algorithms are effective only on some structural variations of the data and are ineffective on other forms, which makes them hard to…

Databases · Computer Science · 2021-04-01 · Yodsawalai Chodpathumwan, Arash Termehchy, Stephen A. Ramsey, Aayam Shresta, Amy Glen, Zheng Liu

Explainable Fuzzy Utility Mining on Sequences

Fuzzy systems have good modeling capabilities in several data science scenarios and can provide explainable, interpretable intelligence models. In contrast to transaction data, which have been extensively…

Databases · Computer Science · 2021-03-31 · Wensheng Gan, Zilin Du, Weiping Ding, Chunkai Zhang, Han-Chieh Chao

Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities

Machine learning (ML) is now commonplace, powering data-driven applications in various organizations. Unlike the traditional perception of ML in research, ML production pipelines are complex, with many interlocking analytical components…

Databases · Computer Science · 2021-03-31 · Doris Xin, Hui Miao, Aditya Parameswaran, Neoklis Polyzotis

Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing

Sample-based approximate query processing (AQP) suffers from many pitfalls such as the inability to answer very selective queries and unreliable confidence intervals when sample sizes are small. Recent research presented an intriguing…

Databases · Computer Science · 2021-03-31 · Xi Liang, Stavros Sintos, Zechao Shang, Sanjay Krishnan

Some Results of Experimental Check of The Model of the Object Innovativeness Quantitative Evaluation

The paper presents the results of the experiments that were conducted to confirm the main ideas of the proposed approach to determining the innovativeness of objects. This approach assumed that the product life cycle of whose descriptions are…

Databases · Computer Science · 2021-03-31 · V. K. Ivanov

Experimental check of model of object innovation evaluation

The article discusses an approach for evaluating the innovation index of products and technologies. The evaluation results can be used to create a warehouse of object descriptions with significant innovation potential. The model of…

Databases · Computer Science · 2021-03-31 · V. K. Ivanov

AugSplicing: Synchronized Behavior Detection in Streaming Tensors

How can we track synchronized behavior in a stream of time-stamped tuples, such as mobile devices installing and uninstalling applications in lockstep to boost their ranks in the app store? We model such tuples as entries in a…

Databases · Computer Science · 2021-03-31 · Jiabao Zhang, Shenghua Liu, Wenting Hou, Siddharth Bhatia, Huawei Shen, Wenjian Yu, Xueqi Cheng

Discovery data topology with the closure structure. Theoretical and practical aspects

In this paper, we revisit pattern mining, and especially itemset mining, which allows one to analyze binary datasets in search of interesting and meaningful association rules and respective itemsets in an unsupervised way. While a…

Databases · Computer Science · 2021-03-31 · Tatiana Makhalova, Aleksey Buzmakov, Sergei O. Kuznetsov, Amedeo Napoli

Putting Things into Context: Rich Explanations for Query Answers using Join Graphs (extended version)

In many data analysis applications, there is a need to explain why a surprising or interesting result was produced by a query. Previous approaches to explaining results have directly or indirectly used data provenance (input tuples…

Databases · Computer Science · 2021-03-30 · Chenjie Li, Zhengjie Miao, Qitian Zeng, Boris Glavic, Sudeepa Roy

Automatic Clustering in Hyrise

Physical data layout is an important performance factor for modern databases. Clustering, i.e., storing similar values in proximity, can lead to performance gains in several ways. We present an automated model to determine beneficial…

Databases · Computer Science · 2021-03-30 · Alexander Löser

Peculiarities of organization of data storage based on intelligent search agent and evolutionary model selection the target information

The article presents a systematic review of the results of the development of the theoretical basis and the pilot implementation of data storage technology with automatic replenishment of data from sources belonging to different thematic…

Databases · Computer Science · 2021-03-30 · V. K. Ivanov

MultiScope: Efficient Video Pre-processing for Exploratory Video Analytics

Performing analytics tasks over large-scale video datasets is increasingly common in a wide range of applications. These tasks generally involve object detection and tracking operations that require applying expensive machine learning…

Databases · Computer Science · 2021-03-30 · Favyen Bastani, Sam Madden