Related papers: Scalable Sampling for High Utility Patterns

Visual Pattern-Driven Exploration of Big Data

Pattern extraction algorithms are enabling insights into the ever-growing amount of today's datasets by translating reoccurring data properties into compact representations. Yet, a practical problem arises: With increasing data volumes and…

Information Retrieval · Computer Science 2018-07-05 Michael Behrisch, Robert Krueger, Fritz Lekschas, Tobias Schreck, Nils Gehlenborg, Hanspeter Pfister

Fast Utility Mining on Complex Sequences

High-utility sequential pattern mining is an emerging topic in the field of Knowledge Discovery in Databases. It consists of discovering subsequences having a high utility (importance) in sequences, referred to as high-utility sequential…

Databases · Computer Science 2019-04-30 Wensheng Gan, Jerry Chun-Wei Lin, Jiexiong Zhang, Philippe Fournier-Viger, Han-Chieh Chao, Philip S. Yu

STULL: Unbiased Online Sampling for Visual Exploration of Large Spatiotemporal Data

Online sampling-supported visual analytics is increasingly important, as it allows users to explore large datasets with acceptable approximate answers at interactive rates. However, existing online spatiotemporal sampling techniques are…

Databases · Computer Science 2020-09-01 Guizhen Wang, Jingjing Guo, Mingjie Tang, José Florencio de Queiroz Neto, Calvin Yau, Anas Daghistani, Morteza Karimzadeh, Walid G. Aref, David S. Ebert

Towards an Efficient Discovery of the Topological Representative Subgraphs

With the emergence of graph databases, the task of frequent subgraph discovery has been extensively addressed. Although the proposed approaches in the literature have made this task feasible, the number of discovered frequent subgraphs is…

Databases · Computer Science 2013-08-16 Wajdi Dhifli, Mohamed Moussaoui, Rabie Saidi, Engelbert Mephu Nguifo

Consistent and Flexible Selectivity Estimation for High-Dimensional Data

Selectivity estimation aims at estimating the number of database objects that satisfy a selection criterion. Answering this problem accurately and efficiently is essential to many applications, such as density estimation, outlier detection,…

Databases · Computer Science 2021-05-28 Yaoshu Wang, Chuan Xiao, Jianbin Qin, Rui Mao, Onizuka Makoto, Wei Wang, Rui Zhang, Yoshiharu Ishikawa

Robust and Scalable Entity Alignment in Big Data

Entity alignment has always had significant uses within a multitude of diverse scientific fields. In particular, the concept of matching entities across networks has grown in significance in the world of social science as communicative…

Social and Information Networks · Computer Science 2020-04-21 James Flamino, Christopher Abriola, Ben Zimmerman, Zhongheng Li, Joel Douglas

Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging

Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups. However, existing co-clustering methods suffer from poor scalability and cannot handle large-scale data. This paper presents a novel and scalable…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-20 Zihan Wu, Zhaoke Huang, Hong Yan

Predictive Subsampling for Scalable Inference in Networks

Network datasets appear across a wide range of scientific fields, including biology, physics, and the social sciences. To enable data-driven discoveries from these networks, statistical inference techniques like estimation and hypothesis…

Methodology · Statistics 2026-02-19 Arpan Kumar, Minh Tang, Srijan Sengupta

Network Sampling Based on NN Representatives

The amount of large-scale real data around us increases in size very quickly and so does the necessity to reduce its size by obtaining a representative sample. Such a sample allows us to use a great variety of analytical methods, whose direct…

Social and Information Networks · Computer Science 2014-02-10 Milos Kudelka, Sarka Zehnalova, Jan Platos

HUOPM: High Utility Occupancy Pattern Mining

Mining useful patterns from varied types of databases is an important research topic, which has many real-life applications. Most studies have considered frequency as the sole interestingness measure for identifying high quality patterns.…

Databases · Computer Science 2021-04-01 Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, Philip S. Yu

Correlated Utility-based Pattern Mining

In the field of data mining and analytics, utility theory from economics can bring benefits in many real-life applications. In the recent decade, a new research field called utility-oriented mining has already attracted great attention.…

Databases · Computer Science 2019-09-13 Wensheng Gan, Jerry Chun-Wei Lin, Han-Chieh Chao, Hamido Fujita, Philip S. Yu

Pattern Sampling for Shapelet-based Time Series Classification

Subsequence-based time series classification algorithms provide accurate and interpretable models, but training these models is extremely computation intensive. The asymptotic time complexity of subsequence-based algorithms remains a…

Machine Learning · Computer Science 2021-02-18 Atif Raza, Stefan Kramer

Effective Sampling: Fast Segmentation Using Robust Geometric Model Fitting

Identifying the underlying models in a set of data points contaminated by noise and outliers leads to a highly complex multi-model fitting problem. This problem can be posed as a clustering problem by the projection of higher order…

Computer Vision and Pattern Recognition · Computer Science 2018-08-01 Ruwan Tennakoon, Alireza Sadri, Reza Hoseinnezhad, Alireza Bab-Hadiashar

Multi-Attribute Selectivity Estimation Using Deep Learning

Selectivity estimation - the problem of estimating the result size of queries - is a fundamental problem in databases. Accurate estimation of query selectivity involving multiple correlated attributes is especially challenging. Poor…

Databases · Computer Science 2019-06-19 Shohedul Hasan, Saravanan Thirumuruganathan, Jees Augustine, Nick Koudas, Gautam Das

Space-Efficient Sampling from Social Activity Streams

In order to efficiently study the characteristics of network domains and support development of network systems (e.g. algorithms, protocols that operate on networks), it is often necessary to sample a representative subgraph from a large…

Social and Information Networks · Computer Science 2012-06-22 Nesreen K. Ahmed, Jennifer Neville, Ramana Kompella

A Survey of Utility-Oriented Pattern Mining

The main purpose of data mining and analytics is to find novel, potentially useful patterns that can be utilized in real-world applications to derive beneficial knowledge. For identifying and evaluating the usefulness of different kinds of…

Databases · Computer Science 2021-04-01 Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, Vincent S. Tseng, Philip S. Yu

Towards Sequence Utility Maximization under Utility Occupancy Measure

The discovery of utility-driven patterns is a useful and difficult research topic. It can extract significant and interesting information from specific and varied databases, increasing the value of the services provided. In practice, the…

Databases · Computer Science 2022-12-21 Gengsen Huang, Wensheng Gan, Philip S. Yu

Towards Target High-Utility Itemsets

For applied intelligence, utility-driven pattern discovery algorithms can identify insightful and useful patterns in databases. However, in these techniques for pattern discovery, the number of patterns can be huge, and the user is often…

Databases · Computer Science 2022-06-14 Jinbao Miao, Wensheng Gan, Shicheng Wan, Yongdong Wu, Philippe Fournier-Viger

A model robust sub-sampling approach for Generalised Linear Models in Big data settings

In today's modern era of Big data, computationally efficient and scalable methods are needed to support timely insights and informed decision making. One such method is sub-sampling, where a subset of the Big data is analysed and used as…

Methodology · Statistics 2022-09-07 Amalan Mahendran, Helen Thompson, James M. McGree

Extended High Utility Pattern Mining: An Answer Set Programming Based Framework and Applications

Detecting sets of relevant patterns from a given dataset is an important challenge in data mining. The relevance of a pattern, also called utility in the literature, is a subjective measure and can be actually assessed from very different…

Artificial Intelligence · Computer Science 2023-03-24 Francesco Cauteruccio, Giorgio Terracina