Related papers: HybridMiner: Mining Maximal Frequent Itemsets Usin…
Mining frequent itemset using bit-vector representation approach is very efficient for dense type datasets, but highly inefficient for sparse datasets due to lack of any efficient bit-vector projection technique. In this paper we present a…
Mining frequent itemsets is an essential problem in data mining and plays an important role in many data mining applications. In recent years, some itemset representations based on node sets have been proposed, which have shown to be very…
Maximal frequent patterns superset checking plays an important role in the efficient mining of complete Maximal Frequent Itemsets (MFI) and maximal search space pruning. In this paper we present a new indexing approach, FastLMFI for local…
Knowledge discovery in databases aims at finding useful information, which can be deployed for decision making. The problem of high utility itemset mining has specifically garnered huge research focus in the past decade, as it aims to find…
Recommendation systems face the challenge of balancing accuracy and diversity, as traditional collaborative filtering (CF) and network-based diffusion algorithms exhibit complementary limitations. While item-based CF (ItemCF) enhances…
Many emerging use cases of data mining and machine learning operate on large datasets with data from heterogeneous sources, specifically with both sparse and dense components. For example, dense deep neural network embedding vectors are…
A novel fast algorithm for finding quasi identifiers in large datasets is presented. Performance measurements on a broad range of datasets demonstrate substantial reductions in run-time relative to the state of the art and the scalability…
Real world datasets are sparse, dirty and contain hundreds of items. In such situations, discovering interesting rules (results) using traditional frequent itemset mining approach by specifying a user defined input support threshold is not…
Feature selection is a process of choosing a subset of relevant features so that the quality of prediction models can be improved. An extensive body of work exists on information-theoretic feature selection, based on maximizing Mutual…
Diversity maximization aims to select a diverse and representative subset of items from a large dataset. It is a fundamental optimization task that finds applications in data summarization, feature selection, web search, recommender…
Data mining is the practice to search large amount of data to discover data patterns. Data mining uses mathematical algorithms to group the data and evaluate the future events. Association rule is a research area in the field of knowledge…
The development of novel platforms and techniques for emerging "Big Data" applications requires the availability of real-life datasets for data-driven experiments, which are however out of reach for academic research in most cases as they…
The High Average Utility Itemset Mining (HAUIM) technique, a variation of High Utility Itemset Mining (HUIM), uses the average utility of the itemsets. Historically, most HAUIM algorithms were designed for static databases. However,…
The proliferation of complex, multimodal datasets has exposed a critical gap between the capabilities of specialized vector databases and traditional graph databases. While vector databases excel at semantic similarity search, they lack the…
Multi-exposure image fusion (MEF) has emerged as a prominent solution to address the limitations of digital imaging in representing varied exposure levels. Despite its advancements, the field grapples with challenges, notably the reliance…
High-utility itemset mining finds itemsets from a transaction database with utility no less than a fixed user-defined threshold. The utility of an itemset is defined as the sum of the utilities of its item. Several algorithms were proposed…
Frequent Pattern Mining is a one field of the most significant topics in data mining. In recent years, many algorithms have been proposed for mining frequent itemsets. A new algorithm has been presented for mining frequent itemsets based on…
Learning based hashing plays a pivotal role in large-scale visual search. However, most existing hashing algorithms tend to learn shallow models that do not seek representative binary codes. In this paper, we propose a novel hashing…
Similarity-preserving hashing is a widely-used method for nearest neighbour search in large-scale image retrieval tasks. There has been considerable research on generating efficient image representation via the deep-network-based hashing…
Mining frequent itemsets is at the core of mining association rules, and is by now quite well understood algorithmically. However, most algorithms for mining frequent itemsets assume that the main memory is large enough for the data…