Related papers: Random Walk-steered Majority Undersampling
Class imbalance poses a significant challenge in machine learning (ML), often leading to biased models favouring the majority class. In this paper, we propose GAT-RWOS, a novel graph-based oversampling method that combines the strengths of…
This paper proposes a new RWO-Sampling (Random Walk Over-Sampling) based on graphs for imbalanced datasets. In this method, two schemes based on under-sampling and over-sampling methods are introduced to keep the proximity information…
Random walk-based sampling methods are gaining popularity and importance in characterizing large networks. While powerful, they suffer from the slow mixing problem when the graph is loosely connected, which results in poor estimation…
Random walk sampling methods have been widely used in graph sampling in recent years, while it has bias towards higher degree nodes in the sample. To overcome this deficiency, classical methods such as MHRW design weighted walking by…
Random Walk is a basic algorithm to explore the structure of networks, which can be used in many tasks, such as local community detection and network embedding. Existing random walk methods are based on single networks that contain limited…
We present a novel quasi-Monte Carlo mechanism to improve graph-based sampling, coined repelling random walks. By inducing correlations between the trajectories of an interacting ensemble such that their marginal transition probabilities…
Class imbalance and distributional differences in large datasets present significant challenges for classification tasks machine learning, often leading to biased models and poor predictive performance for minority classes. This work…
Graph sampling is a technique to pick a subset of vertices and/ or edges from original graph. Among various graph sampling approaches, Traversal Based Sampling (TBS) are widely used due to low cost and feasibility for many cases, in which…
Imbalanced class distribution is a common problem in a number of fields including medical diagnostics, fraud detection, and others. It causes bias in classification algorithms leading to poor performance on the minority class data. In this…
Random walk based sampling methods have been widely used in graph sampling in recent years, while it has bias towards higher degree nodes in the sample. To overcome this deficiency, classical methods such as GMD modify the topology of…
The Random Walks (RW) algorithm is one of the most e - cient and easy-to-use probabilistic segmentation methods. By combining contrast terms with prior terms, it provides accurate segmentations of medical images in a fully automated manner.…
Predicting links in complex networks has been one of the essential topics within the realm of data mining and science discovery over the past few years. This problem remains an attempt to identify future, deleted, and redundant links using…
The Random Walks (RW) algorithm is one of the most e - cient and easy-to-use probabilistic segmentation methods. By combining contrast terms with prior terms, it provides accurate segmentations of medical images in a fully automated manner.…
In practice, machine learning experts are often confronted with imbalanced data. Without accounting for the imbalance, common classifiers perform poorly and standard evaluation metrics mislead the practitioners on the model's performance. A…
Imbalance in the proportion of training samples belonging to different classes often poses performance degradation of conventional classifiers. This is primarily due to the tendency of the classifier to be biased towards the majority…
Imbalanced data widely exists in many high-impact applications. An example is in air traffic control, where we aim to identify the leading indicators for each type of accident cause from historical records. Among all three types of accident…
A learning classifier must outperform a trivial solution, in case of imbalanced data, this condition usually does not hold true. To overcome this problem, we propose a novel data level resampling method - Clustering Based Oversampling for…
Graph embedding, representing local and global neighborhood information by numerical vectors, is a crucial part of the mathematical modeling of a wide range of real-world systems. Among the embedding algorithms, random walk-based algorithms…
Transductive graph-based semi-supervised learning methods usually build an undirected graph utilizing both labeled and unlabeled samples as vertices. Those methods propagate label information of labeled samples to neighbors through their…
Our objective is to sample the node set of a large unknown graph via crawling, to accurately estimate a given metric of interest. We design a random walk on an appropriately defined weighted graph that achieves high efficiency by…