Related papers: Vector-based categorization analysis with improved…

200 papers

A simple and fast analysis method to sort large data sets into groups with shared distinguishing characteristics is described, and applied to single molecular break junction conductance versus electrode displacement data. The method, based…

Mesoscale and Nanoscale Physics · Physics 2018-01-10 J. M. Hamill, X. T. Zhao, G. Mészáros, M. R. Bryce, M. Arenz

Cluster analysis requires many decisions: the clustering method and the implied reference model, the number of clusters and, often, several hyper-parameters and algorithms' tunings. In practice, one produces several partitions, and a final…

Machine Learning · Statistics 2023-08-14 Luca Coraggio, Pietro Coretto

Scientists study trajectory data to understand trends in movement patterns, such as human mobility for traffic analysis and urban planning. There is a pressing need for scalable and efficient techniques for analyzing this data and…

Machine Learning · Computer Science 2012-09-03 Nivan Ferreira, James T. Klosowski, Carlos Scheidegger, Claudio Silva

Biclustering algorithms play a central role in the biotechnological and biomedical domains. The knowledge extracted supports the extraction of putative regulatory modules, essential to understanding diseases, aiding therapy research, and…

Databases · Computer Science 2022-12-13 Leonardo Alexandre, Rafael S. Costa, Rui Henriques

Vector data is prevalent across business and scientific applications, and its popularity is growing with the proliferation of learned embeddings. Vector data collections often reach billions of vectors with thousands of dimensions, thus,…

Information Retrieval · Computer Science 2025-09-09 Ilias Azizi, Karima Echihabi, Themis Palpanas, Vassilis Christophides

Vector data is prevalent across business and scientific applications, and its popularity is growing with the proliferation of learned embeddings. Vector data collections often reach billions of vectors with thousands of dimensions, thus,…

Information Retrieval · Computer Science 2025-09-08 Ilias Azizi, Karima Echihabi, Themis Palpanas

Among the components contributing to particle transport, geometry navigation is an important consumer of CPU cycles. The tasks performed to get answers to "basic" queries such as locating a point within a geometry hierarchy or computing…

Computational Physics · Physics 2013-12-04 John Apostolakis, René Brun, Federico Carminati, Andrei Gheata, Sandro Wenzel

Motivation: Although principal component analysis is frequently applied to reduce the dimensionality of matrix data, the method is sensitive to noise and bias and has difficulty with comparability and interpretation. These issues are…

Methodology · Statistics 2012-12-27 Tomokazu Konishi

We introduce a semi-supervised learning estimator which tends to the first kernel principal component as the number of labelled points vanishes. Our approach is based on the notion of optimal target vector, which is defined as follows.…

Disordered Systems and Neural Networks · Physics 2007-05-23 Leonardo Angelini, Daniele Marinazzo, Mario Pellicoro, Sebastiano Stramaglia

A hypergraph is a useful combinatorial object to model ternary or higher-order relations among entities. Clustering hypergraphs is a fundamental task in network analysis. In this study, we develop two clustering algorithms based on…

Data Structures and Algorithms · Computer Science 2021-10-27 Yuuki Takai, Atsushi Miyauchi, Masahiro Ikeda, Yuichi Yoshida

Unbiased learning to rank (ULTR) aims to train an unbiased ranking model from biased user click logs. Most of the current ULTR methods are based on the examination hypothesis (EH), which assumes that the click probability can be factorized…

Information Retrieval · Computer Science 2022-06-14 Mouxiang Chen, Chenghao Liu, Zemin Liu, Jianling Sun

We propose a routing algorithm that takes a sequence of vectors and computes a new sequence with specified length and vector size. Each output vector maximizes "bang per bit," the difference between a net benefit to use and net cost to…

Machine Learning · Computer Science 2022-12-23 Franz A. Heinsen

Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a dataset as points in a metric space and compute distances to group together similar…

Machine Learning · Computer Science 2021-10-12 Tarek Naous, Srinjay Sarkar, Abubakar Abid, James Zou

Recognizing elementary underlying concepts from observations (disentanglement) and generating novel combinations of these concepts (compositional generalization) are fundamental abilities for humans to support rapid knowledge learning and…

Computer Vision and Pattern Recognition · Computer Science 2023-05-30 Tao Yang, Yuwang Wang, Cuiling Lan, Yan Lu, Nanning Zheng

The determination of cluster centers generally depends on the scale that we use to analyze the data to be clustered. Inappropriate scale usually leads to unreasonable cluster centers and thus unreasonable results. In this study, we first…

Machine Learning · Statistics 2016-10-20 Xiurui Geng, Hairong Tang

Sparse linear prediction methods suffer from decreased prediction accuracy when the predictor variables have cluster structure (e.g. there are highly correlated groups of variables). To improve prediction accuracy, various methods have been…

Machine Learning · Statistics 2022-02-03 Rebecca Marion, Johannes Lederer, Bernadette Govaerts, Rainer von Sachs

A novel unsupervised learning method is proposed in this paper for biclustering large-dimensional matrix-valued time series based on an entirely new latent two-way factor structure. Each block cluster is characterized by its own row and…

Methodology · Statistics 2025-02-11 Yong He, Xiaoyang Ma, Xingheng Wang, Yalin Wang

We generalize principal component analysis for embedding words into a vector space. The generalization is made in two major levels. The first is to generalize the concept of the corpus as a counting process which is defined by three key…

Computation and Language · Computer Science 2020-07-10 Ali Basirat, Christian Hardmeier, Joakim Nivre

Clustering is an important facet of explorative data mining and finds extensive use in several fields. In this paper, we propose an extension of the classical Fuzzy C-Means clustering algorithm. The proposed algorithm, abbreviated as VFC,…

Computer Vision and Pattern Recognition · Computer Science 2016-11-18 Srinjoy Ganguly, Digbalay Bose, Amit Konar

The massive increase in the data volume and dataset availability for analysts compels researchers to focus on data content and select high-quality datasets to enhance the performance of analytics operators. While selecting high-quality data…

Machine Learning · Computer Science 2025-08-25 Andreas Loizou, Dimitrios Tsoumakos