Related papers: Vector-based categorization analysis with improved…

Fast data sorting with modified principal component analysis to distinguish unique single molecular break junction trajectories

A simple and fast analysis method to sort large data sets into groups with shared distinguishing characteristics is described, and applied to single molecular break junction conductance versus electrode displacement data. The method, based…

Mesoscale and Nanoscale Physics · Physics 2018-01-10 J. M. Hamill, X. T. Zhao, G. Mészáros, M. R. Bryce, M. Arenz

Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score

Cluster analysis requires many decisions: the clustering method and the implied reference model, the number of clusters and, often, several hyper-parameters and algorithms' tunings. In practice, one produces several partitions, and a final…

Machine Learning · Statistics 2023-08-14 Luca Coraggio, Pietro Coretto

Vector Field k-Means: Clustering Trajectories by Fitting Multiple Vector Fields

Scientists study trajectory data to understand trends in movement patterns, such as human mobility for traffic analysis and urban planning. There is a pressing need for scalable and efficient techniques for analyzing this data and…

Machine Learning · Computer Science 2012-09-03 Nivan Ferreira, James T. Klosowski, Carlos Scheidegger, Claudio Silva

Scaling pattern mining through non-overlapping variable partitioning

Biclustering algorithms play a central role in the biotechnological and biomedical domains. The knowledge extracted supports the extraction of putative regulatory modules, essential to understanding diseases, aiding therapy research, and…

Databases · Computer Science 2022-12-13 Leonardo Alexandre, Rafael S. Costa, Rui Henriques

Toward Efficient and Scalable Design of In-Memory Graph-Based Vector Search

Vector data is prevalent across business and scientific applications, and its popularity is growing with the proliferation of learned embeddings. Vector data collections often reach billions of vectors with thousands of dimensions, thus,…

Information Retrieval · Computer Science 2025-09-09 Ilias Azizi, Karima Echihabi, Themis Palpanas, Vassilis Christophides

Graph-Based Vector Search: An Experimental Evaluation of the State-of-the-Art

Vector data is prevalent across business and scientific applications, and its popularity is growing with the proliferation of learned embeddings. Vector data collections often reach billions of vectors with thousands of dimensions, thus,…

Information Retrieval · Computer Science 2025-09-08 Ilias Azizi, Karima Echihabi, Themis Palpanas

Vectorising the detector geometry to optimize particle transport

Among the components contributing to particle transport, geometry navigation is an important consumer of CPU cycles. The tasks performed to get answers to "basic" queries such as locating a point within a geometry hierarchy or computing…

Computational Physics · Physics 2013-12-04 John Apostolakis, René Brun, Federico Carminati, Andrei Gheata, Sandro Wenzel

Principal Component Analysis for Experiments

Motivation: Although principal component analysis is frequently applied to reduce the dimensionality of matrix data, the method is sensitive to noise and bias and has difficulty with comparability and interpretation. These issues are…

Methodology · Statistics 2012-12-27 Tomokazu Konishi

Semi-supervised learning by search of optimal target vector

We introduce a semi-supervised learning estimator which tends to the first kernel principal component as the number of labelled points vanishes. Our approach is based on the notion of optimal target vector, which is defined as follows.…

Disordered Systems and Neural Networks · Physics 2007-05-23 Leonardo Angelini, Daniele Marinazzo, Mario Pellicoro, Sebastiano Stramaglia

Hypergraph Clustering Based on PageRank

A hypergraph is a useful combinatorial object to model ternary or higher-order relations among entities. Clustering hypergraphs is a fundamental task in network analysis. In this study, we develop two clustering algorithms based on…

Data Structures and Algorithms · Computer Science 2021-10-27 Yuuki Takai, Atsushi Miyauchi, Masahiro Ikeda, Yuichi Yoshida

Scalar is Not Enough: Vectorization-based Unbiased Learning to Rank

Unbiased learning to rank (ULTR) aims to train an unbiased ranking model from biased user click logs. Most of the current ULTR methods are based on the examination hypothesis (EH), which assumes that the click probability can be factorized…

Information Retrieval · Computer Science 2022-06-14 Mouxiang Chen, Chenghao Liu, Zemin Liu, Jianling Sun

An Algorithm for Routing Vectors in Sequences

We propose a routing algorithm that takes a sequence of vectors and computes a new sequence with specified length and vector size. Each output vector maximizes "bang per bit," the difference between a net benefit to use and net cost to…

Machine Learning · Computer Science 2022-12-23 Franz A. Heinsen

Clustering Plotted Data by Image Segmentation

Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a dataset as points in a metric space and compute distances to group together similar…

Machine Learning · Computer Science 2021-10-12 Tarek Naous, Srinjay Sarkar, Abubakar Abid, James Zou

Vector-based Representation is the Key: A Study on Disentanglement and Compositional Generalization

Recognizing elementary underlying concepts from observations (disentanglement) and generating novel combinations of these concepts (compositional generalization) are fundamental abilities for humans to support rapid knowledge learning and…

Computer Vision and Pattern Recognition · Computer Science 2023-05-30 Tao Yang, Yuwang Wang, Cuiling Lan, Yan Lu, Nanning Zheng

Clustering by connection center evolution

The determination of cluster centers generally depends on the scale that we use to analyze the data to be clustered. Inappropriate scale usually leads to unreasonable cluster centers and thus unreasonable results. In this study, we first…

Machine Learning · Statistics 2016-10-20 Xiurui Geng, Hairong Tang

VC-PCR: A Prediction Method based on Supervised Variable Selection and Clustering

Sparse linear prediction methods suffer from decreased prediction accuracy when the predictor variables have cluster structure (e.g. there are highly correlated groups of variables). To improve prediction accuracy, various methods have been…

Machine Learning · Statistics 2022-02-03 Rebecca Marion, Johannes Lederer, Bernadette Govaerts, Rainer von Sachs

Factor Modelling for Biclustering Large-dimensional Matrix-valued Time Series

A novel unsupervised learning method is proposed in this paper for biclustering large-dimensional matrix-valued time series based on an entirely new latent two-way factor structure. Each block cluster is characterized by its own row and…

Methodology · Statistics 2025-02-11 Yong He, Xiaoyang Ma, Xingheng Wang, Yalin Wang

Principal Word Vectors

We generalize principal component analysis for embedding words into a vector space. The generalization is made in two major levels. The first is to generalize the concept of the corpus as a counting process which is defined by three key…

Computation and Language · Computer Science 2020-07-10 Ali Basirat, Christian Hardmeier, Joakim Nivre

Clustering using Vector Membership: An Extension of the Fuzzy C-Means Algorithm

Clustering is an important facet of explorative data mining and finds extensive use in several fields. In this paper, we propose an extension of the classical Fuzzy C-Means clustering algorithm. The proposed algorithm, abbreviated as VFC,…

Computer Vision and Pattern Recognition · Computer Science 2016-11-18 Srinjoy Ganguly, Digbalay Bose, Amit Konar

Analytics Modelling over Multiple Datasets using Vector Embeddings

The massive increase in the data volume and dataset availability for analysts compels researchers to focus on data content and select high-quality datasets to enhance the performance of analytics operators. While selecting high-quality data…

Machine Learning · Computer Science 2025-08-25 Andreas Loizou, Dimitrios Tsoumakos