Related papers: Supervised Contamination Detection, with Flow Cyto…
Flow cytometry is a technology that rapidly measures antigen-based markers associated with cells in a cell population. Although analysis of flow cytometry data has traditionally considered one or two markers at a time, there has been…
Flow cytometry is mainly used to detect the characteristics of a number of biochemical substances based on the expression of specific markers in cells. It is particularly useful for detecting membrane surface receptors, antigens, ions, or…
We present a Bayesian approach for the Contamination Source Detection problem in Water Distribution Networks. Given an observation of contaminants in one or more nodes in the network, we try to give a probable explanation for it assuming that…
We consider the use of the Joint Clustering and Matching (JCM) procedure for the supervised classification of a flow cytometric sample with respect to a number of predefined classes of such samples. The JCM procedure has been proposed as a…
We consider the problem of testing whether two samples of contaminated data, possibly paired, are from the same distribution. It is assumed that the contaminations are additive noises with known moments of all orders. The test statistic is…
Anomaly detection methods identify examples that do not follow the expected behaviour, typically in an unsupervised fashion, by assigning real-valued anomaly scores to the examples based on various heuristics. These scores need to be…
We study the inference of the origin and the pattern of contamination in water distribution networks. We assume a simplified model for the dynamics of the contamination spread inside a water distribution network, and assume that at some…
Background: Trace quantities of contaminating DNA are widespread in the laboratory environment, but their presence has received little attention in the context of high throughput sequencing. This issue is highlighted by recent works that…
CDD, or Contamination Detection via output Distribution, identifies data contamination by measuring the peakedness of a model's sampled outputs. We study the conditions under which this approach succeeds and fails on small language models…
Flow cytometry is a powerful quantitative assay supporting high-throughput collection of single-cell data with a high dynamic range. For flow cytometry to yield reproducible data with a quantitative relationship to the underlying biology,…
Flow cytometry is a technique that measures multiple fluorescence and light scatter-associated parameters from individual cells as they flow in single file through an excitation light source. These cells are labeled with antibodies to detect…
Conformal prediction is a flexible framework for calibrating machine learning predictions, providing distribution-free statistical guarantees. In outlier detection, this calibration relies on a reference set of labeled inlier data to…
Specimens are collected from $N$ different sources. Each specimen has probability $p$ of being contaminated (e.g., in the case of an infectious disease, $p$ is the prevalence rate), independently of the other specimens. In many cases group…
Circulating blood cell clusters (CCCs) containing red blood cells (RBCs), white blood cells (WBCs), and platelets are significant biomarkers linked to conditions like thrombosis, infection, and inflammation. Flow cytometry, paired with…
While previous distribution shift detection approaches can identify if a shift has occurred, these approaches cannot localize which specific features have caused a distribution shift -- a critical step in diagnosing or fixing any underlying…
In data analysis, contamination caused by outliers is inevitable, and robust statistical methods are in strong demand. In this paper, our concern is to develop a new approach for robust data analysis based on scoring rules. The scoring…
The ocean is filled with phytoplankton that contribute as much photosynthesis as all land plants combined, making them vital to the carbon cycle and climate system. Recent advances in flow cytometry allow oceanographers to measure the…
With the rise of machine learning and deep learning based applications in practice, monitoring, i.e. verifying that these operate within specification, has become an important practical problem. An important aspect of this monitoring is to…
Large language models (LLMs) are widely used, but concerns about data contamination challenge the reliability of LLM evaluations. Existing contamination detection methods are often task-specific or require extra prerequisites, limiting…
In this paper, we advocate a novel measure for the purpose of checking the quality of a cluster partition for a sample into several distinct classes, and thus, determine the unknown value for the true number of clusters prevailing the…