Related papers: Error-Tolerant E-Discovery Protocols
We consider multi-party protocols for classification that are motivated by applications such as e-discovery in court proceedings. We identify a protocol that guarantees that the requesting party receives all responsive documents and the…
We consider the multi-party classification problem introduced by Dong, Hartline, and Vijayaraghavan (2022) motivated by electronic discovery. In this problem, our goal is to design a protocol that guarantees the requesting party receives…
In the United States, the parties to a lawsuit are required to search through their electronically stored information to find documents that are relevant to the specific case and produce them to their opposing party. Negotiations over the…
In legal eDiscovery, the parties are required to search through their electronically stored information to find documents that are relevant to a specific case. Negotiations over the scope of these searches are often based on a fear that…
We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers. We model retrieval decisions as…
In eDiscovery, a party to a lawsuit or similar action must search through available information to identify those documents and files that are relevant to the suit. Search efforts tend to identify less than 100% of the relevant documents…
The growing volumes of data being collected and its analysis to provide better services are creating worries about digital privacy. To address privacy concerns and give practical solutions, the literature has relied on secure multiparty…
In distributed learning, the goal is to perform a learning task over data distributed across multiple nodes with minimal (expensive) communication. Prior work (Daume III et al., 2012) proposes a general model that bounds the communication…
Multiparty session types are designed to abstractly capture the structure of communication protocols and verify behavioural properties. One important such property is progress, i.e., the absence of deadlock. Distributed algorithms often…
This paper presents an efficient fair document exchange protocol. The exchange of the documents will be between two parties. The protocol is based on the verifiable and recoverable encryption of a document's key. This verifiable and…
State-of-the-art systems in deep question answering proceed as follows: (1) an initial document retrieval selects relevant documents, which (2) are then processed by a neural network in order to extract the final answer. Yet the exact…
Traditional error detection approaches require user-defined parameters and rules. Thus, the user has to know both the error detection system and the data. However, we can also formulate error detection as a semi-supervised classification…
Federated knowledge discovery and data mining are challenged to assess the trustworthiness of data originating from autonomous sources while protecting confidentiality and privacy. Truth-finding algorithms help corroborate data from…
Security protocols are often found to be flawed after their deployment. We present an approach that aims at the neutralization or mitigation of the attacks to flawed protocols: it avoids the complete dismissal of the interested protocol and…
Interpretable multi-hop reading comprehension (RC) over multiple documents is a challenging problem because it demands reasoning over multiple information sources and explaining the answer prediction by providing supporting evidences. In…
Multi-party learning provides solutions for training joint models with decentralized data under legal and practical constraints. However, traditional multi-party learning approaches are confronted with obstacles such as system…
Data attribution methods aim to answer useful counterfactual questions like "what would a ML model's prediction be if it were trained on a different dataset?" However, estimation of data attribution models through techniques like empirical…
Entity Resolution (ER) is typically implemented as a batch task that processes all available data before identifying duplicate records. However, applications with time or computational constraints, e.g., those running in the cloud, require…
Multi-label classification consists in classifying an instance into two or more classes simultaneously. It is a very challenging task present in many real-world applications, such as classification of biology, image, video, audio, and text.…
With oblivious transfer multiparty protocols become possible even in the presence of a faulty majority. But all known protocols can be aborted by just one disruptor. This paper presents more robust solutions for multiparty protocols with…