Michael Mathioudakis
While digitized corpora have transformed the study of intellectual transmission, current methods rely heavily on lexical text reuse detection, capturing verbatim quotations but fundamentally missing paraphrases and complex implicit…
Learned indexes fit machine learning (ML) models to the data and use them to make query operations more time and space-efficient. Recent works propose using learned spatial indexes to improve spatial query performance by optimizing the…
Text reuse is a methodological element of fundamental importance in humanities research: pieces of text that re-appear across different documents, verbatim or paraphrased, provide invaluable information about the historical spread and…
It is important to retrain a machine learning (ML) model in order to maintain its performance as the data changes over time. However, this can be costly as it usually requires processing the entire dataset again. This creates a trade-off…
Diversity maximization aims to select a diverse and representative subset of items from a large dataset. It is a fundamental optimization task that finds applications in data summarization, feature selection, web search, recommender…
Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set $X$ of $n$ elements, it asks to select a subset $S$ of $k \ll n$ elements with maximum…
Recommender systems typically suggest to users content similar to what they consumed in the past. If a user happens to be exposed to strongly polarized content, she might subsequently receive recommendations which may steer her towards more…
Graph summarization via node grouping is a popular method to build concise graph representations by grouping nodes from the original graph into supernodes and encoding edges into superedges such that the loss of adjacency information is…
Bayesian networks are popular probabilistic models that capture the conditional dependencies among a set of variables. Inference in Bayesian networks is a fundamental task for answering probabilistic queries over a subset of variables in…
The task of node classification is to infer unknown node labels, given the labels for some of the nodes along with the network structure and other node attributes. Typically, approaches for this task assume homophily, whereby neighboring…
Machine unlearning is the task of updating machine learning (ML) models after a subset of the training data they were trained on is deleted. Methods for the task are desired to combine effectiveness and efficiency, i.e., they should…
We consider the problem of designing affirmative action policies for selecting the top-k candidates from a pool of applicants. We assume that for each candidate we have socio-demographic attributes and a series of variables that serve as…
We study the problem of selecting the top-k candidates from a pool of applicants, where each candidate is associated with a score indicating his/her aptitude. Depending on the specific scenario, such as job search or college admissions,…
We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be…
Variable Elimination is a fundamental algorithm for probabilistic inference over Bayesian networks. In this paper, we propose a novel materialization method for Variable Elimination, which can lead to significant efficiency gains when…
Extracting a small subset of representative tuples from a large database is an important task in multi-criteria decision making. The regret-minimizing set (RMS) problem is recently proposed for representative discovery from databases.…
In this paper, we study university admissions under a centralized system that uses grades and standardized test scores to match applicants to university programs. We consider affirmative action policies that seek to increase the number of…
In networking applications, one often wishes to obtain estimates about the number of objects at different parts of the network (e.g., the number of cars at an intersection of a road network or the number of packets expected to reach a node…
Society is often polarized by controversial issues, that split the population into groups of opposing views. When such issues emerge on social media, we often observe the creation of 'echo chambers', i.e., situations where like-minded…
Echo chambers, i.e., situations where one is exposed only to opinions that agree with their own, are an increasing concern for the political discourse in many democratic countries. This paper studies the phenomenon of political echo…