Related papers: Query Significance in Databases via Randomizations
Understanding the connections between unstructured text and semi-structured table is an important yet neglected problem in natural language processing. In this work, we focus on content-based table retrieval. Given a query, the task is to…
Halls of Fame are fascinating constructs. They represent the elite of an often very large amount of entities---persons, companies, products, countries etc. Beyond their practical use as static rankings, changes to them are particularly…
Ranked enumeration is a query-answering paradigm where the query answers are returned incrementally in order of importance (instead of returning all answers at once). Importance is defined by a ranking function that can be specific to the…
Tables are common and important in scientific documents, yet most text-based document search systems do not capture structures and semantics specific to tables. How to bridge different types of mismatch between keywords queries and…
Databases are widespread, yet extracting relevant data can be difficult. Without substantial domain knowledge, multivariate search queries often return sparse or uninformative results. This paper introduces an approach for searching…
Distributions over rankings are used to model data in various settings such as preference analysis and political elections. The factorial size of the space of rankings, however, typically forces one to make structural assumptions, such as…
Large databases are often organized by hand-labeled metadata, or criteria, which are expensive to collect. We can use unsupervised learning to model database variation, but these models are often high dimensional, complex to parameterize,…
Direct access asks for the retrieval of query answers by their ranked position, given a query and a desired order. While the time complexity of data structures supporting such accesses has been studied in depth, and efficient algorithms for…
A set of preferred records can be obtained from a large database in a multi-criteria setting using various computational methods which either depend on the concept of dominance or on the concept of utility or scoring function based on the…
Large, data centric applications are characterized by its different attributes. In modern day, a huge majority of the large data centric applications are based on relational model. The databases are collection of tables and every table…
Performing effective preference-based data retrieval requires detailed and preferentially meaningful structurized information about the current user as well as the items under consideration. A common problem is that representations of items…
Novel research in the field of Linked Data focuses on the problem of entity summarization. This field addresses the problem of ranking features according to their importance for the task of identifying a particular entity. Next to a more…
Tables on the Web contain a vast amount of knowledge in a structured form. To tap into this valuable resource, we address the problem of table retrieval: answering an information need with a ranked list of tables. We investigate this…
A common approach to data analysis involves understanding and manipulating succinct representations of data. In earlier work, we put forward a succinct representation system for relational data called factorised databases and reported on…
The majority of Semantic Web search engines retrieve information by focusing on the use of concepts and relations restricted to the query provided by the user. By trying to guess the implicit meaning between these concepts and relations,…
User content curation is becoming an important source of preference data, as well as providing information regarding the items being curated. One popular approach involves the creation of lists. On Twitter, these lists might contain…
There is a wide variety of data mining methods available, and it is generally useful in exploratory data analysis to use many different methods for the same dataset. This, however, leads to the problem of whether the results found by one…
The problem of frequent pattern mining has been studied quite extensively for various types of data, including sets, sequences, and graphs. Somewhat surprisingly, another important type of data, namely rank data, has received very little…
Query answering over probabilistic data is an important task but is generally intractable. However, a new approach for this problem has recently been proposed, based on structural decompositions of input databases, following, e.g., tree…
Relational databases play a central role in many information systems. Their schema contains structural (e.g. tables and columns) and behavioral (e.g. stored procedures or views) entity descriptions. Then, just like for ``normal'' software,…