Related papers: Concept driven framework for Latent Table Discover…
Querying is one of the basic functionality expected from a database system. Query efficiency is adversely affected by increase in the number of participating tables. Also, querying based on syntax largely limits the gamut of queries a…
Tables on the Web contain a vast amount of knowledge in a structured form. To tap into this valuable resource, we address the problem of table retrieval: answering an information need with a ranked list of tables. We investigate this…
Data discovery in data lakes with ever increasing datasets has long been recognized as a big challenge in the realm of data management, especially for semantic search of and hierarchical global catalog generation of tables. While large…
With the growing abundance of repositories containing tabular data, discovering relevant tables for in-depth analysis remains a challenging task. Existing table discovery methods primarily retrieve desired tables based on a query table or…
Existing query languages for data discovery exhibit system-driven designs that emphasize database features and functionality over user needs. We propose a re-prioritization of the client through an introduction of a language-driven approach…
Our work addresses the challenges of understanding tables. Existing methods often struggle with the unpredictable nature of table content, leading to a reliance on preprocessing and keyword matching. They also face limitations due to the…
Computer-based information technologies have been extensively used to help many organizations, private companies, and academic and education institutions manage their processes and information systems hereby become their nervous centre. The…
Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks requiring knowledge that has not been trained on previously. One solution is to use a retriever that fetches…
We propose a novel framework to facilitate the on-demand design of data-centric systems by exploiting domain knowledge from an existing ontology. Its key ingredient is a process that we call focusing, which allows to obtain a schema for a…
As the use of technology increases and data analysis becomes integral in many businesses, the ability to quickly access and interpret data has become more important than ever. Information retrieval technologies are being utilized by…
The growing reliance on data-driven decision-making highlights the need for more intuitive ways to access and analyze information stored in relational databases. However, the requirement of SQL knowledge has long been a significant barrier…
Tackling the information retrieval gap between non-technical database end-users and those with the knowledge of formal query languages has been an interesting area of data management and analytics research. The use of natural language…
When people search for information about a new topic within large document collections, they implicitly construct a mental model of the unfamiliar information space to represent what they currently know and guide their exploration into the…
How to generate a large, realistic set of tables along with joinability relationships, to stress-test dataset discovery methods? Dataset discovery methods aim to automatically identify related data assets in a data lake. The development and…
Contemporary database systems, while effective, suffer severe issues related to complexity and usability, especially among individuals who lack technical expertise but are unfamiliar with query languages like Structured Query Language…
Tables are common and important in scientific documents, yet most text-based document search systems do not capture structures and semantics specific to tables. How to bridge different types of mismatch between keywords queries and…
Conversational interfaces that allow for intuitive and comprehensive access to digitally stored information remain an ambitious goal. In this thesis, we lay foundations for designing conversational search systems by analyzing the…
Increasingly, keyword, natural language and NoSQL queries are being used for information retrieval from traditional as well as non-traditional databases such as web, document, image, GIS, legal, and health databases. While their popularity…
Agentic discovery has shown that LLM-driven search can find novel algorithms, designs, and code under benchmark conditions. Translating the paradigm to multi-system data backends surfaces a harder problem: the search space is heterogeneous,…
With the rapid growth of internet technologies, Web has become a huge repository of information and keeps growing exponentially under no editorial control. However the human capability to read, access and understand Web content remains…