Related papers: ML Based Lineage in Databases

Explaining Natural Language Query Results

Multiple lines of research have developed Natural Language (NL) interfaces for formulating database queries. We build upon this work, but focus on presenting a highly detailed form of the answers in NL. The answers that we present are…

Databases · Computer Science 2020-07-10 Daniel Deutch, Nave Frost, Amir Gilad

RETRO: Relation Retrofitting For In-Database Machine Learning on Textual Data

There are massive amounts of textual data residing in databases, valuable for many machine learning (ML) tasks. Since ML techniques depend on numerical input representations, word embeddings are increasingly utilized to convert symbolic…

Databases · Computer Science 2020-01-23 Michael Günther, Maik Thiele, Wolfgang Lehner

Incorporating Deep Learning Design in Database Queries

Deep learning over relational databases is conventionally realized by translating data into graph representations and applying graph-based neural networks within external frameworks. This round-trip between the database and external machine…

Databases · Computer Science 2026-05-26 Yuval Lev Lubarsky, Dean Light, Boaz Berger, Shunit Agmon, Benny Kimelfeld

Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction

The task of condensing large chunks of textual information into concise and structured tables has gained attention recently due to the emergence of Large Language Models (LLMs) and their potential benefit for downstream tasks, such as text…

Computation and Language · Computer Science 2024-12-06 Zheye Deng, Chunkit Chan, Weiqi Wang, Yuxi Sun, Wei Fan, Tianshi Zheng, Yauwai Yim, Yangqiu Song

Needle: A Generative AI-Powered Multi-modal Database for Answering Complex Natural Language Queries

Multi-modal datasets, like those involving images, often miss the detailed descriptions that properly capture the rich information encoded in each item. This makes answering complex natural language queries a major challenge in this domain.…

Information Retrieval · Computer Science 2025-06-03 Mahdi Erfanian, Mohsen Dehghankar, Abolfazl Asudeh

Selecting Walk Schemes for Database Embedding

Machinery for data analysis often requires a numeric representation of the input. Towards that, a common practice is to embed components of structured data into a high-dimensional vector space. We study the embedding of the tuples of a…

Machine Learning · Computer Science 2024-01-23 Yuval Lev Lubarsky, Jan Tönshoff, Martin Grohe, Benny Kimelfeld

Judgement Citation Retrieval using Contextual Similarity

Traditionally in the domain of legal research, the retrieval of pertinent citations from intricate case descriptions has demanded manual effort and keyword-based search applications that mandate expertise in understanding legal jargon.…

Information Retrieval · Computer Science 2024-08-16 Akshat Mohan Dasula, Hrushitha Tigulla, Preethika Bhukya

Simple Techniques for Enhancing Sentence Embeddings in Generative Language Models

Sentence Embedding stands as a fundamental task within the realm of Natural Language Processing, finding extensive application in search engines, expert systems, and question-and-answer platforms. With the continuous evolution of large…

Computation and Language · Computer Science 2024-05-16 Bowen Zhang, Kehua Chang, Chunping Li

Token Prepending: A Training-Free Approach for Eliciting Better Sentence Embeddings from LLMs

Extracting sentence embeddings from large language models (LLMs) is a promising direction, as LLMs have demonstrated stronger semantic understanding capabilities. Previous studies typically focus on prompt engineering to elicit sentence…

Computation and Language · Computer Science 2025-07-04 Yuchen Fu, Zifeng Cheng, Zhiwei Jiang, Zhonghui Wang, Yafeng Yin, Zhengliang Li, Qing Gu

Database Queries that Explain their Work

Provenance for database queries or scientific workflows is often motivated as providing explanation, increasing understanding of the underlying data sources and processes used to compute the query, and reproducibility, the capability to…

Programming Languages · Computer Science 2014-08-13 James Cheney, Amal Ahmed, Umut A. Acar

Investigating Consistency in Query-Based Meeting Summarization: A Comparative Study of Different Embedding Methods

With more and more advanced data analysis techniques emerging, people will expect these techniques to be applied in more complex tasks and solve problems in our daily lives. Text Summarization is one of the best-known applications in Natural…

Computation and Language · Computer Science 2024-02-13 Chen Jia-Chen, Guillem Senabre, Allane Caron

Local Embeddings for Relational Data Integration

Deep learning based techniques have been recently used with promising results for data integration problems. Some methods directly use pre-trained embeddings that were trained on a large corpus such as Wikipedia. However, they may not…

Databases · Computer Science 2020-09-04 Riccardo Cappuzzo, Paolo Papotti, Saravanan Thirumuruganathan

Sentence transition matrix: An efficient approach that preserves sentence semantics

Sentence embedding is a significant research topic in the field of natural language processing (NLP). Generating sentence embedding vectors reflecting the intrinsic meaning of a sentence is a key factor to achieve an enhanced performance in…

Computation and Language · Computer Science 2019-01-17 Myeongjun Jang, Pilsung Kang

Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval

Tables contain valuable knowledge in a structured form. We employ neural language modeling approaches to embed tabular data into vector spaces. Specifically, we consider different table elements, such as caption, column headings, and cells,…

Information Retrieval · Computer Science 2019-06-04 Li Deng, Shuo Zhang, Krisztian Balog

A Survey on Efficient Processing of Similarity Queries over Neural Embeddings

Similarity queries are the family of queries based on some similarity metric. Unlike traditional database queries, which are mostly based on value equality, similarity queries aim to find targets "similar enough to" the given data objects,…

Databases · Computer Science 2022-04-19 Yifan Wang

Towards Approximate Query Enumeration with Sublinear Preprocessing Time

This paper aims at providing extremely efficient algorithms for approximate query enumeration on sparse databases, that come with performance and accuracy guarantees. We introduce a new model for approximate query enumeration on classes of…

Databases · Computer Science 2021-01-19 Isolde Adler, Polly Fahey

Potential Field Based Deep Metric Learning

Deep metric learning (DML) involves training a network to learn a semantically meaningful representation space. Many current approaches mine n-tuples of examples and model interactions within each tuplet. We present a novel, compositional…

Computer Vision and Pattern Recognition · Computer Science 2025-04-22 Shubhang Bhatnagar, Narendra Ahuja

Learning Tuple Probabilities

Learning the parameters of complex probabilistic-relational models from labeled training data is a standard technique in machine learning, which has been intensively studied in the subfield of Statistical Relational Learning (SRL), but, so…

Databases · Computer Science 2016-09-21 Maximilian Dylla, Martin Theobald

Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL

Generating accurate SQL from users' natural language questions (text-to-SQL) remains a long-standing challenge due to the complexities involved in user question understanding, database schema comprehension, and SQL generation. Traditional…

Computation and Language · Computer Science 2025-11-25 Zijin Hong, Zheng Yuan, Qinggang Zhang, Hao Chen, Junnan Dong, Feiran Huang, Xiao Huang

SuperTML: Two-Dimensional Word Embedding for the Precognition on Structured Tabular Data

Tabular data is the most commonly used form of data in industry. Gradient Boosting Trees, Support Vector Machine, Random Forest, and Logistic Regression are typically used for classification tasks on tabular data. DNN models using…

Computer Vision and Pattern Recognition · Computer Science 2019-06-05 Baohua Sun, Lin Yang, Wenhan Zhang, Michael Lin, Patrick Dong, Charles Young, Jason Dong