Related papers: Knapsack Optimization-based Schema Linking for LLM…
Text-to-SQL generation aims to translate natural language questions into SQL statements. In Text-to-SQL based on large language models, schema linking is a widely adopted strategy to streamline the input for LLMs by selecting only relevant…
In sophisticated existing Text-to-SQL methods exhibit errors in various proportions, including schema-linking errors (incorrect columns, tables, or extra columns), join errors, nested errors, and group-by errors. Consequently, there is a…
This work, for the first time, introduces two constant factor approximation algorithms with linear query complexity for non-monotone submodular maximization over a ground set of size $n$ subject to a knapsack constraint, $\mathsf{DLA}$ and…
Schema linking -- the process of aligning natural language questions with database schema elements -- is a critical yet underexplored component of Text-to-SQL systems. While recent methods have focused primarily on improving SQL generation,…
Schema linking is a crucial step in Text-to-SQL pipelines. Its goal is to retrieve the relevant tables and columns of a target database for a user's query while disregarding irrelevant ones. However, imperfect schema linking can often…
Recently, large language models (LLMs) have significantly improved the performance of text-to-SQL systems. Nevertheless, many state-of-the-art (SOTA) approaches have overlooked the critical aspect of system robustness. Our experiments…
Schema linking is a critical bottleneck in applying existing Text-to-SQL models to real-world, large-scale, multi-database environments. Through error analysis, we identify two major challenges in schema linking: (1) Database Retrieval:…
It is challenging to convert natural language (NL) questions into executable structured query language (SQL) queries for text-to-SQL tasks due to the vast number of database schemas with redundancy, which interferes with semantic learning,…
Text-to-SQL, which translates a natural language question into an SQL query, has advanced with in-context learning of Large Language Models (LLMs). However, existing methods show little improvement in performance compared to randomly chosen…
Schema linking is a critical step in Text-to-SQL task, aiming to accurately predict the table names and column names required for the SQL query based on the given question. However, current fine-tuning approaches for schema linking models…
Despite advances in large language model (LLM)-based natural language interfaces for databases, scaling to enterprise-level data catalogs remains an under-explored challenge. Prior works addressing this challenge rely on domain-specific…
The problem of non-monotone $k$-submodular maximization under a knapsack constraint ($\kSMK$) over the ground set size $n$ has been raised in many applications in machine learning, such as data summarization, information propagation, etc.…
With Large Language Models' (LLMs) emergent abilities on code generation tasks, Text-to-SQL has become one of the most popular downstream applications. Despite the strong results of multiple recent LLM-based Text-to-SQL frameworks, the…
Text-to-SQL prompt strategies based on Large Language Models (LLMs) achieve remarkable performance on well-known benchmarks. However, when applied to real-world databases, their performance is significantly less than for these benchmarks,…
Text-to-SQL systems translate natural language questions into executable SQL queries, and recent progress with large language models (LLMs) has driven substantial improvements in this task. Schema linking remains a critical component in…
Large language models (LLMs) have demonstrated significant potential in code generation tasks. However, there remains a performance gap between open-source and closed-source models. To address this gap, existing approaches typically…
Schema matching is essential for integrating heterogeneous data sources and enhancing dataset discovery, yet it remains a complex and resource-intensive problem. We introduce SCHEMORA, a schema matching framework that combines large…
The generalizability to new databases is of vital importance to Text-to-SQL systems which aim to parse human utterances into SQL statements. Existing works achieve this goal by leveraging the exact matching method to identify the lexical…
In this paper, we investigate the possibility of improving the performance of multi-objective optimization solution approaches using machine learning techniques. Specifically, we focus on multi-objective binary linear programs and employ…
A search engine's ability to retrieve desirable datasets is important for data sharing and reuse. Existing dataset search engines typically rely on matching queries to dataset descriptions. However, a user may not have enough prior…