Related papers: Managing Complex Structured Data In a Fast Evolvin…
Cross-border access to a variety of data such as market information, strategic information, or customer-related information defines the daily business of many global companies, including financial institutions. These companies are obliged…
Large language models have shown unprecedented abilities in generating linguistically coherent and syntactically correct natural language output. However, they often return incorrect and inconsistent answers to input questions. Due to the…
Traditionally the integration of data from multiple sources is done on an ad-hoc basis for each analysis scenario and application. This is a solution that is inflexible, incurs in high costs, leads to "silos" that prevent sharing data…
Large language models (LLMs) have shown impressive performance on general-purpose tasks, yet adapting them to specific domains remains challenging due to the scarcity of high-quality domain data. Existing data synthesis tools often struggle…
Designing effective data manipulation methods is a long standing problem in data lakes. Traditional methods, which rely on rules or machine learning models, require extensive human efforts on training data collection and tuning models.…
Structured data offers a sophisticated mechanism for the organization of information. Existing methodologies for the text-serialization of structured data in the context of large language models fail to adequately address the heterogeneity…
Intentional or unintentional leakage of confidential data is undoubtedly one of the most severe security threats that organizations face in the digital era. The threat now extends to our personal lives: a plethora of personal information is…
In today world we are confronted with increasing amounts of information every day coming from a large variety of sources. People and co-operations are producing data on a large scale, and since the rise of the internet, e-mail and social…
Ensuring compliance with international data protection standards for privacy and data security is a crucial but complex task, often requiring substantial legal expertise. This paper introduces LegiLM, a novel legal language model…
A long standing goal of the data management community is to develop general, automated systems that ingest semi-structured documents and output queryable tables without human effort or domain specific customization. Given the sheer variety…
Nowadays, many decision support applications need to exploit data that are not only numerical or symbolic, but also multimedia, multistructure, multisource, multimodal, and/or multiversion. We term such data complex data. Managing and…
Large language models (LLMs) are increasingly applied in fields such as finance, education, and governance due to their ability to generate human-like text and adapt to specialized tasks. However, their widespread adoption raises critical…
Data complexity is an important concept in the natural sciences and related areas, but lacks a rigorous and computable definition. In this paper, we focus on a particular sense of complexity that is high if the data is structured in a way…
The data model of an application, the nature and format of data stored across executions, is typically a very rigid part of its early specification, even when prototyping, and changing it after code that relies on it was written can prove…
Programming with logic for sophisticated applications must deal with recursion and negation, which together have created significant challenges in logic, leading to many different, conflicting semantics of rules. This paper describes a…
In the rapidly evolving field of legal analytics, finding relevant cases and accurately predicting judicial outcomes are challenging because of the complexity of legal language, which often includes specialized terminology, complex syntax,…
Our goal is to build classification models using a combination of free-text and structured data. To do this, we represent structured data by text sentences, DataWords, so that similar data items are mapped into the same sentence. This…
With web and mobile platforms becoming more prominent devices utilized in data analysis, there are currently few systems which are not without flaw. In order to increase the performance of these systems and decrease errors of data…
Strings are ubiquitous in code. Not all strings are created equal, some contain structure that makes them incompatible with other strings. CSS units are an obvious example. Worse, type checkers cannot see this structure: this is the latent…
Although large language models (LLMs) have advanced the state-of-the-art in NLP significantly, deploying them for downstream applications is still challenging due to cost, responsiveness, control, or concerns around privacy and security. As…