Related papers: Managing Complex Structured Data In a Fast Evolvin…

Law-Aware Access Control and its Information Model

Cross-border access to a variety of data such as market information, strategic information, or customer-related information defines the daily business of many global companies, including financial institutions. These companies are obliged…

Cryptography and Security · Computer Science 2010-06-24 Michael Stieghahn, Thomas Engel

Towards Consistent Language Models Using Declarative Constraints

Large language models have shown unprecedented abilities in generating linguistically coherent and syntactically correct natural language output. However, they often return incorrect and inconsistent answers to input questions. Due to the…

Databases · Computer Science 2023-12-27 Jasmin Mousavi, Arash Termehchy

An Architecture for Establishing Legal Semantic Workflows in the Context of Integrated Law Enforcement

Traditionally the integration of data from multiple sources is done on an ad-hoc basis for each analysis scenario and application. This is a solution that is inflexible, incurs high costs, leads to "silos" that prevent sharing data…

Computers and Society · Computer Science 2017-08-23 Markus Stumptner, Wolfgang Mayer, Georg Grossmann, Jixue Liu, Wenhao Li, Pompeu Casanovas, Louis De Koker, Danuta Mendelson, David Watts, Bridget Bainbridge

Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Large language models (LLMs) have shown impressive performance on general-purpose tasks, yet adapting them to specific domains remains challenging due to the scarcity of high-quality domain data. Existing data synthesis tools often struggle…

Computation and Language · Computer Science 2025-07-08 Ziyang Miao, Qiyu Sun, Jingyuan Wang, Yuchen Gong, Yaowei Zheng, Shiqi Li, Richong Zhang

UniDM: A Unified Framework for Data Manipulation with Large Language Models

Designing effective data manipulation methods is a long standing problem in data lakes. Traditional methods, which rely on rules or machine learning models, require extensive human efforts on training data collection and tuning models.…

Artificial Intelligence · Computer Science 2024-05-13 Yichen Qian, Yongyi He, Rong Zhu, Jintao Huang, Zhijian Ma, Haibin Wang, Yaohua Wang, Xiuyu Sun, Defu Lian, Bolin Ding, Jingren Zhou

DictLLM: Harnessing Key-Value Data Structures with Large Language Models for Enhanced Medical Diagnostics

Structured data offers a sophisticated mechanism for the organization of information. Existing methodologies for the text-serialization of structured data in the context of large language models fail to adequately address the heterogeneity…

Computation and Language · Computer Science 2024-02-20 YiQiu Guo, Yuchen Yang, Ya Zhang, Yu Wang, Yanfeng Wang

Lime: Data Lineage in the Malicious Environment

Intentional or unintentional leakage of confidential data is undoubtedly one of the most severe security threats that organizations face in the digital era. The threat now extends to our personal lives: a plethora of personal information is…

Cryptography and Security · Computer Science 2014-08-06 Michael Backes, Niklas Grimm, Aniket Kate

Performance Evaluation of a Natural Language Processing approach applied in White Collar crime investigation

In today's world we are confronted with increasing amounts of information every day coming from a large variety of sources. People and co-operations are producing data on a large scale, and since the rise of the internet, e-mail and social…

Information Retrieval · Computer Science 2016-09-06 Maarten Banerveld, Nhien-An Le-Khac, Tahar Kechadi

LegiLM: A Fine-Tuned Legal Language Model for Data Compliance

Ensuring compliance with international data protection standards for privacy and data security is a crucial but complex task, often requiring substantial legal expertise. This paper introduces LegiLM, a novel legal language model…

Computation and Language · Computer Science 2024-09-24 Linkai Zhu, Lu Yang, Chaofan Li, Shanwen Hu, Lu Liu, Bin Yin

Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes

A long standing goal of the data management community is to develop general, automated systems that ingest semi-structured documents and output queryable tables without human effort or domain specific customization. Given the sheer variety…

Computation and Language · Computer Science 2025-03-10 Simran Arora, Brandon Yang, Sabri Eyuboglu, Avanika Narayan, Andrew Hojel, Immanuel Trummer, Christopher Ré

An Architecture Framework for Complex Data Warehouses

Nowadays, many decision support applications need to exploit data that are not only numerical or symbolic, but also multimedia, multistructure, multisource, multimodal, and/or multiversion. We term such data complex data. Managing and…

Databases · Computer Science 2007-07-12 Jérôme Darmont, Omar Boussaid, Jean-Christian Ralaivao, Kamel Aouiche

LLM Access Shield: Domain-Specific LLM Framework for Privacy Policy Compliance

Large language models (LLMs) are increasingly applied in fields such as finance, education, and governance due to their ability to generate human-like text and adapt to specialized tasks. However, their widespread adoption raises critical…

Cryptography and Security · Computer Science 2025-05-26 Yu Wang, Cailing Cai, Zhihua Xiao, Peifung E. Lam

Local Compositional Complexity: How to Detect a Human-readable Message

Data complexity is an important concept in the natural sciences and related areas, but lacks a rigorous and computable definition. In this paper, we focus on a particular sense of complexity that is high if the data is structured in a way…

Computer Vision and Pattern Recognition · Computer Science 2025-03-21 Louis Mahon

Dynamic data models: an application of MOP-based persistence in Common Lisp

The data model of an application, the nature and format of data stored across executions, is typically a very rigid part of its early specification, even when prototyping, and changing it after code that relies on it was written can prove…

Software Engineering · Computer Science 2008-02-26 Pierre Thierry, Simon E. B. Thierry

Knowledge of Uncertain Worlds: Programming with Logical Constraints

Programming with logic for sophisticated applications must deal with recursion and negation, which together have created significant challenges in logic, leading to many different, conflicting semantics of rules. This paper describes a…

Logic in Computer Science · Computer Science 2021-10-07 Yanhong A. Liu, Scott D. Stoller

LawLLM: Law Large Language Model for the US Legal System

In the rapidly evolving field of legal analytics, finding relevant cases and accurately predicting judicial outcomes are challenging because of the complexity of legal language, which often includes specialized terminology, complex syntax,…

Computation and Language · Computer Science 2024-08-01 Dong Shu, Haoran Zhao, Xukun Liu, David Demeter, Mengnan Du, Yongfeng Zhang

DataWords: Getting Contrarian with Text, Structured Data and Explanations

Our goal is to build classification models using a combination of free-text and structured data. To do this, we represent structured data by text sentences, DataWords, so that similar data items are mapped into the same sentence. This…

Machine Learning · Computer Science 2022-02-18 Stephen I. Gallant, Mirza Nasir Hossain

Visualization Techniques with Data Cubes: Utilizing Concurrency for Complex Data

With web and mobile platforms becoming more prominent devices utilized in data analysis, there are currently few systems which are not without flaw. In order to increase the performance of these systems and decrease errors of data…

Databases · Computer Science 2022-05-03 Daniel Szelogowski

SafeStrings: Representing Strings as Structured Data

Strings are ubiquitous in code. Not all strings are created equal, some contain structure that makes them incompatible with other strings. CSS units are an obvious example. Worse, type checkers cannot see this structure: this is the latent…

Programming Languages · Computer Science 2019-04-26 David Kelly, Mark Marron, David Clark, Earl T. Barr

Making Large Language Models Better Data Creators

Although large language models (LLMs) have advanced the state-of-the-art in NLP significantly, deploying them for downstream applications is still challenging due to cost, responsiveness, control, or concerns around privacy and security. As…

Computation and Language · Computer Science 2023-11-01 Dong-Ho Lee, Jay Pujara, Mohit Sewak, Ryen W. White, Sujay Kumar Jauhar