Related papers: Data Migration using Datalog Program Synthesis
While many applications export data in hierarchical formats like XML and JSON, it is often necessary to convert such hierarchical documents to a relational representation. This paper presents a novel programming-by-example approach, and its…
Many businesses depend on legacy systems, which often use outdated technology that complicates maintenance and updates. Therefore, software modernization is essential, particularly data migration between different database schemas.…
Database migration is a key task in software modernization, increasingly involving transformations across heterogeneous data models such as relational and NoSQL systems. Existing approaches are typically designed for specific source-target…
Despite the popularity of information measures in analysis of probabilistic systems, proper tools for their visualization are not common. This work develops a simple matrix representation of information transfer in sequential systems,…
Computer networks are hard to manage. Given a set of high-level requirements (e.g., reachability, security), operators have to manually figure out the individual configuration of potentially hundreds of devices running complex distributed…
Many programs that interact with a database need to undergo schema refactoring several times during their life cycle. Since this process typically requires making significant changes to the program's implementation, schema refactoring is…
Dynamic changes in processes necessitate the notion of state equivalence between the old and new workflows. In several cases, the history of the workflow to be migrated provides sufficient context for a meaningful migration. In this paper,…
Recursive query processing has experienced a recent resurgence, as a result of its use in many modern application domains, including data integration, graph analytics, security, program analysis, networking and decision making. Due to the…
JSON (JavaScript Object Notation) is a data encoding that allows structured data to be used in a standardized and straightforward manner across systems. Schemas for JSON-formatted data can be constructed using the JSON Schema standard,…
When an evolving program is modified to address issues related to thread synchronization, there is a need to confirm the change is correct, i.e., it does not introduce unexpected behavior. However, manually comparing two programs to…
Library migration is a common but error-prone task in software development. Developers may need to replace one library with another due to reasons like changing requirements or licensing changes. Migration typically entails updating and…
Data-driven analysis is important in virtually every modern organization. Yet, most data is underutilized because it remains locked in silos inside of organizations; large organizations have thousands of databases, and billions of files…
Synthetic data has emerged as a crucial solution to the data scarcity bottleneck in large language models (LLMs), particularly for specialized domains and low-resource languages. However, the broader adoption of existing synthetic data…
The Synchronous Data Flow (SDF) model is widely used for specifying signal processing or streaming applications. As modern embedded applications become more complex, with dynamic behavior changes at run-time, several extensions of the SDF…
Although humans engaged in face-to-face conversation simultaneously communicate both verbally and non-verbally, methods for joint and unified synthesis of speech audio and co-speech 3D gesture motion from text are a new and emerging field.…
Large language models (LLMs) have shown impressive performance on general-purpose tasks, yet adapting them to specific domains remains challenging due to the scarcity of high-quality domain data. Existing data synthesis tools often struggle…
Despite recent advances in large language models, building dependable and deployable NLP models typically requires abundant, high-quality training data. However, task-specific data is not available for many use cases, and manually curating…
The problem of learning logical rules from examples arises in diverse fields, including program synthesis, logic programming, and machine learning. Existing approaches either involve solving computationally difficult combinatorial problems,…
Schema and data integration have been a challenge for more than 40 years. While data warehouse technologies are quite a success story, there is still a lack of information integration methods, especially if the data sources are based on…
A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach results in a sample with inefficient estimators and whose joint distribution is inconsistent with the true…