Related papers: Extracting Syntactic Patterns from Databases

Natural language to SQL in low-code platforms

One of the developers' biggest challenges in low-code platforms is retrieving data from a database using SQL queries. Here, we propose a pipeline allowing developers to write natural language (NL) to retrieve data. In this study, we…

Artificial Intelligence · Computer Science 2023-08-30 Sofia Aparicio, Samuel Arcadinho, João Nadkarni, David Aparício, João Lages, Mariana Lourenço, Bartłomiej Matejczyk, Filipe Assunção

Can Deep Neural Networks Predict Data Correlations from Column Names?

Recent publications suggest using natural language analysis on database schema elements to guide tuning and profiling efforts. The underlying hypothesis is that state-of-the-art language processing methods, so-called language models, are…

Databases · Computer Science 2023-09-12 Immanuel Trummer

Finding Sequential Patterns from Large Sequence Data

Data mining is the task of discovering interesting patterns from large amounts of data. There are many data mining tasks, such as classification, clustering, association rule mining, and sequential pattern mining. Sequential pattern mining…

Databases · Computer Science 2010-02-08 Mahdi Esmaeili, Fazekas Gabor

Faster Approximate Pattern Matching in Compressed Repetitive Texts

Motivated by the imminent growth of massive, highly redundant genomic databases, we study the problem of compressing a string database while simultaneously supporting fast random access, substring extraction and pattern matching to the…

Data Structures and Algorithms · Computer Science 2012-11-01 Travis Gagie, Paweł Gawrychowski, Christopher Hoobin, Simon J. Puglisi

Learning Models over Relational Data: A Brief Tutorial

This tutorial overviews the state of the art in learning models over relational databases and makes the case for a first-principles approach that exploits recent developments in database research. The input to learning classification and…

Databases · Computer Science 2019-11-18 Maximilian Schleich, Dan Olteanu, Mahmoud Abo-Khamis, Hung Q. Ngo, XuanLong Nguyen

Efficiently Summarising Event Sequences with Rich Interleaving Patterns

Discovering the key structure of a database is one of the main goals of data mining. In pattern set mining we do so by discovering a small set of patterns that together describe the data well. The richer the class of patterns we consider,…

Artificial Intelligence · Computer Science 2017-08-11 Apratim Bhattacharyya, Jilles Vreeken

Summarization Techniques for Pattern Collections in Data Mining

Discovering patterns from data is an important task in data mining. There exist techniques to find large collections of many kinds of patterns from data very efficiently. A collection of patterns can be regarded as a summary of the data. A…

Databases · Computer Science 2007-05-23 Taneli Mielikäinen

Learning from Uncurated Regular Expressions

Significant work has been done on learning regular expressions from a set of data values. Depending on the domain, this approach can be very successful. However, significant time is required to learn these expressions and the resulting…

Databases · Computer Science 2024-03-18 Michael J. Mior

From Examples to Rules: Neural Guided Rule Synthesis for Information Extraction

While deep learning approaches to information extraction have had many successes, they can be difficult to augment or maintain as needs shift. Rule-based methods, on the other hand, can be more easily modified. However, crafting rules…

Computation and Language · Computer Science 2022-02-02 Robert Vacareanu, Marco A. Valenzuela-Escarcega, George C. G. Barbosa, Rebecca Sharp, Mihai Surdeanu

Syntactic Patterns Improve Information Extraction for Medical Search

Medical professionals search the published literature by specifying the type of patients, the medical intervention(s) and the outcome measure(s) of interest. In this paper we demonstrate how features encoding syntactic patterns improve the…

Computation and Language · Computer Science 2018-05-02 Roma Patel, Yinfei Yang, Iain Marshall, Ani Nenkova, Byron Wallace

NameGuess: Column Name Expansion for Tabular Data

Recent advances in large language models have revolutionized many sectors, including the database industry. One common challenge when dealing with large volumes of tabular data is the pervasive use of abbreviated column names, which can…

Computation and Language · Computer Science 2023-10-23 Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Shen Wang, Huzefa Rangwala, George Karypis

Extension of Dictionary-Based Compression Algorithms for the Quantitative Visualization of Patterns from Log Files

Many services today massively and continuously produce log files of different and varying formats. These logs are important since they contain information about the application activities, which is necessary for improvements by analyzing…

Information Retrieval · Computer Science 2023-04-11 Igor Cherepanov, Jonathan Geraldi Joewono, Arjan Kuijper, Jörn Kohlhammer

Generalized Linear Rule Models

This paper considers generalized linear models using rule-based features, also referred to as rule ensembles, for regression and probabilistic classification. Rules facilitate model interpretation while also capturing nonlinear dependences…

Machine Learning · Computer Science 2019-06-06 Dennis Wei, Sanjeeb Dash, Tian Gao, Oktay Günlük

Constructing large scale biomedical knowledge bases from scratch with rapid annotation of interpretable patterns

Knowledge base construction is crucial for summarising, understanding and inferring relationships between biomedical entities. However, for many practical applications such as drug discovery, the scarcity of relevant facts (e.g. gene X is…

Computation and Language · Computer Science 2019-07-04 Julien Fauqueur, Ashok Thillaisundaram, Theodosia Togia

Learning Semantic String Transformations from Examples

We address the problem of performing semantic transformations on strings, which may represent a variety of data types (or their combination) such as a column in a relational table, time, date, currency, etc. Unlike syntactic…

Databases · Computer Science 2012-04-30 Rishabh Singh, Sumit Gulwani

Synthesizing Document Database Queries using Collection Abstractions

Document databases are increasingly popular in various applications, but their queries are challenging to write due to the flexible and complex data model underlying document databases. This paper presents a synthesis technique that aims to…

Databases · Computer Science 2024-12-10 Qikang Liu, Yang He, Yanwen Cai, Byeongguk Kwak, Yuepeng Wang

Nearest Neighbor Search over Vectorized Lexico-Syntactic Patterns for Relation Extraction from Financial Documents

Relation extraction (RE) has achieved remarkable progress with the help of pre-trained language models. However, existing RE models are usually incapable of handling two situations: implicit expressions and long-tail relation classes,…

Computation and Language · Computer Science 2023-10-30 Pawan Kumar Rajpoot, Ankur Parikh

Chemical Names Standardization using Neural Sequence to Sequence Model

Chemical information extraction converts chemical knowledge in text into a true chemical database; it is a text processing task that relies heavily on chemical compound name identification and standardization. Once a systematic name for a…

Computation and Language · Computer Science 2019-01-23 Junlang Zhan, Hai Zhao

FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs

Synthetic data generation has emerged as an invaluable solution in scenarios where real-world data collection and usage are limited by cost and scarcity. Large language models (LLMs) have demonstrated remarkable capabilities in producing…

Machine Learning · Computer Science 2025-07-22 Anh Nguyen, Sam Schafft, Nicholas Hale, John Alfaro

Mapping and Classifying Molecules from a High-Throughput Structural Database

High-throughput computational materials design promises to greatly accelerate the process of discovering new materials and compounds, and of optimizing their properties. The large databases of structures and properties that result from…

Chemical Physics · Physics 2016-11-22 Sandip De, Felix Musil, Teresa Ingram, Carsten Baldauf, Michele Ceriotti