数据库
The aim of this paper is to examine and demonstrate how integer-based datetime labels (integer surrogate keys for time) can optimize data-warehouse and time-series performance, proposing practical formats and algorithms and validating their…
Ranging from batch scripts to computational notebooks, modern data science tools rely on massive and evolving object graphs that represent structured data, models, plots, and more. Persisting these objects is critical, not only to enhance…
Strong isolation guarantees, such as serializability and snapshot isolation, are essential for maintaining data consistency and integrity in modern databases. Verifying whether a database upholds its claimed guarantees is increasingly…
SQL-to-Text generation aims at translating structured SQL queries into natural language descriptions, thereby facilitating comprehension of complex database operations for non-technical users. Although large language models (LLMs) have…
Snowpark enables Data Engineering and AI/ML workloads to run directly within Snowflake by deploying a secure sandbox on virtual warehouse nodes. This Snowpark Execution Environment (SEE) allows users to execute arbitrary workloads in Python…
Physics-based simulators play a critical role in scientific discovery and risk assessment, enabling what-if analyses for events like wildfires and hurricanes. Today, databases treat these simulators as external pre-processing steps.…
Datalog-based languages are regaining popularity as a powerful abstraction for expressing recursive computations in domains such as program analysis and graph processing. However, existing systems often face a trade-off between efficiency…
Large Language Models (LLMs) promise to automate data engineering on tabular data, offering enterprises a valuable opportunity to cut the high costs of manual data handling. But the enterprise domain comes with unique challenges that…
Existing database benchmarks primarily focus on performance under ideal running environments. However, in real-world scenarios, databases probably face numerous adverse events. Quantifying the ability to cope with these events from a…
NOMAD (Navigating Optimal Model Application for Datastreams) is an intelligent framework for data enrichment during ingestion that optimizes realtime multiclass classification by dynamically constructing model chains, i.e ,sequences of…
The graph database (GDB) is an increasingly common storage model for data involving relationships between entries. Beyond its widespread usage in database industries, the advantages of GDBs indicate a strong potential in constructing…
Large Language Models (LLMs) show remarkable potential for urban computing, from spatial reasoning to predictive analytics. However, evaluating LLMs across diverse urban tasks faces two critical challenges: lack of unified platforms for…
Novel reactive moving object applications require solutions to support object reactive behaviors as a way to query and update dynamic data. While moving object scenarios have long been researched in the context of spatio-temporal data…
Recent advances in Text-to-SQL have achieved strong results in static, single-turn tasks, where models generate SQL queries from natural language questions. However, these systems fall short in real-world interactive scenarios, where user…
We propose BS-tree, an in-memory implementation of the B+-tree that adopts the structure of the disk-based index (i.e., a balanced, multiway tree), setting the node size to a memory block that can be processed fast and in parallel using…
Spatial data analytics systems are widely studied in both the academia and industry. However, existing systems are limited when handling a large number of moving objects and real time spatial queries. In this work, we architect a scalable…
As organizations continue to access diverse datasets, the demand for effective data integration has increased. Key tasks in this process, such as schema matching and entity resolution, are essential but often require significant effort.…
Key-value stores are a fundamental class of NoSQL databases that offer a simple yet powerful model for data storage and retrieval, representing information as pairs of unique keys and associated values. Their minimal structure enables…
Large Language Models show promise in emotion understanding, social reasoning, and empathy, yet they struggle with psychologically grounded tasks that require inferring implicit mental states in context-rich, ambiguous settings. These…
Graphs are a ubiquitous data structure in diverse domains such as machine learning, social networks, and data mining. As real-world graphs continue to grow beyond the memory capacity of single machines, out-of-core graph processing systems…