Computer Science
Modern table formats such as Apache Iceberg compute and store metadata (commit timestamps, record counts, and column-level statistics such as null counts and value bounds) at write time, as part of file writing. These statistics serve query…
In this work, we present a compact surrogate circuit for electro-quasi-static (EQS) head modeling. A three-shell geometry (brain, skull, scalp) is considered, and each layer is modeled through radial and tangential pathways, implemented as…
Geo-distributed OLTP databases are widely deployed across cloud regions, yet current evaluation practices do not cover the challenges that geo-distribution introduces. Existing benchmarks assume stable network conditions; they lack explicit settings for data…
Accurate modeling of electric potential and current distribution in head tissues is crucial for the design and evaluation of neuro-sensing and neuro-stimulation systems operating in the sub-megahertz frequency range. Numerical methods are…
Text-to-Visualization (Text-to-Vis) translates natural language queries into visualization query languages, enabling non-expert users to perform data analysis. However, most existing methods follow a one-shot paradigm that requires users to…
Tokenized real-world assets (RWAs) are often evaluated through headline indicators such as total value locked (TVL) or on-chain asset value. However, a large asset base does not necessarily imply low risk, since tokenized assets may remain…
Rigid-bodied robots often lack the compliance needed to adapt to unstructured environments, while fully soft robots, though highly adaptable, struggle with scalability and load capacity. In nature, musculoskeletal systems balance strength and…
As server CPUs scale to dozens and now hundreds of cores per socket, parallel query engines must rethink how they redistribute data between threads. Partitioned operators such as hash joins and aggregations require frequent data…
In cloud data platforms, developers often encounter performance regressions that occur in specific tenant datasets. However, due to confidentiality constraints, they cannot access the original data, which makes it difficult to reproduce…
Oracle Exadata consolidates thousands of tenant databases onto shared storage infrastructure deployed at hundreds of customer sites worldwide. Oracle Multitenant architecture enables this extreme density, with thousands of tenant databases…
This work presents an end-to-end strategy for solving inverse problems constrained by Partial Differential Equations within a fully differentiable Machine Learning framework. The proposed formulation provides a unified and user-friendly…
Compliance minimization is a central objective in structural topology optimization, commonly interpreted as the total strain energy of a system. In this work, we examine the influence of alternative compliance formulations based on…
Deploying Scientific Machine Learning surrogates in industrial CFD workflows requires adapting pretrained models to new vehicle families without large datasets; yet whether geometric representations learned by a geometry encoder transfer to…
3D volumetric reconstruction from incomplete or noisy measurements is a fundamental problem in medical imaging and computational tomography. Deep image prior (DIP)-based methods have recently shown strong capability for solving inverse…
AI agents are increasingly transacting on behalf of users: delegating tasks, spending budgets, and negotiating with unfamiliar counterparties. Unlike human marketplaces, which operate under institutional designs refined over centuries,…
Approximate k-Nearest Neighbor (AKNN) search is widely used in vector databases. When vectors carry additional attributes (e.g., labels or numerical values), filtered AKNN search retrieves the nearest vectors to a query vector under…
Data transformation correctness is a fundamental challenge in data engineering: how can we verify that pipelines produce correct results before executing on production data? Existing practice relies on iterative testing over materialized…
Proteins inherently possess a consistent sequence-structure duality. The abundance of protein sequence data, which can be readily represented as discrete tokens, has driven fruitful developments in protein language models (pLMs). A key…
Workload traces from cloud data warehouse providers reveal that standard benchmarks such as TPC-H and TPC-DS fail to capture key characteristics of real-world workloads, including query repetition and string-heavy queries. In this paper, we…
As large language models (LLMs) increasingly engage in complex social interactions, ensuring that their behaviors align with human ethical principles and intentions, known as value alignment, has become a critical scientific challenge.…