Yu Lin
Building competitive automatic speech recognition (ASR) models usually requires large-scale audio supervision, which makes reproduction and specialization expensive. We study Ark-ASR, a 0.6B-parameter audio-conditioned language model…
Current personalization methods for generative vision models typically encode new concepts through continuous adapters or weight updates, yet provide limited control over whether and when a concept should be retrieved. In this work, we…
Reducing the annotation cost of oriented object detection in remote sensing remains a major challenge. Recently, sparse annotation has gained attention for effectively reducing annotation redundancy in dense remote sensing scenes.…
Lattice compression has emerged as a fundamental tuning parameter for nickelate superconductivity. Pressure acts as a trigger to induce superconductivity in bulk Ruddlesden-Popper nickelates. For infinite-layer nickelate thin films,…
Feature selection, dimension selection, and embedding compression are fundamental techniques for improving efficiency and generalization in deep recommender systems. Although conceptually related, these problems are typically studied in…
The rapid development of large language models (LLMs) has driven the widespread adoption of cloud-based LLM inference services, while also bringing prominent privacy risks associated with the transmission and processing of private data in…
With the rapid rise of intelligent data services, modern enterprises increasingly require efficient, multimodal, and cost-effective data analytics infrastructures. However, in ByteDance's production environments, existing systems fall short…
Traditional object detection systems are typically constrained to predefined categories, limiting their applicability in dynamic environments. In contrast, open-vocabulary object detection (OVD) enables the identification of objects from…
Traditional speech systems typically rely on separate, task-specific models for text-to-speech (TTS), automatic speech recognition (ASR), and voice conversion (VC), resulting in fragmented pipelines that limit scalability, efficiency, and…
Drug recommendation (DR) systems aim to support healthcare professionals in selecting appropriate medications based on patients' medical conditions. State-of-the-art approaches utilize deep learning techniques for improving DR, but fall…
This paper aims to derive explicit and computable error bounds for the asymptotic expansion of the Jacobi polynomials as their degree approaches infinity, using an integral method. The analysis focuses on the outer or oscillatory region of…
Although fully-supervised oriented object detection has made significant progress in multimodal remote sensing image understanding, it comes at the cost of labor-intensive annotation. Recent studies have explored weakly and semi-supervised…
Representing 3D scenes from multiview images is a core challenge in computer vision and graphics, which requires both precise rendering and accurate reconstruction. Recently, 3D Gaussian Splatting (3DGS) has garnered significant attention…
Efficiently synthesizing novel views from sparse inputs while maintaining accuracy remains a critical challenge in 3D reconstruction. While advanced techniques like radiance fields and 3D Gaussian Splatting achieve rendering quality and…
In recent years, it has become increasingly clear that space weather disturbances can be triggered by transient upstream mesoscale structures (TUMS), independently of the occurrence of large-scale solar wind (SW) structures, such as…
This paper addresses the problem of vision-based pedestrian localization, which estimates a pedestrian's location using images and camera parameters. In practice, however, calibrated camera parameters often deviate from the ground truth,…
Drawing inspiration from the achievements of natural language processing, we adopt self-supervised learning and utilize an equivariant graph neural network to develop a unified platform designed for training generative models capable of…
Simulations have played a critical role in the advancement of our knowledge of magnetic reconnection. However, due to the inherently multiscale nature of reconnection, it is impossible to simulate all physics at all scales. For this reason,…
Although transition-metal nitrides have been widely applied for several decades, experimental investigations of their high-resolution electronic band structures are rare due to the lack of high-quality single-crystalline samples. Here, we…
Medical generative models, acknowledged for their high-quality sample generation ability, have accelerated the growth of medical applications. However, recent works concentrate on separate medical generation models for distinct medical…