Related papers: Intention-based Segmentation: Human Reliability an…
Understanding human instructions to identify the target objects is vital for perception systems. In recent years, the advancements of Large Language Models (LLMs) have introduced new possibilities for image segmentation. In this work, we…
We introduce Segment-Phrase Table (SPT), a large collection of bijective associations between textual phrases and their corresponding segmentations. Leveraging recent progress in object recognition and natural language semantics, we show…
For real-life applications, it is crucial that end-to-end spoken language translation models perform well on continuous audio, without relying on human-supplied segmentation. For online spoken language translation, where models need to…
Modern dialog managers face the challenge of having to fulfill human-level conversational skills as part of common user expectations, including but not limited to discourse with no clear objective. Along with these requirements, agents are…
Dialogue disentanglement aims to detach the chronologically ordered utterances into several independent sessions. Conversation utterances are essentially organized and described by the underlying discourse, and thus dialogue disentanglement…
A statistical model for segmentation and word discovery in child directed speech is presented. An incremental unsupervised learning algorithm to infer word boundaries based on this model is described and results of empirical tests showing…
Structured sentences are important expressions in human writings and dialogues. Previous works on neural text generation fused semantic and structural information by encoding the entire sentence into a mixed hidden representation. However,…
Conversational systems have become increasingly popular as a way for humans to interact with computers. To be able to provide intelligent responses, conversational systems must correctly model the structure and semantics of a conversation.…
In this article, we explore the potential of using sentence-level discourse structure for machine translation evaluation. We first design discourse-aware similarity measures, which use all-subtree kernels to compare discourse parse trees in…
Neural semantic parsers usually fail to parse long and complex utterances into correct meaning representations, due to the lack of exploiting the principle of compositionality. To address this issue, we present a novel framework for…
We present a first attempt to perform attentional word segmentation directly from the speech signal, with the final goal to automatically identify lexical units in a low-resource, unwritten language (UL). Our methodology assumes a pairing…
Understanding how individuals perceive and recall information in their natural environments is critical to understanding potential failures in perception (e.g., sensory loss) and memory (e.g., dementia). Event segmentation, the process of…
Speech signals, typically sampled at rates in the tens of thousands per second, contain redundancies, evoking inefficiencies in sequence modeling. High-dimensional speech features such as spectrograms are often used as the input for the…
Providing emotional support through dialogue systems is becoming increasingly important in today's world, as it can support both mental health and social interactions in many conversation scenarios. Previous works have shown that using…
Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker mixture speech. There have been studies to use a pre-recorded speech sample or face image of the target speaker as the speaker cue. In human…
Context: User intent modeling is a crucial process in Natural Language Processing that aims to identify the underlying purpose behind a user's request, enabling personalized responses. With a vast array of approaches introduced in the…
Reasoning segmentation seeks pixel-accurate masks for targets referenced by complex, often implicit instructions, requiring context-dependent reasoning over the scene. Recent multimodal language models have advanced instruction following…
A combination of a neural network with rule firing information from a rule-based system is used to generate segment durations for a text-to-speech system. The system shows a slight improvement in performance over a neural network system…
Discourse structure is integral to understanding a text and is helpful in many NLP tasks. Learning latent representations of discourse is an attractive alternative to acquiring expensive labeled discourse data. Liu and Lapata (2018) propose…
With recent advances in natural language processing, rationalization becomes an essential self-explaining diagram to disentangle the black box by selecting a subset of input texts to account for the major variation in prediction. Yet,…