Related papers: Intention-based Segmentation: Human Reliability an…
Unsupervised semantic segmentation aims to achieve high-quality semantic grouping without human-labeled annotations. With the advent of self-supervised pre-training, various frameworks utilize the pre-trained features to train prediction…
Discourse relations bind smaller linguistic elements into coherent texts. However, automatically identifying discourse relations is difficult, because it requires understanding the semantics of the linked sentences. A more subtle challenge…
A key challenge of dialog systems research is to effectively and efficiently adapt to new domains. A scalable paradigm for adaptation necessitates the development of generalizable models that perform well in few-shot settings. In this…
Dividing oral histories into topically coherent segments can make them more accessible online. People regularly make judgments about where coherent segments can be extracted from oral histories. But making these judgments can be taxing, so…
Although perception systems have made remarkable advancements in recent years, they still rely on explicit human instruction or pre-defined categories to identify the target objects before executing visual recognition tasks. Such systems…
This paper primarily demonstrates a method to quantitatively assess the alignment between multi-step, structured reasoning in large language models and human preferences. We introduce the Alignment Score, a semantic-level metric that…
The abstract of a scientific paper distills the contents of the paper into a short paragraph. In the biomedical literature, it is customary to structure an abstract into discourse categories like BACKGROUND, OBJECTIVE, METHOD, RESULT, and…
Much like sentences are composed of words, words themselves are composed of smaller units. For example, the English word questionably can be analyzed as question+able+ly. However, this structural decomposition of the word does not directly…
We here explore a ``fully'' lexicalized Tree-Adjoining Grammar for discourse that takes the basic elements of a (monologic) discourse to be not simply clauses, but larger structures that are anchored on variously realized discourse cues.…
A sharp tension exists about the nature of human language between two opposite parties: those who believe that statistical surface distributions, in particular using measures like surprisal, provide a better understanding of language…
Discourse coherence is strongly associated with text quality, making it important to natural language generation and understanding. Yet existing models of coherence focus on measuring individual aspects of coherence (lexical overlap,…
Semantic segmentation has become an important task in computer vision with the growth of self-driving cars, medical image segmentation, etc. Although current models provide excellent results, they are still far from perfect and while there…
The reasoning segmentation task, which demands a nuanced comprehension of intricate queries to accurately pinpoint object regions, is attracting increasing attention. However, Multi-modal Large Language Models (MLLM) often find it difficult…
Generative AI models face the challenge of hallucinations that can undermine users' trust in such systems. We approach the problem of conversational information seeking as a two-step process, where relevant passages in a corpus are…
In this paper we introduce a method to detect words or phrases in a given sequence of alphabets without knowing the lexicon. Our linear time unsupervised algorithm relies entirely on statistical relationships among alphabets in the input…
In a conversational context, a user expresses her multi-faceted information need as a sequence of natural-language questions, i.e., utterances. Starting from a given topic, the conversation evolves through user utterances and system…
Tangled multi-party dialogue contexts lead to challenges for dialogue reading comprehension, where multiple dialogue threads flow simultaneously within a common dialogue record, increasing difficulties in understanding the dialogue history…
This study re-evaluates the assumption that long-range correlations in sentence length are a fundamental feature of natural language and a marker of literary style. While previous research has suggested that punctuation marks--particularly…
Temporary syntactic ambiguities arise when the beginning of a sentence is compatible with multiple syntactic analyses. We inspect to which extent neural language models (LMs) exhibit uncertainty over such analyses when processing…
Breaking down a document or a conversation into multiple contiguous segments based on its semantic structure is an important and challenging problem in NLP, which can assist many downstream tasks. However, current works on topic…