Related papers: Improve Language Modelling for Code Completion thr…

An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models

The advent of large language models (LLMs) has significantly advanced artificial intelligence (AI) in software engineering (SE), with source code embeddings playing a crucial role in tasks such as source code clone detection and source code…

Software Engineering · Computer Science 2025-06-04 Zixiang Xian, Chenhui Cui, Rubing Huang, Chunrong Fang, Zhenyu Chen

Replacing Language Model for Style Transfer

We introduce replacing language model (RLM), a sequence-to-sequence language modeling framework for text style transfer (TST). Our method autoregressively replaces each token of the source sentence with a text span that has a similar…

Computation and Language · Computer Science 2024-02-29 Pengyu Cheng, Ruineng Li

BeLLM: Backward Dependency Enhanced Large Language Model for Sentence Embeddings

Sentence embeddings are crucial in measuring semantic similarity. Most recent studies employed large language models (LLMs) to learn sentence embeddings. Existing LLMs mainly adopted autoregressive architecture without explicit backward…

Computation and Language · Computer Science 2024-03-15 Xianming Li, Jing Li

LLM-FSM: Scaling Large Language Models for Finite-State Reasoning in RTL Code Generation

Finite-state reasoning, the ability to understand and implement state-dependent behavior, is central to hardware design. In this paper, we present LLM-FSM, a benchmark that evaluates how well large language models (LLMs) can recover…

Artificial Intelligence · Computer Science 2026-02-10 Yuheng Wu, Berk Gokmen, Zhouhua Xie, Peijing Li, Caroline Trippel, Priyanka Raina, Thierry Tambe

Generative Spoken Language Model based on continuous word-sized audio tokens

In NLP, text language models based on words or subwords are known to outperform their character-based counterparts. Yet, in the speech community, the standard input of spoken LMs are 20ms or 40ms-long discrete units (shorter than a…

Computation and Language · Computer Science 2023-10-10 Robin Algayres, Yossi Adi, Tu Anh Nguyen, Jade Copet, Gabriel Synnaeve, Benoit Sagot, Emmanuel Dupoux

ECLM: Entity Level Language Model for Spoken Language Understanding with Chain of Intent

Large Language Models (LLMs) have demonstrated impressive capabilities in language generation and general task performance. However, their application to spoken language understanding (SLU) remains challenging, particularly for token-level…

Computation and Language · Computer Science 2025-10-09 Shangjian Yin, Peijie Huang, Jiatian Chen, Haojing Huang, Yuhong Xu

Scaling Sentence Embeddings with Large Language Models

Large language models (LLMs) have recently garnered significant interest. With in-context learning, LLMs achieve impressive results in various natural language tasks. However, the application of LLMs to sentence embeddings remains an area…

Computation and Language · Computer Science 2023-08-01 Ting Jiang, Shaohan Huang, Zhongzhi Luan, Deqing Wang, Fuzhen Zhuang

Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units

Language models (LMs) for text data have been studied extensively for their usefulness in language generation and other downstream tasks. However, language modelling purely in the speech domain is still a relatively unexplored topic, with…

Computation and Language · Computer Science 2021-11-02 Anurag Katakkar, Alan W Black

Recent Advances in Speech Language Models: A Survey

Large Language Models (LLMs) have recently garnered significant attention, primarily for their capabilities in text-based interactions. However, natural human interaction often relies on speech, necessitating a shift towards voice-based…

Computation and Language · Computer Science 2025-08-08 Wenqian Cui, Dianzhi Yu, Xiaoqi Jiao, Ziqiao Meng, Guangyan Zhang, Qichao Wang, Yiwen Guo, Irwin King

Simple Techniques for Enhancing Sentence Embeddings in Generative Language Models

Sentence Embedding stands as a fundamental task within the realm of Natural Language Processing, finding extensive application in search engines, expert systems, and question-and-answer platforms. With the continuous evolution of large…

Computation and Language · Computer Science 2024-05-16 Bowen Zhang, Kehua Chang, Chunping Li

SLM-SS: Speech Language Model for Generative Speech Separation

Speech separation (SS) has advanced significantly with neural network-based methods, showing improved performance on signal-level metrics. However, these methods often struggle to maintain speech intelligibility in the separated signals,…

Sound · Computer Science 2026-01-28 Tianhua Li, Chenda Li, Wei Wang, Xin Zhou, Xihui Chen, Jianqing Gao, Yanmin Qian

Sentence Level Recurrent Topic Model: Letting Topics Speak for Themselves

We propose Sentence Level Recurrent Topic Model (SLRTM), a new topic model that assumes the generation of each word within a sentence to depend on both the topic of the sentence and the whole history of its preceding words in the sentence.…

Machine Learning · Computer Science 2016-04-11 Fei Tian, Bin Gao, Di He, Tie-Yan Liu

Bridging What the Model Thinks and How It Speaks: Self-Aware Speech Language Models for Expressive Speech Generation

Speech Language Models (SLMs) exhibit strong semantic understanding, yet their generated speech often sounds flat and fails to convey expressive intent, undermining user engagement. We term this mismatch the semantic understanding-acoustic…

Computation and Language · Computer Science 2026-04-14 Kuang Wang, Lai Wei, Qibing Bai, Ping Lin, Wenkai Fang, Feng Jiang, Zhongjie Jiang, Jun Huang, Yannan Wang, Haizhou Li

Improving the Robustness to Data Inconsistency between Training and Testing for Code Completion by Hierarchical Language Model

In the field of software engineering, applying language models to the token sequence of source code is the state-of-the-art approach to building a code recommendation system. The syntax tree of source code has hierarchical structures. Ignoring the…

Software Engineering · Computer Science 2022-11-29 Yixiao Yang

SatLM: Satisfiability-Aided Language Models Using Declarative Prompting

Prior work has combined chain-of-thought prompting in large language models (LLMs) with programmatic representations to perform effective and transparent reasoning. While such an approach works well for tasks that only require forward…

Computation and Language · Computer Science 2023-10-13 Xi Ye, Qiaochu Chen, Isil Dillig, Greg Durrett

SignLLM: Sign Language Production Large Language Models

In this paper, we propose SignLLM, a multilingual Sign Language Production (SLP) large language model, which includes two novel multilingual SLP modes MLSF and Prompt2LangGloss that allow sign language gestures generation from query texts…

Computer Vision and Pattern Recognition · Computer Science 2025-05-01 Sen Fang, Chen Chen, Lei Wang, Ce Zheng, Chunyu Sui, Yapeng Tian

Encoder-decoder with Focus-mechanism for Sequence Labelling Based Spoken Language Understanding

This paper investigates the framework of encoder-decoder with attention for sequence labelling based spoken language understanding. We introduce Bidirectional Long Short Term Memory - Long Short Term Memory networks (BLSTM-LSTM) as the…

Computation and Language · Computer Science 2017-03-14 Su Zhu, Kai Yu

Structural Language Models of Code

We address the problem of any-code completion - generating a missing piece of source code in a given program without any restriction on the vocabulary or structure. We introduce a new approach to any-code completion that leverages the…

Machine Learning · Computer Science 2020-07-30 Uri Alon, Roy Sadaka, Omer Levy, Eran Yahav

TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling

Recent efforts target spoken language models (SLMs) that not only listen but also speak for more natural human-LLM interaction. Joint speech-text modeling is a promising direction to achieve this. However, the effectiveness of recent speech…

Computation and Language · Computer Science 2026-02-06 Liang-Hsuan Tseng, Yi-Chang Chen, Kuan-Yi Lee, Da-Shan Shiu, Hung-yi Lee

Unsupervised Attention-based Sentence-Level Meta-Embeddings from Contextualised Language Models

A variety of contextualised language models have been proposed in the NLP community, which are trained on diverse corpora to produce numerous Neural Language Models (NLMs). However, different NLMs have reported different levels of…

Computation and Language · Computer Science 2022-04-19 Keigo Takahashi, Danushka Bollegala