Related papers: Thoth: Improved Rapid Serial Visual Presentation u…

Thoth: Mid-Training Bridges LLMs to Time Series Understanding

Large Language Models (LLMs) have demonstrated remarkable success in general-purpose reasoning. However, they still struggle to understand and reason about time series data, which limits their effectiveness in decision-making scenarios that…

Computation and Language · Computer Science 2026-03-03 Jiafeng Lin, Yuxuan Wang, Jialong Wu, Huakun Luo, Zhongyi Pei, Jianmin Wang

Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs

Large language models have demonstrated exceptional capability in natural language understanding and generation. However, their generation speed is limited by the inherently sequential nature of their decoding process, posing challenges for…

Computation and Language · Computer Science 2024-05-27 Chenxi Sun, Hongzhi Zhang, Zijia Lin, Jingyuan Zhang, Fuzheng Zhang, Zhongyuan Wang, Bin Chen, Chengru Song, Di Zhang, Kun Gai, Deyi Xiong

Thread of Thought Unraveling Chaotic Contexts

Large Language Models (LLMs) have ushered in a transformative era in the field of natural language processing, excelling in tasks related to text comprehension and generation. Nevertheless, they encounter difficulties when confronted with…

Computation and Language · Computer Science 2023-11-16 Yucheng Zhou, Xiubo Geng, Tao Shen, Chongyang Tao, Guodong Long, Jian-Guang Lou, Jianbing Shen

Enhancing Performance on Seen and Unseen Dialogue Scenarios using Retrieval-Augmented End-to-End Task-Oriented System

End-to-end task-oriented dialogue (TOD) systems have achieved promising performance by leveraging sophisticated natural language understanding and natural language generation capabilities of pre-trained models. This work enables the TOD…

Computation and Language · Computer Science 2023-08-17 Jianguo Zhang, Stephen Roller, Kun Qian, Zhiwei Liu, Rui Meng, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong

Text to speech synthesis

Text-to-speech (TTS) synthesis is a technology that converts written text into spoken words, enabling a natural and accessible means of communication. This abstract explores the key aspects of TTS synthesis, encompassing its underlying…

Software Engineering · Computer Science 2024-01-26 Harini s, Manoj G M

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Current image generation and editing methods primarily process textual prompts as direct inputs without reasoning about visual composition and explicit operations. We present Generation Chain-of-Thought (GoT), a novel paradigm that enables…

Computer Vision and Pattern Recognition · Computer Science 2025-03-14 Rongyao Fang, Chengqi Duan, Kun Wang, Linjiang Huang, Hao Li, Shilin Yan, Hao Tian, Xingyu Zeng, Rui Zhao, Jifeng Dai, Xihui Liu, Hongsheng Li

ARTH: Algorithm For Reading Text Handily -- An AI Aid for People having Word Processing Issues

The objective of this project is to solve one of the major problems faced by the people having word processing issues like trauma, or mild mental disability. "ARTH" is the short form of Algorithm for Reading Handily. ARTH is a self-learning…

Computation and Language · Computer Science 2021-01-26 Akanksha Malhotra, Sudhir Kamle

Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems

With the development of Natural Language Processing (NLP), more and more systems want to adopt NLP in User Interface Module to process user input, in order to communicate with user in a natural way. However, this raises a speed problem.…

Artificial Intelligence · Computer Science 2009-09-30 Zhe Chen, Dunwei Wen

A review-based study on different Text-to-Speech technologies

This research paper presents a comprehensive review-based study on various Text-to-Speech (TTS) technologies. TTS technology is an important aspect of human-computer interaction, enabling machines to convert written text into audible…

Sound · Computer Science 2023-12-20 Md. Jalal Uddin Chowdhury, Ashab Hussan

MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription

Neural network based end-to-end Text-to-Speech (TTS) has greatly improved the quality of synthesized speech. While how to use massive spontaneous speech without transcription efficiently still remains an open problem. In this paper, we…

Sound · Computer Science 2022-02-07 Dabiao Ma, Yitong Zhang, Meng Li, Feng Ye

Robustly Optimized and Distilled Training for Natural Language Understanding

In this paper, we explore multi-task learning (MTL) as a second pretraining step to learn enhanced universal language representation for transformer language models. We use the MTL enhanced representation across several natural language…

Computation and Language · Computer Science 2021-03-17 Haytham ElFadeel, Stan Peshterliev

FASTUS: A Cascaded Finite-State Transducer for Extracting Information from Natural-Language Text

FASTUS is a system for extracting information from natural language text for entry into a database and for other applications. It works essentially as a cascaded, nondeterministic finite-state automaton. There are five stages in the…

cmp-lg · Computer Science 2008-02-03 Jerry R. Hobbs, Douglas Appelt, John Bear, David Israel, Megumi Kameyama, Mark Stickel, Mabry Tyson

Reading Text in the Wild with Convolutional Neural Networks

In this work we present an end-to-end system for text spotting -- localising and recognising text in natural scene images -- and text based image retrieval. This system is based on a region proposal mechanism for detection and deep…

Computer Vision and Pattern Recognition · Computer Science 2014-12-08 Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

Analyzing Large Collections of Electronic Text Using OLAP

Computer-assisted reading and analysis of text has various applications in the humanities and social sciences. The increasing size of many electronic text archives has the advantage of a more complete analysis but the disadvantage of taking…

Databases · Computer Science 2007-05-23 Steven Keith, Owen Kaser, Daniel Lemire

THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning

Large Language Models (LLMs) have made remarkable progress in mathematical reasoning, but still continue to struggle with high-precision tasks like numerical computation and formal symbolic manipulation. Integrating external tools has…

Artificial Intelligence · Computer Science 2026-02-11 Qikai Chang, Zhenrong Zhang, Pengfei Hu, Jun Du, Jiefeng Ma, Yicheng Pan, Jianshu Zhang, Quan Liu, Jianqing Gao

VITRO: Vocabulary Inversion for Time-series Representation Optimization

Although LLMs have demonstrated remarkable capabilities in processing and generating textual data, their pre-trained vocabularies are ill-suited for capturing the nuanced temporal dynamics and patterns inherent in time series. The discrete,…

Machine Learning · Computer Science 2024-12-25 Filippos Bellos, Nam H. Nguyen, Jason J. Corso

Imagine while Reasoning in Space: Multimodal Visualization-of-Thought

Chain-of-Thought (CoT) prompting has proven highly effective for enhancing complex reasoning in Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs). Yet, it struggles in complex spatial reasoning tasks. Nonetheless,…

Computation and Language · Computer Science 2025-01-14 Chengzu Li, Wenshan Wu, Huanyu Zhang, Yan Xia, Shaoguang Mao, Li Dong, Ivan Vulić, Furu Wei

A Survey on Neural Speech Synthesis

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech given text, is a hot research topic in speech, language, and machine learning communities and has broad applications in the industry. As the…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-26 Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu

TypeShift: A User Interface for Visualizing the Typing Production Process

TypeShift is a tool for visualizing linguistic patterns in the timing of typing production. Language production is a complex process which draws on linguistic, cognitive and motor skills. By visualizing holistic trends in the typing…

Computation and Language · Computer Science 2021-03-09 Adam Goodkind

Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration

Recent works on accelerating Vision-Language Models achieve strong performance across a variety of vision-language tasks despite highly compressing visual information. In this work, we examine the popular acceleration approach of early…

Computer Vision and Pattern Recognition · Computer Science 2025-08-04 Mark Endo, Xiaohan Wang, Serena Yeung-Levy