Related papers: Positional Description for Numerical Normalization

Positional Description Matters for Transformers Arithmetic

Transformers, central to the successes in modern Natural Language Processing, often falter on arithmetic tasks despite their vast capabilities, which paradoxically include remarkable coding abilities. We observe that a crucial challenge is…

Computation and Language · Computer Science 2023-11-28 Ruoqi Shen, Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Yuanzhi Li, Yi Zhang

On the Geometry of Positional Encodings in Transformers

Neural language models process sequences of words, but the mathematical operations inside them are insensitive to the order in which words appear. Positional encodings are the component added to remedy this. Despite their importance,…

Machine Learning · Computer Science 2026-04-08 Giansalvo Cirrincione

Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

Developing Text Normalization (TN) systems for Text-to-Speech (TTS) on new languages is hard. We propose a novel architecture to facilitate it for multiple languages while using data less than 3% of the size of the data used by the state of…

Computation and Language · Computer Science 2021-04-19 Shubhi Tyagi, Antonio Bonafonte, Jaime Lorenzo-Trueba, Javier Latorre

A Triadic Suffix Tokenization Scheme for Numerical Reasoning

Standard subword tokenization methods fragment numbers inconsistently, causing large language models (LLMs) to lose positional and decimal structure - a primary driver of errors in arithmetic and scientific reasoning. We introduce Triadic…

Computation and Language · Computer Science 2026-04-22 Olga Chetverina

TSDS: Data Selection for Task-Specific Model Finetuning

Finetuning foundation models for specific tasks is an emerging paradigm in modern machine learning. The efficacy of task-specific finetuning largely depends on the selection of appropriate training data. We present TSDS (Task-Specific Data…

Machine Learning · Computer Science 2024-12-30 Zifan Liu, Amin Karbasi, Theodoros Rekatsinas

A Multimodal PDE Foundation Model for Prediction and Scientific Text Descriptions

Neural networks are one tool for approximating non-linear differential equations used in scientific computing tasks such as surrogate modeling, real-time predictions, and optimal control. PDE foundation models utilize neural networks to…

Machine Learning · Computer Science 2025-02-11 Elisa Negrini, Yuxuan Liu, Liu Yang, Stanley J. Osher, Hayden Schaeffer

Predictive Batch Scheduling: Accelerating Language Model Training Through Loss-Aware Sample Prioritization

We introduce Predictive Batch Scheduling (PBS), a novel training optimization technique that accelerates language model convergence by dynamically prioritizing high-loss samples during batch construction. Unlike curriculum learning…

Artificial Intelligence · Computer Science 2026-02-20 Sumedh Rasal

A Context-Based Numerical Format Prediction for a Text-To-Speech System

Many of the existing TTS systems cannot accurately synthesize text containing a variety of numerical formats, resulting in reduced intelligibility of the synthesized speech. This research aims to develop a numerical format classifier that…

Audio and Speech Processing · Electrical Eng. & Systems 2024-12-03 Yaser Darwesh, Lit Wei Wern, Mumtaz Begum Mustafa

Utilizing Character and Word Embeddings for Text Normalization with Sequence-to-Sequence Models

Text normalization is an important enabling technology for several NLP tasks. Recently, neural-network-based approaches have outperformed well-established models in this task. However, in languages other than English, there has been little…

Computation and Language · Computer Science 2018-09-06 Daniel Watson, Nasser Zalmout, Nizar Habash

Improved Language Modeling by Decoding the Past

Highly regularized LSTMs achieve impressive results on several benchmark datasets in language modeling. We propose a new regularization method based on decoding the last token in the context using the predicted distribution of the next…

Computation and Language · Computer Science 2019-01-25 Siddhartha Brahma

PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech

Text Normalization (TN) is a key preprocessing step in Text-to-Speech (TTS) systems, converting written forms into their canonical spoken equivalents. Traditional TN systems can exhibit high accuracy, but involve substantial engineering…

Computation and Language · Computer Science 2025-11-06 Michel Wong, Ali Alshehri, Sophia Kao, Haotian He

Memory-Efficient Learning of Stable Linear Dynamical Systems for Prediction and Control

Learning a stable Linear Dynamical System (LDS) from data involves creating models that both minimize reconstruction error and enforce stability of the learned representation. We propose a novel algorithm for learning stable LDSs. Using a…

Machine Learning · Computer Science 2020-11-19 Giorgos Mamakoukas, Orest Xherija, T. D. Murphey

Duality Diagram Similarity: a generic framework for initialization selection in task transfer learning

In this paper, we tackle an open research question in transfer learning, which is selecting a model initialization to achieve high performance on a new task, given several pre-trained models. We propose a new highly efficient and accurate…

Computer Vision and Pattern Recognition · Computer Science 2020-08-06 Kshitij Dwivedi, Jiahui Huang, Radoslaw Martin Cichy, Gemma Roig

Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction

We propose a novel text-to-speech (TTS) framework centered around a neural transducer. Our approach divides the whole TTS pipeline into semantic-level sequence-to-sequence (seq2seq) modeling and fine-grained acoustic modeling stages,…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-28 Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Semin Kim, Joun Yeop Lee, Nam Soo Kim

The Locality and Symmetry of Positional Encodings

Positional Encodings (PEs) are used to inject word-order information into transformer-based language models. While they can significantly enhance the quality of sentence representations, their specific contribution to language models is not…

Computation and Language · Computer Science 2023-10-20 Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek

Neural Processes with Stochastic Attention: Paying more attention to the context dataset

Neural processes (NPs) aim to stochastically complete unseen data points based on a given context dataset. NPs essentially leverage a given dataset as a context representation to derive a suitable identifier for a novel task. To improve the…

Machine Learning · Computer Science 2022-04-13 Mingyu Kim, Kyeongryeol Go, Se-Young Yun

Text Serialization and Their Relationship with the Conventional Paradigms of Tabular Machine Learning

Recent research has explored how Language Models (LMs) can be used for feature representation and prediction in tabular machine learning tasks. This involves employing text serialization and supervised fine-tuning (SFT) techniques. Despite…

Computation and Language · Computer Science 2024-06-21 Kyoka Ono, Simon A. Lee

Encoding word order in complex embeddings

Sequential word order is important when processing text. Currently, neural networks (NNs) address this by modeling word position using position embeddings. The problem is that position embeddings capture the position of individual words,…

Computation and Language · Computer Science 2020-06-30 Benyou Wang, Donghao Zhao, Christina Lioma, Qiuchi Li, Peng Zhang, Jakob Grue Simonsen

Preconditioning and Numerical Stability in Neural Network Training for Parametric PDEs

In the context of training neural network-based approximations of solutions of parameter-dependent PDEs, we investigate the effect of preconditioning via well-conditioned frame representations of operators and demonstrate a significant…

Numerical Analysis · Mathematics 2026-02-02 Markus Bachmayr, Wolfgang Dahmen, Chenguang Duan, Mathias Oster

Extracting Text Representations for Terms and Phrases in Technical Domains

Extracting dense representations for terms and phrases is a task of great importance for knowledge discovery platforms targeting highly-technical fields. Dense representations are used as features for downstream components and have multiple…

Computation and Language · Computer Science 2023-05-26 Francesco Fusco, Diego Antognini