Related papers: Character-Aware Models Improve Visual Text Renderi…

CharGen: High Accurate Character-Level Visual Text Generation Model with MultiModal Encoder

Recently, significant advancements have been made in diffusion-based visual text generation models. Although the effectiveness of these methods in visual text rendering is rapidly improving, they still encounter challenges such as…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-24 · Lichen Ma, Tiezhu Yue, Pei Fu, Yujie Zhong, Kai Zhou, Xiaoming Wei, Jie Hu

Toucan: Token-Aware Character Level Language Modeling

Character-level language models obviate the need for separately trained tokenizers, but efficiency suffers from longer sequence lengths. Learning to combine character representations into tokens has made training these models more…

Computation and Language · Computer Science · 2023-11-16 · William Fleshman, Benjamin Van Durme

Text Rendering Strategies for Pixel Language Models

Pixel-based language models process text rendered as images, which allows them to handle any script, making them a promising approach to open vocabulary language modelling. However, recent approaches use text renderers that produce a large…

Computation and Language · Computer Science · 2023-11-02 · Jonas F. Lotz, Elizabeth Salesky, Phillip Rust, Desmond Elliott

Character Region Awareness for Text Detection

Scene text detection methods based on neural networks have emerged recently and have shown promising results. Previous methods trained with rigid word-level bounding boxes exhibit limitations in representing the text region in an arbitrary…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-04 · Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, Hwalsuk Lee

Learning Character-level Compositionality with Visual Features

Previous work has modeled the compositionality of words by creating character-level models of meaning, reducing problems of sparsity for rare words. However, in many writing systems compositionality has an effect even on the…

Computation and Language · Computer Science · 2017-05-09 · Frederick Liu, Han Lu, Chieh Lo, Graham Neubig

Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation

Recent advances in image tokenizers, such as VQ-VAE, have enabled text-to-image generation using auto-regressive methods, similar to language modeling. However, these methods have yet to leverage pre-trained language models, despite their…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-26 · Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar, Alexander Toshev

Adaptive Text Recognition through Visual Matching

In this work, our objective is to address the problems of generalization and flexibility for text recognition in documents. We introduce a new model that exploits the repetitive nature of characters in languages, and decouples the visual…

Computer Vision and Pattern Recognition · Computer Science · 2020-09-15 · Chuhan Zhang, Ankush Gupta, Andrew Zisserman

RepText: Rendering Visual Text via Replicating

Although contemporary text-to-image generation models have achieved remarkable breakthroughs in producing visually appealing images, their capacity to generate precise and flexible typographic elements, especially non-Latin alphabets,…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-29 · Haofan Wang, Yujia Xu, Yimeng Li, Junchen Li, Chaowei Zhang, Jing Wang, Kejia Yang, Zhibo Chen

Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Recent text-to-image generation methods provide a simple yet exciting conversion capability between text and image domains. While these methods have incrementally improved the generated image fidelity and text relevancy, several pivotal…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-25 · Oran Gafni, Adam Polyak, Oron Ashual, Shelly Sheynin, Devi Parikh, Yaniv Taigman

Sequence-to-Sequence Models for Data-to-Text Natural Language Generation: Word- vs. Character-based Processing and Output Diversity

We present a comparison of word-based and character-based sequence-to-sequence models for data-to-text natural language generation, which generate natural language descriptions for structured inputs. On the datasets of two recent generation…

Computation and Language · Computer Science · 2018-10-12 · Glorianna Jagfeld, Sabrina Jenne, Ngoc Thang Vu

Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation

Over the past few years, Text-to-Image (T2I) generation approaches based on diffusion models have gained significant attention. However, vanilla diffusion models often suffer from spelling inaccuracies in the text displayed within the…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-30 · Sanyam Lakhanpal, Shivang Chopra, Vinija Jain, Aman Chadha, Man Luo

GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation

Recent breakthroughs in the field of language-guided image generation have yielded impressive achievements, enabling the creation of high-quality and diverse images based on user instructions. Although the synthesis performance is…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-24 · Jian Ma, Mingjun Zhao, Chen Chen, Ruichen Wang, Di Niu, Haonan Lu, Xiaodong Lin

Character-Centric Story Visualization via Visual Planning and Token Alignment

Story visualization advances the traditional text-to-image generation by enabling multiple image generation based on a complete story. This task requires machines to 1) understand long text inputs and 2) produce a globally consistent image…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-25 · Hong Chen, Rujun Han, Te-Lin Wu, Hideki Nakayama, Nanyun Peng

Towards Open-Set Text Recognition via Label-to-Prototype Learning

Scene text recognition is a popular topic and extensively used in the industry. Although many methods have achieved satisfactory performance for the close-set text recognition challenges, these methods lose feasibility in open-set…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-09 · Chang Liu, Chun Yang, Hai-Bo Qin, Xiaobin Zhu, Cheng-Lin Liu, Xu-Cheng Yin

Towards the Unseen: Iterative Text Recognition by Distilling from Errors

Visual text recognition is undoubtedly one of the most extensively researched topics in computer vision. Great progress has been made to date, with the latest models starting to focus on the more practical "in-the-wild" setting. However, a…

Computer Vision and Pattern Recognition · Computer Science · 2021-07-27 · Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Yi-Zhe Song

Syllable-aware Neural Language Models: A Failure to Beat Character-aware Ones

Syllabification does not seem to improve word-level RNN language modeling quality when compared to character-based segmentation. However, our best syllable-aware language model, achieving performance comparable to the competitive…

Computation and Language · Computer Science · 2017-07-21 · Zhenisbek Assylbekov, Rustem Takhanov, Bagdat Myrzakhmetov, Jonathan N. Washington

Character-based Neural Machine Translation

We introduce a neural machine translation model that views the input and output sentences as sequences of characters rather than words. Since word-level information provides a crucial source of bias, our input model composes representations…

Computation and Language · Computer Science · 2015-11-17 · Wang Ling, Isabel Trancoso, Chris Dyer, Alan W Black

Visually grounded learning of keyword prediction from untranscribed speech

During language acquisition, infants have the benefit of visual cues to ground spoken language. Robots similarly have access to audio and visual sensors. Recent work has shown that images and spoken captions can be mapped into a meaningful…

Computation and Language · Computer Science · 2017-05-29 · Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu

Enhancing Vision Models for Text-Heavy Content Understanding and Interaction

Interacting with and understanding text-heavy visual content spread across multiple images is a major challenge for traditional vision models. This paper is on enhancing vision models' capability to comprehend and learn from images…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-31 · Adithya TG, Adithya SK, Abhinav R Bharadwaj, Abhiram HA, Surabhi Narayan

The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

Recent advances in text-to-image generation models have unlocked vast potential for visual creativity. However, the users that use these models struggle with the generation of consistent characters, a crucial aspect for numerous real-world…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-16 · Omri Avrahami, Amir Hertz, Yael Vinker, Moab Arar, Shlomi Fruchter, Ohad Fried, Daniel Cohen-Or, Dani Lischinski