Related papers: An Iterative Algorithm to Build Chinese Language M…

State-of-the-art Chinese Word Segmentation with Bi-LSTMs

A wide variety of neural-network architectures have been proposed for the task of Chinese word segmentation. Surprisingly, we find that a bidirectional LSTM model, when combined with standard deep learning techniques and best practices, can…

Computation and Language · Computer Science 2018-08-27 Ji Ma, Kuzman Ganchev, David Weiss

Building Chinese Lexicons from Scratch by Unsupervised Short Document Self-Segmentation

Chinese text segmentation is a well-known and difficult problem. On the one hand, there is no simple notion of "word" in the Chinese language, which makes it really hard to implement rule-based systems to segment written texts; thus lexicons and…

Computation and Language · Computer Science 2007-05-23 Daniel Gayo-Avello

Switch-LSTMs for Multi-Criteria Chinese Word Segmentation

Multi-criteria Chinese word segmentation is a promising but challenging task, which exploits several different segmentation criteria and mines their common underlying knowledge. In this paper, we propose a flexible multi-criteria learning…

Computation and Language · Computer Science 2018-12-20 Jingjing Gong, Xinchi Chen, Tao Gui, Xipeng Qiu

A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models

This work proposes a simple training-free prompt-free approach to leverage large language models (LLMs) for the Chinese spelling correction (CSC) task, which is totally different from all previous CSC approaches. The key idea is to use an…

Computation and Language · Computer Science 2024-10-08 Houquan Zhou, Zhenghua Li, Bo Zhang, Chen Li, Shaopeng Lai, Ji Zhang, Fei Huang, Min Zhang

Binary Tree based Chinese Word Segmentation

Chinese word segmentation is a fundamental task for Chinese language processing. The granularity mismatch problem is the main cause of the errors. This paper showed that the binary tree representation can store outputs with different…

Computation and Language · Computer Science 2013-05-20 Kaixu Zhang, Can Wang, Maosong Sun

Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models

Chinese pre-trained language models usually process text as a sequence of characters, while ignoring more coarse granularity, e.g., words. In this work, we propose a novel pre-training paradigm for Chinese -- Lattice-BERT, which explicitly…

Computation and Language · Computer Science 2021-05-31 Yuxuan Lai, Yijia Liu, Yansong Feng, Songfang Huang, Dongyan Zhao

Joint Chinese Word Segmentation and Span-based Constituency Parsing

In constituency parsing, span-based decoding is an important direction. However, for Chinese sentences, because of their linguistic characteristics, it is necessary to utilize other models to perform word segmentation first, which…

Computation and Language · Computer Science 2022-12-01 Zhicheng Wang, Tianyu Shi, Cong Liu

Chinese NER Using Lattice LSTM

We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and…

Computation and Language · Computer Science 2018-07-06 Yue Zhang, Jie Yang

LM-Combiner: A Contextual Rewriting Model for Chinese Grammatical Error Correction

Over-correction is a critical problem in the Chinese grammatical error correction (CGEC) task. Recent work using model ensemble methods based on voting can effectively mitigate over-correction and improve the precision of the GEC system.…

Computation and Language · Computer Science 2024-03-27 Yixuan Wang, Baoxin Wang, Yijun Liu, Dayong Wu, Wanxiang Che

Unsupervised Neural Word Segmentation for Chinese via Segmental Language Modeling

Previous traditional approaches to unsupervised Chinese word segmentation (CWS) can be roughly classified into discriminative and generative models. The former uses the carefully designed goodness measures for candidate segmentation, while…

Computation and Language · Computer Science 2018-10-09 Zhiqing Sun, Zhi-Hong Deng

Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models

Pretrained language models (PLMs) have shown marvelous improvements across various NLP tasks. Most Chinese PLMs simply treat an input text as a sequence of characters, and completely ignore word information. Although Whole Word Masking can…

Computation and Language · Computer Science 2023-03-23 Xinnian Liang, Zefan Zhou, Hui Huang, Shuangzhi Wu, Tong Xiao, Muyun Yang, Zhoujun Li, Chao Bian

Chinese Spelling Error Detection Using a Fusion Lattice LSTM

Spelling error detection serves as a crucial preprocessing step in many natural language processing applications. Due to the characteristics of the Chinese language, Chinese spelling error detection is more challenging than error detection in…

Computation and Language · Computer Science 2019-11-26 Hao Wang, Bing Wang, Jianyong Duan, Jiajun Zhang

Robust Chinese Word Segmentation with Contextualized Word Representations

In recent years, after the neural-network-based method was proposed, the accuracy of the Chinese word segmentation task has made great progress. However, when dealing with out-of-vocabulary words, there is still a large error rate. We used…

Computation and Language · Computer Science 2019-01-18 Yung-Sung Chuang

Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-stage Span Labeling

Chinese word segmentation and part-of-speech tagging are necessary tasks in terms of computational linguistics and application of natural language processing. Many researchers still debate the demand for Chinese word segmentation and…

Computation and Language · Computer Science 2021-12-20 Duc-Vu Nguyen, Linh-Bao Vo, Ngoc-Linh Tran, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

A Masked Segmental Language Model for Unsupervised Natural Language Segmentation

Segmentation remains an important preprocessing step both in languages where "words" or other important syntactic/semantic units (like morphemes) are not clearly delineated by white space, as well as when dealing with continuous speech…

Computation and Language · Computer Science 2021-09-07 C. M. Downey, Fei Xia, Gina-Anne Levow, Shane Steinert-Threlkeld

Chinese Lexical Simplification

Lexical simplification has attracted much attention in many languages, which is the process of replacing complex words in a given sentence with simpler alternatives of equivalent meaning. Although the richness of vocabulary in Chinese makes…

Computation and Language · Computer Science 2020-10-15 Jipeng Qiang, Xinyu Lu, Yun Li, Yunhao Yuan, Yang Shi, Xindong Wu

A Gap-Based Framework for Chinese Word Segmentation via Very Deep Convolutional Networks

Most previous approaches to Chinese word segmentation can be roughly classified into character-based and word-based methods. The former regards this task as a sequence-labeling problem, while the latter directly segments character sequence…

Computation and Language · Computer Science 2017-12-29 Zhiqing Sun, Gehui Shen, Zhihong Deng

A Seq-to-Seq Transformer Premised Temporal Convolutional Network for Chinese Word Segmentation

The prevalent approaches to the Chinese word segmentation task almost all rely on the Bi-LSTM neural network. However, methods based on the Bi-LSTM have some inherent drawbacks: they are hard to parallelize, inefficient in applying the Dropout…

Computation and Language · Computer Science 2019-05-22 Wei Jiang, Yan Tang

An Iterative Polishing Framework based on Quality Aware Masked Language Model for Chinese Poetry Generation

Owing to its unique literal and aesthetical characteristics, automatic generation of Chinese poetry is still challenging in Artificial Intelligence, which can hardly be straightforwardly realized by end-to-end methods. In this paper, we…

Computation and Language · Computer Science 2019-12-02 Liming Deng, Jie Wang, Hangming Liang, Hui Chen, Zhiqiang Xie, Bojin Zhuang, Shaojun Wang, Jing Xiao

On the (In)Effectiveness of Large Language Models for Chinese Text Correction

Recently, the development and progress of Large Language Models (LLMs) have amazed the entire Artificial Intelligence community. Benefiting from their emergent abilities, LLMs have attracted more and more researchers to study their…

Computation and Language · Computer Science 2024-10-28 Yinghui Li, Haojing Huang, Shirong Ma, Yong Jiang, Yangning Li, Feng Zhou, Hai-Tao Zheng, Qingyu Zhou