Diff-Oracle: Deciphering Oracle Bone Scripts with Controllable Diffusion Model

Jing Li; Qiu-Feng Wang; Siyuan Wang; Rui Zhang; Kaizhu Huang; Erik Cambria

Computer Vision and Pattern Recognition 2024-07-09 v2

Abstract

Deciphering oracle bone scripts plays an important role in Chinese archaeology and philology. However, a significant challenge remains due to the scarcity of oracle character images. To overcome this issue, we propose Diff-Oracle, a novel approach based on diffusion models to generate a diverse range of controllable oracle characters. Unlike traditional diffusion models that operate primarily on text prompts, Diff-Oracle incorporates a style encoder that uses style reference images to control the generation style. This encoder extracts style prompts from existing oracle character images, converting style details into a text embedding format via a pretrained language-vision model. In addition, a content encoder is integrated within Diff-Oracle to capture specific content details from content reference images, ensuring that the generated characters accurately represent the intended glyphs. To train Diff-Oracle effectively, we pre-generate pixel-level paired oracle character images (i.e., style and content images) with an image-to-image translation model. Extensive qualitative and quantitative experiments are conducted on the Oracle-241 and OBC306 datasets. Besides significantly surpassing existing generative methods in image generation, Diff-Oracle substantially benefits downstream oracle character recognition, outperforming all existing state-of-the-art methods by a large margin. In particular, on the challenging OBC306 dataset, Diff-Oracle yields an accuracy gain of 7.70% in the zero-shot setting and recognizes unseen oracle character images with an accuracy of 84.62%, setting a new benchmark for deciphering oracle bone scripts.

Keywords

diffusion model; handwritten character recognition

Cite

@article{arxiv.2312.13631,
  title  = {Diff-Oracle: Deciphering Oracle Bone Scripts with Controllable Diffusion Model},
  author = {Jing Li and Qiu-Feng Wang and Siyuan Wang and Rui Zhang and Kaizhu Huang and Erik Cambria},
  journal= {arXiv preprint arXiv:2312.13631},
  year   = {2024}
}
