Related papers: An Interpretable Deep Learning Approach for Morpho…
The paper presents a new script classification method for the discrimination of the South Slavic medieval labels. It consists in the textural analysis of the script types. In the first step, each letter is coded by the equivalent script…
Handwritten text recognition and optical character recognition solutions show excellent results with processing data of modern era, but efficiency drops with Latin documents of medieval times. This paper presents a deep learning method to…
We present a generative document-specific approach to character analysis and recognition in text lines. Our main idea is to build on unsupervised multi-object segmentation methods and in particular those that reconstruct images based on a…
This paper introduces a new way for text-line extraction by integrating deep-learning based pre-classification and state-of-the-art segmentation methods. Text-line extraction in complex handwritten documents poses a significant challenge,…
Analysis of scripts plays an important role in paleography and in quantitative linguistics. Especially in the field of digital paleography quantitative features are much needed to differentiate glyphs. We describe an elaborate set of…
Authorship identification tasks, which rely heavily on linguistic styles, have always been an important part of Natural Language Understanding (NLU) research. While other tasks based on linguistic style understanding benefit from deep…
In this paper we introduce a script identification method based on hand-crafted texture features and an artificial neural network. The proposed pipeline achieves near state-of-the-art performance for script identification of video-text and…
Arabic text recognition is a challenging task because of the cursive nature of Arabic writing system, its joint writing scheme, the large number of ligatures and many other challenges. Deep Learning DL models achieved significant progress…
A line of a bilingual document page may contain text words in regional language and numerals in English. For Optical Character Recognition (OCR) of such a document page, it is necessary to identify different script forms before running an…
Standardized corpora of undeciphered scripts, a necessary starting point for computational epigraphy, requires laborious human effort for their preparation from raw archaeological records. Automating this process through machine learning…
This paper presents a thoroughly automated method for identifying and interpreting cuneiform characters via advanced deep-learning algorithms. Five distinct deep-learning models were trained on a comprehensive dataset of cuneiform…
This article focuses on the transcription of medieval manuscripts. Whereas problems of transcription have long interested medievalists, few workable options in the era of printed editions were available besides normalisation. The automation…
The type used to print an early modern book can give scholars valuable information about the time and place of its production as well as its producer. Recognizing such type is currently done manually using both the character shapes of `M'…
Ancient scripts, e.g., Egyptian hieroglyphs, Oracle Bone Inscriptions, and Ancient Greek inscriptions, serve as vital carriers of human civilization, embedding invaluable historical and cultural information. Automating ancient script image…
Deep learning methods are powerful tools in classifying multivariate time series data. Despite their high performance, these methods are hard to interpret, which diminishes their applications in high-risk domains such as healthcare. In this…
Recognizing fonts has become an important task in document analysis, due to the increasing number of available digital documents in different fonts and emphases. A generic font-recognition system independent of language, script and content…
We propose a deep factorization model for typographic analysis that disentangles content from style. Specifically, a variational inference procedure factors each training glyph into the combination of a character-specific content embedding…
Type inference methods based on deep learning are becoming increasingly popular as they aim to compensate for the drawbacks of static and dynamic analysis approaches, such as high uncertainty. However, their practical application is still…
Grouping has been commonly used in deep metric learning for computing diverse features. However, current methods are prone to overfitting and lack interpretability. In this work, we propose an improved and interpretable grouping method to…
In this paper, we propose a novel approach of word-level Indic script identification using only character-level data in training stage. The advantages of using character level data for training have been outlined in section I. Our method…