Uwe Springmann — Scifaro

Open Source Handwritten Text Recognition on Medieval Manuscripts using Mixed Models and Document-Specific Finetuning

This paper deals with the task of practical and open source Handwritten Text Recognition (HTR) on German medieval manuscripts. We report on our efforts to construct mixed recognition models which can be applied out-of-the-box without any…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-20 · Christian Reul, Stefan Tomasek, Florian Langhanki, Uwe Springmann

Mixed Model OCR Training on Historical Latin Script for Out-of-the-Box Recognition and Finetuning

In order to apply Optical Character Recognition (OCR) to historical printings of Latin script fully automatically, we report on our efforts to construct a widely-applicable polyfont recognition model yielding text with a Character Error…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-16 · Christian Reul, Christoph Wick, Maximilian Nöth, Andreas Büttner, Maximilian Wehner, Uwe Springmann

OCR4all -- An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings

Optical Character Recognition (OCR) on historical printings is a challenging task mainly due to the complexity of the layout and the highly variant typography. Nevertheless, in the last few years great progress has been made in the area of…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-01 · Christian Reul, Dennis Christ, Alexander Hartelt, Nico Balbach, Maximilian Wehner, Uwe Springmann, Christoph Wick, Christine Grundig, Andreas Büttner, Frank Puppe

State of the Art Optical Character Recognition of 19th Century Fraktur Scripts using Open Source Engines

In this paper we evaluate Optical Character Recognition (OCR) of 19th century Fraktur scripts without book-specific training using mixed models, i.e. models trained to recognize a variety of fonts and typesets from previously unseen…

Computer Vision and Pattern Recognition · Computer Science · 2018-10-09 · Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe

Ground Truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin

In this paper we describe a dataset of German and Latin ground truth (GT) for historical OCR in the form of printed text line images paired with their transcription. This dataset, called GT4HistOCR, consists of 313,173…

Computation and Language · Computer Science · 2018-09-17 · Uwe Springmann, Christian Reul, Stefanie Dipper, Johannes Baiter

Improving OCR Accuracy on Early Printed Books by utilizing Cross Fold Training and Voting

In this paper we introduce a method that significantly reduces the character error rates for OCR text obtained from OCRopus models trained on early printed books. The method uses a combination of cross fold training and confidence based…

Computer Vision and Pattern Recognition · Computer Science · 2018-07-25 · Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe

Improving OCR Accuracy on Early Printed Books by combining Pretraining, Voting, and Active Learning

We combine three methods which significantly improve the OCR accuracy of OCR models trained on early printed books: (1) The pretraining method utilizes the information stored in already existing models trained on a variety of typesets…

Computer Vision and Pattern Recognition · Computer Science · 2018-03-01 · Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe

Transfer Learning for OCRopus Model Training on Early Printed Books

A method is presented that significantly reduces the character error rates for OCR text obtained from OCRopus models trained on early printed books when only small amounts of diplomatic transcriptions are available. This is achieved by…

Computer Vision and Pattern Recognition · Computer Science · 2017-12-22 · Christian Reul, Christoph Wick, Uwe Springmann, Frank Puppe

LAREX - A semi-automatic open-source Tool for Layout Analysis and Region Extraction on Early Printed Books

A semi-automatic open-source tool for layout analysis on early printed books is presented. LAREX uses a rule based connected components approach which is very fast, easily comprehensible for the user and allows an intuitive manual…

Computer Vision and Pattern Recognition · Computer Science · 2017-01-26 · Christian Reul, Uwe Springmann, Frank Puppe

Profiling of OCR'ed Historical Texts Revisited

In the absence of ground truth it is not possible to automatically determine the exact spectrum and occurrences of OCR errors in an OCR'ed text. Yet, for interactive postcorrection of OCR'ed historical printings it is extremely useful to…

Computer Vision and Pattern Recognition · Computer Science · 2017-01-20 · Florian Fink, Klaus-U. Schulz, Uwe Springmann