Faculty of Language, Literature and Humanities - Corpus Linguistics and Morphology

RIDGES-OCR

RIDGES-OCR contains the OCR text of complete books selected from the RIDGES corpus. The texts were recognized with models for the OCRopus OCR engine that were trained specifically on these titles. Mean character accuracies on test pages range from 94% to over 98% for 20 titles printed in blackletter type (Fraktur) between 1487 and 1870 (see Springmann & Lüdeling 2017).
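For readers unfamiliar with the metric, the following is a minimal sketch of one common way to compute character accuracy: one minus the edit distance between the ground-truth transcription and the OCR output, divided by the length of the ground truth. This definition and the function names below are assumptions made for illustration. The exact procedure behind the figures quoted above is described in Springmann & Lüdeling (2017) and may differ, for example in whether the mean is taken per page or over pooled characters.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Assumed definition: 1 - edit_distance / len(ground_truth)."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return 1.0 - levenshtein(ground_truth, ocr_output) / len(ground_truth)


def mean_character_accuracy(pairs) -> float:
    """Unweighted mean over (ground_truth, ocr_output) pairs, one per page."""
    scores = [character_accuracy(gt, ocr) for gt, ocr in pairs]
    return sum(scores) / len(scores)
```

With this definition, a page whose OCR output needs one correction per 50 ground-truth characters has a character accuracy of 98%.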

If you use these texts, please cite:

Springmann, Uwe, and Anke Lüdeling. 2017. OCR of historical printings with an application to building diachronic corpora: A case study using the RIDGES herbal corpus. Digital Humanities Quarterly 11 (2).

The following titles are available:

Garten der Gesunthait (Johannes von Cuba, 1487):

[Digitized copy] [view OCR] [download OCR]

Artzney Buchlein der kreutter (Johannes Tallat, 1532):

[Digitized copy] [view OCR] [download OCR]

Contrafayt Kreüterbůch (Otto Brunfels, 1532):

[Digitized copy] [view OCR] [download OCR]

New Kreutterbuch (Hieronymus Bock, 1539):

[Digitized copy] [view OCR] [download OCR]

New Kreüterbuch (Leonhart Fuchs, 1543):

[Digitized copy] [view OCR] [download OCR]