Digitizing Cyrillic Manuscripts for the Historical Dictionary of the Serbian Language Using Handwritten Text Recognition Technology
Abstract
This paper explores the use of machine-learning and artificial-intelligence technologies in digitizing Cyrillic manuscripts for a historical dictionary of the Serbian language. The empirical research uses the Transkribus software platform to create a model for automatic text recognition of the manuscripts of Gavril Stefanović Venclović, the most significant and prolific Serbian cultural figure of the 18th century, whose extensive manuscript legacy in the Serbian vernacular is the most important primary source for the historical dictionary of the Serbian language of this period. The results indicate that digitizing Cyrillic manuscripts for this purpose can be significantly accelerated by using Transkribus to create specific and generic models for automatic text recognition. The main advantage of automatic text recognition over traditional methods is that the performance of specific and generic models can be improved continuously: as transcription progresses, the growing amount of digitized text can be used to train new versions of the model.
DOI: 10.31168/2305-6754.2023.1.08
Copyright (c) 2023 Vladimir Polomac, Marina Kurešević, Isidora Bjelaković, Aleksandra Colić Jovanović, Sanja Petrović
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.