Digitizing Cyrillic Manuscripts for the Historical Dictionary of the Serbian Language Using Handwritten Text Recognition Technology
Dublin Core | PKP Metadata Items | Metadata for this Document | |
1. | Title | Title of document | Digitizing Cyrillic Manuscripts for the Historical Dictionary of the Serbian Language Using Handwritten Text Recognition Technology |
2. | Creator | Author's name, affiliation, country | Vladimir Polomac; University of Kragujevac, Jovana Cvijića bb, 34000 Kragujevac; Serbia |
2. | Creator | Author's name, affiliation, country | Marina Kurešević; University of Novi Sad, Zorana Đinđića 2, 21000 Novi Sad; Serbia |
2. | Creator | Author's name, affiliation, country | Isidora Bjelaković; University of Novi Sad, Zorana Đinđića 2, 21000 Novi Sad; Serbia |
2. | Creator | Author's name, affiliation, country | Aleksandra Colić Jovanović; University of Novi Sad, Zorana Đinđića 2, 21000 Novi Sad; Serbia |
2. | Creator | Author's name, affiliation, country | Sanja Petrović; University of Novi Sad, Zorana Đinđića 2, 21000 Novi Sad; Serbia |
3. | Subject | Discipline(s) | linguistics; lexicography; digital humanities |
3. | Subject | Keyword(s) | Transkribus; automatic text recognition; artificial intelligence; machine learning; historical lexicography; Serbian language; Gavril Stefanović Venclović |
4. | Description | Abstract | The paper explores the use of information technologies based on machine learning and artificial intelligence in digitizing Cyrillic manuscripts for a historical dictionary of the Serbian language. The empirical research uses the Transkribus software platform to create a model for automatic text recognition of the manuscripts of Gavril Stefanović Venclović, the most significant and prolific Serbian cultural enthusiast of the 18th century, whose extensive manuscript legacy in the Serbian vernacular is the most important primary source for the historical dictionary of the Serbian language of this period. The results indicate that Transkribus can significantly accelerate the digitization of Cyrillic manuscripts for this purpose through the creation of specific and generic models for automatic text recognition. The advantage of automatic text recognition over traditional methods lies particularly in the possibility of continuously improving the performance of specific and generic models as transcription progresses and as the amount of digitized text available for training a new version of the model grows.
DOI: 10.31168/2305-6754.2023.1.08 |
5. | Publisher | Organizing agency, location | |
6. | Contributor | Sponsor(s) | The paper was financed by the Ministry of Education, Science and Technological Development of the Republic of Serbia and German Academic Exchange Service (DAAD) |
7. | Date | (YYYY-MM-DD) | 2023-10-19 |
8. | Type | Status & genre | Peer-reviewed Article |
8. | Type | Type | |
9. | Format | File format | |
10. | Identifier | Uniform Resource Identifier | https://slovene.ru/ojs/index.php/slovene/article/view/607 |
11. | Source | Title; vol., no. (year) | Slověne = Словѣне. International Journal of Slavic Studies; Vol 12, No 1 (2023) |
12. | Language | English=en | en |
13. | Relation | Supp. Files | |
14. | Coverage | Geo-spatial location, chronological period, research sample (gender, age, etc.) | Serbia |
15. | Rights | Copyright and permissions |
Copyright (c) 2023 Vladimir Polomac, Marina Kurešević, Isidora Bjelaković, Aleksandra Colić Jovanović, Sanja Petrović. This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License. |