Indexing metadata

Digitizing Cyrillic Manuscripts for the Historical Dictionary of the Serbian Language Using Handwritten Text Recognition Technology


 
Dublin Core PKP Metadata Items Metadata for this Document
 
1. Title Title of document Digitizing Cyrillic Manuscripts for the Historical Dictionary of the Serbian Language Using Handwritten Text Recognition Technology
 
2. Creator Author's name, affiliation, country Vladimir Polomac; University of Kragujevac, Jovana Cvijića bb, 34000 Kragujevac; Serbia

2. Creator Author's name, affiliation, country Marina Kurešević; University of Novi Sad, Zorana Đinđića 2, 21000 Novi Sad; Serbia

2. Creator Author's name, affiliation, country Isidora Bjelaković; University of Novi Sad, Zorana Đinđića 2, 21000 Novi Sad; Serbia

2. Creator Author's name, affiliation, country Aleksandra Colić Jovanović; University of Novi Sad, Zorana Đinđića 2, 21000 Novi Sad; Serbia

2. Creator Author's name, affiliation, country Sanja Petrović; University of Novi Sad, Zorana Đinđića 2, 21000 Novi Sad; Serbia
 
3. Subject Discipline(s) linguistics; lexicography; digital humanities
 
3. Subject Keyword(s) Transkribus; automatic text recognition; artificial intelligence; machine learning; historical lexicography; Serbian language; Gavril Stefanović Venclović
 
4. Description Abstract

The paper explores the use of information technologies based on machine learning and artificial intelligence in digitizing Cyrillic manuscripts for the creation of a historical dictionary of the Serbian language. The empirical research uses the Transkribus software platform to create a model for automatic text recognition of the manuscripts of Gavril Stefanović Venclović, the most significant and prolific Serbian cultural enthusiast of the 18th century, whose extensive manuscript legacy in the Serbian vernacular is the most important primary source for the historical dictionary of the Serbian language of this period. The results indicate that the digitization of Cyrillic manuscripts for such a dictionary can be significantly accelerated with Transkribus by creating specific and generic models for automatic text recognition. The advantage of automatic text recognition over traditional methods lies in particular in the continuous improvement of these models: as transcription progresses, the growing amount of digitized text can be used to train new versions of the model.

 

DOI: 10.31168/2305-6754.2023.1.08

 
6. Contributor Sponsor(s) The paper was financed by the Ministry of Education, Science and Technological Development of the Republic of Serbia and German Academic Exchange Service (DAAD)
 
7. Date (YYYY-MM-DD) 2023-10-19
 
8. Type Status & genre Peer-reviewed Article
 
9. Format File format PDF
 
10. Identifier Uniform Resource Identifier https://slovene.ru/ojs/index.php/slovene/article/view/607
 
11. Source Title; vol., no. (year) Slověne = Словѣне. International Journal of Slavic Studies; Vol 12, No 1 (2023)
 
12. Language English=en en
 
14. Coverage Geo-spatial location, chronological period, research sample (gender, age, etc.) Serbia
 
15. Rights Copyright and permissions Copyright (c) 2023 Vladimir Polomac, Marina Kurešević, Isidora Bjelaković, Aleksandra Colić Jovanović, Sanja Petrović
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.