Quantitative Linguistic Study of Frequency Words in Kirill of Turov’s Words (based on the NLR manuscript F.п.I.39)
Abstract
This study examines the quantitative and statistical properties of the most frequent words in the sermons of Kirill of Turov contained in the 13th-century Tolstoy Collection (NLR, F.п.I.39).
Three experiments were carried out. First, formal differences were identified between the list of these words and its counterparts in eight contrasting sub-corpora: 11th–14th-century copies of the May Menaion, the Menaia for other months, Sticheraria, Gospels, the Book of Psalms, chronicles, the Apostolos, and the Parenesis of Ephrem the Syrian; the last two proved the most similar to the list. Second, the Log-Likelihood, TF*ICTF', and Weirdness measures were used to identify statistically significant words, which revealed a partial overlap in the forms under study between the texts of Kirill and several of the sub-corpora. Third, the ranks of each form were compared to estimate how close the Tolstoy Collection texts are to sub-corpora of different genres; the original sermons of Kirill of Turov, the translated didactic sermons of Ephrem the Syrian, and the Apostolos proved closest to one another in the statistical significance of the 15 most frequent forms.
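Two of the measures named above can be illustrated with a minimal Python sketch. It assumes the standard formulations: log-likelihood as a comparison of observed and expected frequencies in two corpora, and weirdness as the ratio of relative frequencies in a target corpus and a reference corpus. The function names and all counts in the usage note are hypothetical and are not taken from the study; the TF*ICTF' variant is not sketched because its exact definition is not given here.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood score for one word form in two corpora.

    freq_a, freq_b: occurrences of the form in corpus A and corpus B.
    size_a, size_b: total number of tokens in corpus A and corpus B.
    """
    total = freq_a + freq_b
    # Expected frequencies if the form were spread evenly over both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        # A zero observed count contributes nothing to the sum.
        if observed > 0:
            score += observed * math.log(observed / expected)
    return 2.0 * score


def weirdness(freq_a, size_a, freq_b, size_b):
    """Ratio of the form's relative frequency in corpus A to that in corpus B."""
    if freq_b == 0:
        # Assumption: a form absent from the reference corpus is treated
        # as infinitely "weird"; smoothing would be an alternative.
        return math.inf
    return (freq_a / size_a) / (freq_b / size_b)
```

For example, with hypothetical counts, `log_likelihood(20, 1000, 5, 1000)` returns a positive score, while `log_likelihood(10, 1000, 10, 1000)` returns 0.0 because the form is equally frequent in both corpora. A weirdness value above 1 means the form is relatively more frequent in the target corpus than in the reference corpus.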
For the first time, the configurations of the most significant lexemes in the sub-corpora were established. Also for the first time, the list of these lexemes was found to be similar in the sub-corpora of Kirill of Turov’s sermons and the Apostolos and, in part, in those of the Parenesis, the Book of Psalms, and the chronicles. The high-rank units in the sermons of Kirill of Turov (нъ, о, бо, съ) are described in terms of linguistics, genre and style, and discourse pragmatics.
The study was carried out using transcriptions from the historical corpus “Manuscript” (manuscripts.ru).
DOI: 10.31168/2305-6754.2020.9.1.2
Copyright (c) 2020 Victor A. Baranov, Oleg F. Zholobov
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.