Abstract:
This paper describes the first stage of a project to create an electronic dictionary that assigns numerical estimates of abstractness and concreteness to Russian words. Our approach integrates data from several sources: text corpora, psycholinguistic experiments, published dictionaries, markers of abstractness (certain suffixes), and a translation of a similar dictionary for English. The method for obtaining the data is described in detail. We report correlation coefficients between the estimates produced by the different methods and pay special attention to cases where the methods give inconsistent results. The statistical model behind the experimental data is discussed. We also present the results of experiments with the Google Books Ngram corpus on the co-occurrence of concrete words. Possible applications of the dictionary are illustrated by the frequency with which words from the dictionary are used in Russian high-school textbooks.
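The comparison of sources mentioned above can be illustrated with a rank correlation. The following is a minimal sketch, not the authors' procedure: the abstract does not say which coefficient was used, so Spearman's rho is an assumption, and the two score tables (`survey`, `corpus`) and their values are hypothetical placeholders.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical concreteness scores from two sources (illustrative values only).
survey = {"стол": 4.8, "собака": 4.9, "река": 4.5, "идея": 1.6, "свобода": 1.4}
corpus = {"стол": 0.91, "собака": 0.88, "река": 0.93, "идея": 0.20, "свобода": 0.12}

# Compare only the words that both sources cover.
shared = sorted(set(survey) & set(corpus))
rho = spearman([survey[w] for w in shared], [corpus[w] for w in shared])
print(f"Spearman rho over {len(shared)} shared words: {rho:.2f}")
```

A rank-based coefficient is used in this sketch because the two sources need not share a scale; words whose ranks differ sharply between sources would be candidates for the inconsistent cases mentioned above.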
Machine summary:
Dictionary of Abstract and Concrete Words of the Russian Language: A Methodology for Creation and Application. Solovyev VD.
Affiliations: 1 Kazan Federal University, Kazan, Russian Federation; 2 Innopolis University, Innopolis, Russian Federation; 3 Corresponding author. Special Issue of Journal of Research in Applied Linguistics, 10, Spring 2019.
Corpus analysis and methods of computational linguistics provide new opportunities for studying abstractness / concreteness (Turney, Neuman, Assaf, & Cohen, 2011; Frassinelli & Walde, 2019).
For the English language, a dictionary (more than 4 thousand words) with an indication of the numerical measure of abstractness / concreteness of words was created as early as 1981 (Coltheart, 1981) and is still used in research.
2. Literature Review
Snefjella, Genereux, and Kuperman (2018) present a methodology for constructing a dictionary of abstract / concrete words based on a corpus of texts.
The construction of a dictionary of abstract / concrete words for the Chinese language is described by Wang and Chen (2019).
1) Creating lists of abstract and concrete words extracted from the Russian semantic dictionary (created by N.
By themselves, these lists are insufficient, because our goal is a vocabulary in which each word carries an indication of its degree of abstractness / concreteness, a characteristic in the spirit of fuzzy logic (Sandler & Tsitolovsky, 2008), similar to the dictionaries for the English language (Coltheart, 1981; Brysbaert, Warriner, & Kuperman, 2014).
4) Extraction of abstract / concrete words by automatic methods from the very large Russian-language corpus Google Books Ngram (https://books.