Abstract:
Dialectology is the scientific study of a dialect and its geographical distribution. Each dialect is a language, and studying a dialect requires various linguistic analyses, which makes the study time-consuming. Collecting dialectal data is likewise time-consuming and requires considerable effort. Raw data is of limited use in dialectology; linguistic analyses must be added to the data within the framework of structural linguistic analysis. Using a computer as a research tool requires the data to be prepared in a specific structure. The main contribution of this paper is a proposed standard for organizing dialectal data and information. The standard contains the dialectal data, its relevant metadata, and the linguistic information resulting from the analysis of this data. The metadata and linguistic information are organized in an XML tree structure. This data structure is highly portable and can be easily read into a database.
Machine summary:
Thirteenth Year - Number 26 - Autumn and Winter 2023-2024
Standardization of Dialectal Data and Information: Necessity and Solution
Masoud Ghayoomi
Research Article
Abstract: Dialectology deals with the scientific study of a dialect and its geographical distribution.
This information is organized in a tree data structure using eXtensible Markup Language (XML).
Journal of Comparative Linguistic Research
The invention of the computer created a distinction between the concepts of data, information, and knowledge.
In the field of Natural Language Processing, input data must be structured according to a specific framework; in this regard, the 'Lexical Markup Framework' (LMF), introduced by the International Organization for Standardization for the management of linguistic resources, has been accepted.
He introduced a method of organizing data using eXtensible Markup Language and the 'Conference on Computational Natural Language Learning' (CoNLL) format (Buchholz and Marsi, 2006).
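To make the combination of the two formats concrete, here is a minimal sketch that converts one sentence in the ten-column CoNLL-X layout (Buchholz and Marsi, 2006) into an XML element. The `<sentence>` and `<token>` element names, the function name, and the sample sentence are assumptions made for illustration; they are not taken from the cited work or from the paper.

```python
import xml.etree.ElementTree as ET

# Column names of the CoNLL-X format (Buchholz and Marsi, 2006).
CONLLX_FIELDS = ["id", "form", "lemma", "cpostag", "postag",
                 "feats", "head", "deprel", "phead", "pdeprel"]


def conllx_to_xml(block: str) -> ET.Element:
    """Convert one CoNLL-X sentence (tab-separated lines) to an XML element.

    The <sentence>/<token> element names are hypothetical.
    """
    sentence = ET.Element("sentence")
    for line in block.strip().splitlines():
        values = line.split("\t")
        if len(values) != len(CONLLX_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(values)}")
        # "_" marks an unspecified value in CoNLL-X; skip those attributes.
        attrs = {k: v for k, v in zip(CONLLX_FIELDS, values) if v != "_"}
        ET.SubElement(sentence, "token", attrs)
    return sentence


# Illustrative two-token sentence, not data from the paper.
sample = (
    "1\tdogs\tdog\tN\tNNS\tnum=pl\t2\tSBJ\t_\t_\n"
    "2\tbark\tbark\tV\tVBP\t_\t0\tROOT\t_\t_"
)
sentence = conllx_to_xml(sample)
print(ET.tostring(sentence, encoding="unicode"))
```

Each CoNLL column becomes an attribute of a `<token>` node, so the tabular annotation and the XML tree carry the same information.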
The data structure used in this series of workshops is based on eXtensible Markup Language: the meaning of the target word is defined as an attribute-value pair in each node of the tree.
In this article, a specific structure is presented, in the form of a standard, for organizing the metadata and linguistic analyses of speech data at various levels, including phonology, morphology, syntax, semantics, and discourse analysis.
Various types of linguistic information are defined with an attribute-value structure in each node, which will be explained below.
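As a rough illustration of the attribute-value idea, the sketch below stores metadata and per-level annotations in a small XML tree, then flattens them into rows and loads them into a database. Every element name, attribute name, and value here is a hypothetical placeholder; the paper's actual schema is not reproduced in this summary, and SQLite is only an assumed target database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical record: element names, attribute names and values are
# placeholders for illustration, not the schema proposed in the paper.
RECORD = """
<record id="r1">
  <metadata dialect="example-dialect" location="example-village" speaker="S01"/>
  <token form="example">
    <phonology transcription="placeholder"/>
    <morphology pos="noun" number="singular"/>
    <syntax role="subject"/>
    <semantics sense="placeholder-sense"/>
  </token>
</record>
"""


def flatten(record_xml: str):
    """Turn each attribute-value pair of each level node into one row."""
    root = ET.fromstring(record_xml.strip())
    rows = []
    for token in root.iter("token"):
        for level in token:  # one child element per linguistic level
            for attribute, value in level.attrib.items():
                rows.append((root.get("id"), token.get("form"),
                             level.tag, attribute, value))
    return rows


rows = flatten(RECORD)

# Load the flattened annotations into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE annotation "
    "(record_id TEXT, form TEXT, level TEXT, attribute TEXT, value TEXT)"
)
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?)", rows)

morph = conn.execute(
    "SELECT attribute, value FROM annotation "
    "WHERE level = 'morphology' ORDER BY attribute"
).fetchall()
print(morph)
```

Because every annotation is an attribute-value pair on a node, the same flattening step works for any level; a discourse-level node could be added to the record without changing the code.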