Abstract:
Much attention has recently been paid to natural language processing in
information storage and retrieval. This paper describes how the
application of natural language processing (NLP) techniques can enhance
cross-language information retrieval (CLIR). Using a semi-experimental
technique, we used Persian queries to retrieve relevant documents in
English. To translate the Persian queries, we used a bilingual
machine-readable dictionary. NLP techniques such as tokenization,
morphological analysis and part-of-speech tagging were used in the pre-
and post-translation
phases. Results showed that applying NLP techniques yields more
effective CLIR performance.
Machine-generated summary:
Applying Natural Language Processing Techniques for Effective Persian-English Cross-Language Information Retrieval H.
This paper describes how the application of natural language processing (NLP) techniques can enhance cross-language information retrieval (CLIR).
Research in the area of cross-language information retrieval (CLIR) has focused mainly on methods for translating queries (Ballesteros & Croft, 1998).
With the increasing availability of machine-readable bilingual dictionaries, dictionary-based query translation has become a viable approach to Cross-Language Information Retrieval (Adriani, 2000).
It is not yet clear whether natural language processing techniques such as tokenization, morphological analysis and part-of-speech tagging improve retrieval for Persian queries, because the relationship between these techniques and the Persian language remains largely unexplored.
NLP and CLIR
Although natural language processing and information retrieval are two separate fields, the effectiveness of using NLP techniques in IR has already been investigated.
Since not all inflected forms of a word are included in a dictionary, the query translation process runs into problems in CLIR.
A dictionary-based experiment in French-English CLIR showed that word-by-word translation can decrease CLIR effectiveness by 40% to 60% compared with monolingual retrieval (Hull & Grefenstette, 1996).
His findings showed that normalizing inflected words allows them to be translated, which can improve the effectiveness of CLIR.
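The effect of normalization on dictionary lookup can be illustrated with a minimal sketch. It is not the method used in the paper: the dictionary entries, the two plural suffixes and the function names `normalize` and `translate` are assumptions chosen for illustration, and a real morphological analyzer for Persian handles far more cases.

```python
# Toy bilingual dictionary (hypothetical entries): base forms only,
# as in a typical machine-readable dictionary.
TOY_DICT = {
    "کتاب": ["book"],
    "درخت": ["tree"],
}

# Two common Persian plural suffixes; assumed here for illustration only.
PLURAL_SUFFIXES = ("ها", "ان")

# Zero-width non-joiner, often written between a stem and its suffix.
ZWNJ = "\u200c"


def normalize(word):
    """Return the dictionary base form of `word` if one can be found."""
    if word in TOY_DICT:
        return word
    for suffix in PLURAL_SUFFIXES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)].rstrip(ZWNJ)
            # Accept the stem only if the dictionary knows it, so that
            # words merely ending in these letters are left untouched.
            if stem in TOY_DICT:
                return stem
    return word


def translate(word):
    """Look up the normalized form; an empty list means no translation."""
    return TOY_DICT.get(normalize(word), [])
```

Without the `normalize` step, a direct lookup of a plural form fails because only the base form is listed; with it, the plural maps to the same English candidates as its base form.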
Methodology
To investigate the effectiveness of applying NLP techniques to Persian-English CLIR, we examined different retrieval approaches.
NLP techniques such as tokenization, part-of-speech tagging and the use of morphological analysis can address the CLIR translation problem.
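As a rough sketch of how these steps might fit together before translation, the example below tokenizes a query, tags each token with a lookup-based tagger, and translates only content words. The summary does not say how the tags were actually used, so filtering by tag is an assumption, as are the lexicon, the dictionary entries, the tag names and the function names.

```python
import re

# Hypothetical part-of-speech lexicon; a real tagger would use context.
POS_LEXICON = {
    "بازیابی": "NOUN",
    "اطلاعات": "NOUN",
    "در": "PREP",
    "کتاب": "NOUN",
}

# Hypothetical bilingual dictionary entries for the content words.
TOY_DICT = {
    "بازیابی": ["retrieval"],
    "اطلاعات": ["information"],
    "کتاب": ["book"],
}

# Tags treated as content-bearing (an assumption of this sketch).
CONTENT_TAGS = {"NOUN", "VERB", "ADJ"}


def tokenize(query):
    """Split on non-word characters; \\w matches Unicode letters in Python 3."""
    return re.findall(r"\w+", query)


def tag(tokens):
    """Attach a tag to each token; unknown tokens get 'UNK'."""
    return [(token, POS_LEXICON.get(token, "UNK")) for token in tokens]


def translate_query(query):
    """Translate content words only; function words and unknowns are dropped."""
    english = []
    for token, pos in tag(tokenize(query)):
        if pos in CONTENT_TAGS:
            english.extend(TOY_DICT.get(token, []))
    return english
```

Dropping unknown tokens is a simplification; a fuller pipeline might pass them through untranslated, since they are often proper nouns.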
Dictionary-based cross-language information retrieval: Problems, methods and research findings.