№1, 2021


Afruz M. Gurbanova

This article presents an applied approach to automatic term extraction from a corpus in a particular subject area. Term extraction commonly focuses on identifying the core vocabulary of a given field. Unlike traditional manual term extraction, automatic extraction uses computerized tools to simplify this time-consuming task and aims to automate the preliminary identification of term candidates. Currently, the rapid growth in the volume of information that must be processed in many areas (lexicography, terminology, information retrieval, etc.) makes the automatic selection of terms and keywords especially relevant. Automatic Term Extraction is a well-established discipline within Natural Language Processing, and many different approaches and systems have been developed. Various sub-issues of automatic term extraction are presented, namely corpus collection, unithood, the definition of terms and their variants, and system evaluation. Five methods for automatic term extraction are studied and comparatively analyzed. An experiment is conducted on a corpus of articles published in the journals "Problems of Information Technology" and "Problems of the Information Society". A joint expert and formal assessment methodology is proposed, and the results of the comparative assessment of the automatic term extraction methods are presented (pp. 55-69).

Keywords: automatic term extraction, Natural Language Processing, corpus collection, linguistic approaches, statistical approaches, termhood.
DOI: 10.25045/jpit.v12.i1.05
