An algorithm for noun and verb ranking in linguistic data (ALNOVE)

Ioannis Phinikettos; Maria Kambanaros

Archives of Applied Science Research

Abstract

The literature on statistical classification and cluster analysis, two related methods, is very broad. Classification builds a model to separate classes and assign new data points to them, whereas cluster analysis creates subgroups from a set of objects. In this research, we propose a new clustering method that classifies verbs and nouns in linguistic data as Easy, Medium or Hard, using the responses of normal and language-impaired (LI) populations on a verb/noun picture-naming test. One aim of the classification is to exclude the ‘easy’ and ‘hard’ items from the analysis: ‘easy’ items are simply easy to answer, and ‘hard’ items may be affected by other exogenous variables. The proposed algorithm first classifies the items for each LI group by applying the McNemar test, using the easiest and hardest items as references, and then performs an overall ranking using the binomial test. An implementation showed that the difference in medium responses between the normal and LI populations is greater than the corresponding difference in easy responses. This implies that, when trying to distinguish between normal and impaired individuals, it is more efficient to work with the medium responses. Finally, a classification model is built by creating cut-off points for the two word classes (verbs and nouns) to distinguish between the typical and atypical populations using the medium responses. The clinical outcome is a shortened, valid and reliable version of the verb and noun picture-naming test for assessment and research purposes.
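The per-group step described in the abstract, comparing each item against the easiest and hardest reference items with the McNemar test, might look roughly like the following Python sketch. The abstract does not give the decision rule, so everything here is an assumption made for illustration: the function names, the use of the exact (binomial-based) form of the McNemar test, the 0.05 significance level, and the rule that maps the two p-values to Easy, Medium or Hard. The overall ranking with the binomial test and the cut-off points are not sketched.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def mcnemar_exact(item, reference):
    """Exact two-sided McNemar p-value for paired 0/1 responses.

    item, reference: one 0/1 response per participant, in the same order.
    Only discordant pairs (correct on one item, wrong on the other) count.
    """
    b = sum(1 for x, y in zip(item, reference) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(item, reference) if x == 0 and y == 1)
    n = b + c
    if n == 0:  # no discordant pairs: no evidence of a difference
        return 1.0
    return min(1.0, 2 * binom_cdf(min(b, c), n))


def classify_item(item, easiest, hardest, alpha=0.05):
    """Hypothetical rule (an assumption, not taken from the paper):
    Easy if indistinguishable from the easiest item only, Hard if
    indistinguishable from the hardest item only, otherwise Medium."""
    like_easy = mcnemar_exact(item, easiest) >= alpha
    like_hard = mcnemar_exact(item, hardest) >= alpha
    if like_easy and not like_hard:
        return "Easy"
    if like_hard and not like_easy:
        return "Hard"
    return "Medium"


if __name__ == "__main__":
    # Toy data for 10 participants (invented purely for illustration).
    easiest = [1] * 10
    hardest = [0] * 10
    half = [1] * 5 + [0] * 5
    for name, item in [("all correct", easiest), ("half correct", half)]:
        print(name, classify_item(item, easiest, hardest))
```

In this sketch an item that cannot be separated from either reference, or that differs significantly from both, falls into the Medium class; other conventions are equally plausible and the published algorithm may differ.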