Anaphora Resolution in Punjabi Language


Kawaljit Kaur, Vishal Goyal, Kamlesh Dutta


In this paper, we present an anaphora resolution (AR) system for Punjabi, a language with limited resources. For this task, we developed our own corpus and used the Punjabi Shallow Parser to obtain the required information: part-of-speech (POS) tags, chunking information, gender, person, and number. The data were manually annotated with animacy, pronoun type, named-entity (NER) labels, and links from anaphors to their corresponding entities. This paper presents the first attempt at anaphora resolution in Punjabi using a machine learning approach. The task is framed as a two-class classification problem. Features are extracted from the annotated corpus, and several classifiers are applied to the task. Precision, recall, and F-score are used as evaluation metrics. The results show that ensemble classifiers perform better than the other classifiers. To analyze the contribution of individual features, we ran the classification with different feature subsets and compared the resulting system performance. The accuracy of the pre-processing tool (the Punjabi Shallow Parser) and the quality of the annotation also contribute to the overall performance of the system. As this is pioneering work, the results could be improved by using a larger corpus and by combining a rule-based strategy with the machine learning approach, which is left as future work.
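
The two-class formulation described above can be illustrated with a minimal sketch: each (anaphor, candidate antecedent) pair becomes one instance, labeled positive when the candidate is the annotated antecedent. The `Mention` fields, the feature names, the toy data, and the agreement rule standing in for a trained classifier are all assumptions made for illustration; they are not the authors' feature set, corpus, or models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    idx: int        # position of the mention in the text (hypothetical field)
    gender: str     # e.g. "m" or "f"
    number: str     # e.g. "sg" or "pl"
    animate: bool   # animacy attribute from manual annotation


def pair_features(anaphor: Mention, candidate: Mention) -> dict:
    """Build an illustrative feature dict for one anaphor/candidate pair."""
    return {
        "gender_agree": anaphor.gender == candidate.gender,
        "number_agree": anaphor.number == candidate.number,
        "animacy_agree": anaphor.animate == candidate.animate,
        "distance": anaphor.idx - candidate.idx,
    }


def make_instances(anaphor, candidates, gold_idx):
    """One (features, label) instance per candidate; label is True for the gold antecedent."""
    return [(pair_features(anaphor, c), c.idx == gold_idx) for c in candidates]


def toy_classifier(features: dict) -> bool:
    """Stand-in for a trained model: predict a link when all agreement features hold."""
    return (features["gender_agree"]
            and features["number_agree"]
            and features["animacy_agree"])


def precision_recall_f1(gold, pred):
    """Compute precision, recall, and F-score for the positive (linked) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data: three candidate antecedents and one pronoun whose gold antecedent is mention 2.
candidates = [
    Mention(0, "f", "sg", True),
    Mention(1, "m", "sg", True),
    Mention(2, "m", "sg", True),
]
anaphor = Mention(3, "m", "sg", True)

instances = make_instances(anaphor, candidates, gold_idx=2)
gold = [label for _, label in instances]
pred = [toy_classifier(feats) for feats, _ in instances]
precision, recall, f1 = precision_recall_f1(gold, pred)
print(precision, recall, f1)
```

In this toy case the agreement rule links two candidates but only one is correct, which shows why precision and recall are reported separately: agreement features alone cannot distinguish between compatible candidates, so additional features or a learned model are needed.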
