An NLP method in the corpus analysis of Central Kurdish definiteness marker

Abstract

In this study, regular expressions (Regex) are used to improve search techniques in Natural Language Processing (NLP). A Regex is a string of text that lets a user define patterns for matching, locating, and managing text. The aim was to raise the performance of NLP for response generation, i.e., Natural Language Generation (NLG). The analysis and the Regex-based performance show that this method is very useful, especially because it provides a large number of patterns for matching. Patterns that include definite markers in Kurdish take many forms, and our analysis shows that Regex is very useful for identifying and detecting them. Our findings indicate that Regex seems to be efficient enough to extract relevant responses from Kurdish free speech data. For example, using Regex helped to reduce stemming and to match all types of searches that look similar.
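
As a rough illustration of the kind of pattern matching the abstract describes, the Python sketch below flags tokens that end in a definite-marker suffix. The suffix forms (`eke`, `ke`, `ekan`, in a simplified Latin transliteration), the function name, and the sample words are assumptions made for this example only; the patterns actually used in the study are not given here.

```python
import re

# Assumed, simplified Latin-script suffix forms for the definite marker.
# These are illustrative only and are not taken from the study.
# The longer alternative comes first so that "ekan" is not cut short to "ke".
DEFINITE_SUFFIX = re.compile(r"\b(\w+?)(ekan|eke|ke)\b")


def find_definite_candidates(text):
    """Return (stem, suffix) pairs for tokens ending in an assumed suffix.

    The matches are only candidates: any word that happens to end in one
    of the suffix strings is returned, whether or not it is definite.
    """
    return [(m.group(1), m.group(2)) for m in DEFINITE_SUFFIX.finditer(text)]


if __name__ == "__main__":
    # Hypothetical transliterated sample tokens.
    for stem, suffix in find_definite_candidates("kureke hat piyawekan"):
        print(stem, suffix)
```

A single pattern with several alternatives covers multiple surface forms of the marker at once, which is the sense in which one Regex can stand in for a set of separate string searches.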

About the SADiLaR DH colloquiums:

SADiLaR organizes a monthly (online) colloquium showcasing research related to digital humanities. Each month a speaker will present their work in the area of digital humanities.

Date
Mar 15, 2023 10:00 AM — Mar 15, 2022 11:00 AM
Anelda van der Walt
ESCALATOR Programme Manager