Natural Language Processing (NLP) is an active research area, and the formation of international associations such as the Association for Computational Linguistics (ACL) reflects the potential of research in this field. One of the problems faced by NLP system developers is that users' language is highly variable.
This project investigates natural language in the form of textual input; spoken communication is beyond its scope. It examines the problems caused by common errors in human textual input, such as spelling errors and variations in language usage, that must be addressed for an NLP system to be robust.
One aim of this project is to help users without a computing background construct Perl 5 Regular Expressions (P5RE), which are used for pattern recognition in the Dialogue Management System (DM). In order to "understand" the input, the DM needs to process the string of characters. For example, a predefined question might be "I need help." If the user enters "I need helps", the DM might not understand this input because of the extra "s" after "help". The regular expression "help(s)?" matches both "help" and "helps" when the DM checks the input.
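The optional-"s" example above can be sketched in a few lines. This is only an illustration: it uses Python's `re` module rather than Perl, on the assumption that the constructs involved (`(s)?`, `\.?`, `^`, `$`) behave the same way in both. The names `HELP_PATTERN` and `dm_understands` are hypothetical and are not part of the DM.

```python
import re

# "help(s)?" from the text: the group "(s)" followed by "?" makes the
# trailing "s" optional. The optional full stop and the case-insensitive
# flag are assumptions added for this illustration.
HELP_PATTERN = re.compile(r"^I need help(s)?\.?$", re.IGNORECASE)


def dm_understands(user_input):
    """Return True if the input matches the predefined question."""
    return HELP_PATTERN.match(user_input.strip()) is not None


if __name__ == "__main__":
    for text in ["I need help.", "I need helps", "I need helping"]:
        print(text, "->", dm_understands(text))
```

Both "I need help." and "I need helps" are accepted, while "I need helping" is rejected because the pattern is anchored at the end of the string.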
Similarly, a user with no prior knowledge of regular expressions would simply enter a number of example questions, and the expression generator would produce a compact Perl 5 Regular Expression that matches different strings with similar meanings. Using the same example of "I need help", the generated expression should be able to match "I need helps", "I needs help", "I need assistance", "Help me" and so on.
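To make the idea of an expression generator concrete, the following is a minimal sketch and not the project's actual design. It assumes the user supplies every variant phrasing, including synonyms such as "assistance", and it applies one naive, hypothetical heuristic: a trailing "s" is made optional on every word longer than two characters. It is written in Python's `re` syntax, which is assumed to agree with Perl 5 for the constructs used here. The function names `optional_s` and `build_pattern` are invented for illustration.

```python
import re


def optional_s(word):
    """Return a regex fragment for one word with an optional trailing "s".

    Hypothetical heuristic: very short words ("I", "me") are left alone;
    for longer words an existing final "s" is dropped and "s?" appended,
    so "need" and "needs" both become "needs?".
    """
    if len(word) <= 2:
        return re.escape(word)
    stem = word[:-1] if word.endswith("s") else word
    return re.escape(stem) + "s?"


def build_pattern(examples):
    """Combine example phrases into one case-insensitive regex."""
    alternatives = []
    for phrase in examples:
        # Keep only letter sequences, so punctuation in the examples is ignored.
        words = re.findall(r"[a-z']+", phrase.lower())
        alternative = r"\s+".join(optional_s(w) for w in words)
        # Skip duplicates so that equivalent examples do not bloat the regex.
        if alternative and alternative not in alternatives:
            alternatives.append(alternative)
    if not alternatives:
        raise ValueError("at least one non-empty example phrase is required")
    body = "|".join(alternatives)
    # Tolerate surrounding whitespace and trailing sentence punctuation.
    return re.compile(r"^\s*(?:" + body + r")\s*[.!?]*\s*$", re.IGNORECASE)


if __name__ == "__main__":
    pattern = build_pattern(["I need help", "I need assistance", "Help me"])
    for text in ["I need helps", "I needs help", "Help me", "Goodbye"]:
        print(text, "->", bool(pattern.match(text)))
```

A heuristic this simple over-generates (it would also accept "Helps me") and cannot discover synonyms on its own, which is the kind of trade-off between accuracy and compactness that the generator has to manage.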
The research will be carried out in a number of stages:

| Task | Time Usage |
| --- | --- |
| Background Reading | 3-5 weeks (Semester One) |
| Design Methodology | 2-3 weeks (Semester One) |
| Implementation and Testing | 5-6 weeks (Semester One and Semester Two) |
| Evaluation | 3-4 weeks (Semester Two) |
| Write-up | Rest of semester (Semester Two) |
The expected outcome of the project is a DM system environment that enables a non-computing user to prepare questions and responses without any knowledge of P5RE. The system will transparently convert the user's questions into accurate and compact P5RE. The research will also evaluate the effectiveness of the system.