OkwuIgbo.org
Igbo Corpus Development

The Beginnings

My foray into Igbo corpus development received a great boost from the Laurence Urdang Award of the European Association for Lexicography for the year 2002 [http://www.euralex.org/official/LUA_Uchechukwu.pdf]. The initial enthusiasm that greeted the award was overshadowed by the disappointing fact that, at the time, most corpus development and manipulation tools could not handle fully tone-marked Igbo scripts. Nevertheless, the effort gave me considerable insight into the corpus linguistics needs of the Igbo language, especially the realization that the field of corpus linguistics, as it is presently practiced in Europe and America, is yet to be discovered for most Nigerian languages. Things have now started to change for the better as a result of the introduction of Unicode.

The Software Solution

Unicode is now an accepted standard, but many corpus manipulation tools are not 100% Unicode-based. For the Igbo language, a corpus development or manipulation tool counts as useful if it can be used to manipulate fully tone-marked Igbo texts. The best solution, and one that makes it very easy to manipulate and explore a fully tone-marked Igbo text, is the Ellogon Text Engineering Platform, developed and maintained by George Petasis. The tool is open source and can be freely downloaded from www.ellogon.org. An Igbo System is currently being developed for the platform.
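To make the Unicode point concrete, here is a minimal Python sketch of why tone-marked Igbo text trips up tools that are not fully Unicode-aware. It is independent of Ellogon and uses only the standard library. The function names (normalize_igbo, strip_tone_marks) and the choice of grave, acute, and macron as the tone marks are assumptions made for illustration. A tone-marked subdotted vowel such as high-tone "ọ" has no single precomposed code point, so the same letter can be stored in several different ways unless the text is normalized first.

```python
import unicodedata

# Combining tone marks assumed for this sketch:
# grave (low tone), acute (high tone), macron (downstep).
TONE_MARKS = {"\u0300", "\u0301", "\u0304"}


def normalize_igbo(text: str) -> str:
    """Return text in NFC so that equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text)


def strip_tone_marks(text: str) -> str:
    """Remove tone marks but keep the dot below (ị, ọ, ụ) and dot above (ṅ)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


# Three ways of typing the same high-tone "ọ":
variants = [
    "\u1ecd\u0301",   # precomposed ọ + combining acute
    "o\u0323\u0301",  # o + combining dot below + combining acute
    "\u00f3\u0323",   # precomposed ó + combining dot below
]

if __name__ == "__main__":
    # The raw strings differ, but they collapse to one form after NFC.
    print(len(set(variants)))
    print(len({normalize_igbo(v) for v in variants}))
    # Tone-insensitive lookup key for a tone-marked word.
    print(strip_tone_marks("\u00e1kw\u00e0"))
```

A tool that compares raw code points would treat the three variants as three different words. Normalizing once at load time avoids that, and strip_tone_marks gives a tone-insensitive search key for a corpus that is itself kept fully tone-marked.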
The Ellogon platform is simply a host for any text engineering tools that you add in the form of modules. Modules can be written in different programming languages, such as Java, Python, and Perl. Below is an example of the Igbo Tokenizer and Igbo Sentence Splitter within the Igbo System.
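For readers without Ellogon at hand, the following is a minimal standalone Python sketch of what a tone-aware tokenizer and sentence splitter might do. It is not the Igbo System's module and does not use the Ellogon API. The names tokenize and split_sentences, the regular expressions, and the treatment of hyphens and apostrophes as word-internal are all assumptions made for illustration.

```python
import re
import unicodedata

# Combining diacritics block (U+0300 to U+036F). Python's \w does not match
# these, so a plain \w+ pattern would cut a word at every combining tone mark.
COMBINING = "\u0300-\u036f"

# A word is a run of letters, digits, and combining marks, optionally joined
# by hyphens or apostrophes. Any other non-space character is its own token.
TOKEN_RE = re.compile(
    rf"[\w{COMBINING}]+(?:[-'][\w{COMBINING}]+)*|[^\w\s]"
)

# Naive rule: a sentence ends at ., ! or ? followed by whitespace.
# Abbreviations and quotations are not handled.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens, keeping tone marks attached."""
    return TOKEN_RE.findall(unicodedata.normalize("NFC", text))


def split_sentences(text: str) -> list[str]:
    """Split text into sentences at sentence-final punctuation."""
    stripped = unicodedata.normalize("NFC", text).strip()
    return [s for s in SENTENCE_END_RE.split(stripped) if s]


if __name__ == "__main__":
    # Made-up sample input; the tone marking has not been checked by a speaker.
    sample = "\u00e1kw\u00e0, \u00e0kw\u00e1. o\u0323\u0301 na-eri!"
    for sentence in split_sentences(sample):
        print(tokenize(sentence))
```

Including the combining range in the word pattern is the important design choice: after NFC normalization a letter such as high-tone "ọ" is still two code points, and a tokenizer that ignores the second one silently drops the tone mark.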