LXGram

LXGram is a grammar for the computational processing of Portuguese. Its authorship, technical features, and release status are described below.

Authorship

LXGram is being developed at the University of Lisbon, by NLX—Natural Language and Speech Group of the Department of Informatics, Faculty of Sciences, under the coordination of António Branco. Major coding work has been performed by Francisco Costa. The development activities benefited from support or contributions from Mariana Avelãs, Filipe Gil, Marco Gonzalez, Clara Pinto and David Raposo.

In scholarly work, LXGram should be cited by referring to its implementation report, listed below.

Contacts

Check information on contacts at the NLX Group site.

Consortium

The development of LXGram has been undertaken within the scope, and with the support, of the DELPH-IN international consortium.

Technical features

LXGram is developed under the grammatical framework of Head-Driven Phrase Structure Grammar (Pollard and Sag, 1987, 1994) and uses Minimal Recursion Semantics (Copestake et al., 2005) for the representation of meaning.

The grammar is implemented with the grammar development environment Linguistic Knowledge Builder (Copestake, 2002). Its evaluation and regression testing are done via [incr tsdb()] (Oepen, 2001). It is also intended to be compatible with the PET parser (Callmeier, 2000).

The LinGO Grammar Matrix, version 0.9 2 (Bender et al., 2002) was used as the initial code upon which to build LXGram.

Version history

Version A.4.1, March 2008:
This is the version of the first release, with coverage up to point A.4.1 of the development agenda. This includes, among other phenomena, verbal auxiliaries, the basic phrase structure of S, VPs, PPs, APs and AdvPs, the structure of NPs (without relatives), predication structure and agreement, and part of modification structure. The complete implementation agenda can be found in the implementation report.

Implementation report

The implementation report of version A.4.1 is here.

Release

LXGram development is planned in accordance with an implementation agenda comprising several items that correspond to major grammatical constructions and phenomena (see the implementation report above). While implementation work progresses towards that goal, the current version of LXGram covers part of those items.

The experience gathered in the development of this grammar is consistent with the experience reported by other teams developing similar grammars for other languages. The implementation of a large-scale, multi-purpose grammar for deep linguistic processing is a long-term endeavour. Hence, first releases of grammars of this type tend to take place many years after implementation work has started, when grammar quality and coverage are consolidated.

In this respect, we are trying a different option: seeking a balance between not openly releasing a sub-optimal version and not taking too long to let our colleagues experiment with first drafts of it. Accordingly, for this first release, LXGram is being released under an ELDA license for research.

Funding

The research and development activities of LXGram were partially supported by FCT-Fundação para a Ciência e Tecnologia, of the Portuguese Ministry of Science, and by Instituto Camões, of the Portuguese Ministry of Foreign Affairs, under the research grant PLUS/PLP/50301/2003 for the project GramaXing. The former institution is also acknowledged for its partial support of the development of LXGram under the research grant PTDC/PLP/81157/2006 for the project SemanticShare.

References

Bender, E. M., Flickinger, D., and Oepen, S. 2002. The Grammar Matrix: An open-source starter-kit for the development of cross-linguistically consistent broad-coverage precision grammars. In Carroll, J., Oostdijk, N., and Sutcliffe, R., editors, Proceedings of the Workshop on Grammar Engineering and Evaluation at the 19th International Conference on Computational Linguistics, pages 8-14, Taipei, Taiwan.

Callmeier, U. 2000. PET - A platform for experimentation with efficient HPSG processing techniques. Natural Language Engineering, 6(1):99-108. (Special Issue on Efficient Processing with HPSG).

Copestake, A. 2002. Implementing Typed Feature Structure Grammars. CSLI Publications, Stanford.

Copestake, A., Flickinger, D., Sag, I. A., and Pollard, C. 2005. Minimal Recursion Semantics: An introduction. Journal of Research on Language and Computation, 3(2-3):281-332.

Oepen, S. 2001. [incr tsdb()] - competence and performance laboratory. User manual. Technical report, Computational Linguistics, Saarland University, Saarbruecken, Germany.

Pollard, C. and Sag, I. 1987. Information-Based Syntax and Semantics, Vol. 1. CSLI Publications, Stanford.

Pollard, C. and Sag, I. 1994. Head-Driven Phrase Structure Grammar. Chicago University Press and CSLI Publications, Stanford.