LXGram
LXGram is a grammar for the computational processing of Portuguese. It is being developed with the following major design features:
- precision:
it is a precision grammar delivering accurate, linguistically grounded information on natural language sentences.
- deep processing:
it is a grammar for deep linguistic processing inasmuch as, besides information on the major syntactic dimensions of grammatical constituency and dependency, it delivers (and generates from) a fully-fledged logical representation of the meaning of natural language sentences.
- large-scale:
it is planned not to leave out any sort of regular grammatical construction or phenomenon.
- multi-purpose:
it is intended to make available as much linguistic information as can possibly be made explicit by automatic means, given the current state of the art in language technology, with the goal of supporting the largest possible range of language technology applications.
Authorship
LXGram is being developed at the University of Lisbon, by NLX, the Natural Language and Speech Group of the Department of Informatics, Faculty of Sciences, under the coordination of António Branco. Major coding work has been performed by Francisco Costa. The development activities benefited from support or contributions from Mariana Avelãs, Filipe Gil, Marco Gonzalez, Clara Pinto and David Raposo. In scholarly work, LXGram should be cited by referring to its implementation report:
- Branco, António and Francisco Costa, 2008, A Computational Grammar for Deep Linguistic Processing of Portuguese: LXGram, version A.4.1, Technical Report, University of Lisbon, Department of Informatics.
Contacts
Check information on contacts at the NLX Group site.
Consortium
The development of LXGram has been undertaken within the scope and with the support of the DELPH-IN international consortium.
Technical features
LXGram is developed under the grammatical framework of Head-Driven Phrase Structure Grammar (Pollard and Sag, 1987, 1994) and uses Minimal Recursion Semantics (Copestake et al., 2005) for the representation of meaning.
The implementation of this grammar is undertaken with the grammar development environment Linguistic Knowledge Builder (Copestake, 2002). Its evaluation and regression testing are done via [incr tsdb()] (Oepen, 2001). It is also intended to be compatible with the PET parser (Callmeier, 2000).
The LinGO Grammar Matrix, version 0.9 2 (Bender et al., 2002) was used as the initial code upon which to build LXGram.
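As an illustration of the kind of meaning representation mentioned above, the following is a minimal Python sketch of an MRS-like structure: a top handle, a bag of elementary predications (each a labelled relation with arguments), and "qeq" handle constraints. It is not LXGram's actual output format or code. The class names, the predicate names (`_todo_q`, `_gato_n`, `_dormir_v`) and the example sentence "Todos os gatos dormem" are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class EP:
    """One elementary predication: a labelled relation with role-to-value arguments."""
    label: str   # handle labelling this predication, e.g. "h4"
    pred: str    # relation name (hypothetical names, not LXGram's inventory)
    args: dict   # role name -> variable or handle


@dataclass
class MRS:
    """A simplified MRS: top handle, bag of EPs, and handle constraints."""
    top: str
    eps: list
    hcons: list = field(default_factory=list)  # (hole, "qeq", label) triples

    def labels(self):
        # Set of all handles that label some elementary predication.
        return {ep.label for ep in self.eps}

    def is_well_formed(self):
        # Simplified check: every constraint is a qeq whose target labels an EP.
        known = self.labels()
        return all(rel == "qeq" and lo in known for _, rel, lo in self.hcons)


# Illustrative analysis of "Todos os gatos dormem" ("All cats sleep").
every_cat_sleeps = MRS(
    top="h0",
    eps=[
        EP("h1", "_todo_q", {"ARG0": "x1", "RSTR": "h2", "BODY": "h3"}),
        EP("h4", "_gato_n", {"ARG0": "x1"}),
        EP("h5", "_dormir_v", {"ARG0": "e1", "ARG1": "x1"}),
    ],
    hcons=[("h0", "qeq", "h5"), ("h2", "qeq", "h4")],
)

print(every_cat_sleeps.is_well_formed())
```

The quantifier's BODY handle (`h3`) is deliberately left unconstrained, which is how this style of representation leaves quantifier scope underspecified.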
Version history
Version A.4.1, March 2008:
This is the version of the first release, corresponding to a coverage
that encompasses up to point A.4.1 of the development agenda. This includes,
among other phenomena, verbal auxiliaries, basic phrase structure of S,
VPs, PPs, APs, AdvPs, the structure of NPs (without relatives),
predication structure and agreement, and part of modification structure.
The complete implementation agenda can be found in the implementation report.
Implementation report
The implementation report of version A.4.1 is here.
Release
LXGram development is planned in accordance with an implementation agenda encompassing several items corresponding to major grammatical constructions and phenomena (see the implementation report above). While implementation work is progressing towards that goal, the current version of LXGram covers part of those items.
The experience gathered in the development of this grammar is consistent with the experience reported by other teams developing similar grammars for other languages. The implementation of a large-scale, multi-purpose grammar for deep linguistic processing is a long-term endeavour. Hence, first releases of this type of grammar tend to take place many years after implementation work has started, when grammar quality and coverage are consolidated.
In this respect we are trying a different option, seeking a balance between avoiding the open release of a sub-optimal version and not taking too long to let our colleagues experiment with first drafts of it. Accordingly, for this first release, LXGram is being released under an ELDA license for research.
Funding
The research and development activities of LXGram were partially supported by FCT-Fundação para a Ciência e Tecnologia, of the Portuguese Ministry of Science, and by Instituto Camões, of the Portuguese Ministry of Foreign Affairs, under the research grant PLUS/PLP/50301/2003 for the project GramaXing. The former institution is also acknowledged for its partial support of the development of LXGram under the research grant PTDC/PLP/81157/2006 for the project SemanticShare.
References
Bender, E. M., Flickinger, D., and Oepen, S. 2002. The Grammar Matrix: An open-source starter-kit for the development of cross-linguistically consistent broad-coverage precision grammars. In Carroll, J., Oostdijk, N., and Sutcliffe, R., editors, Proceedings of the Workshop on Grammar Engineering and Evaluation at the 19th International Conference on Computational Linguistics, pages 8--14, Taipei, Taiwan.
Callmeier, U. 2000. PET --- A platform for experimentation with efficient HPSG processing techniques. Natural Language Engineering, 6(1):99--108. (Special Issue on Efficient Processing with HPSG).
Copestake, A. 2002. Implementing Typed Feature Structure Grammars. CSLI Publications, Stanford.
Copestake, A., Flickinger, D., Sag, I. A., and Pollard, C. 2005. Minimal Recursion Semantics: An introduction. Journal of Research on Language and Computation, 3(2-3):281--332.
Oepen, S. 2001. [incr tsdb()] --- competence and performance laboratory. User manual. Technical report, Computational Linguistics, Saarland University, Saarbruecken, Germany.
Pollard, C. and Sag, I. 1987. Information-Based Syntax and Semantics, Vol. 1. CSLI Publications, Stanford.
Pollard, C. and Sag, I. 1994. Head-Driven Phrase Structure Grammar. Chicago University Press and CSLI Publications, Stanford.