LX-Parser

LX-Parser is a statistical constituency parser for Portuguese. It performs a syntactic analysis of Portuguese sentences in terms of their constituency structure.

Online Demo

For an online demo of this tool, check here.

Authorship

LX-Parser is being developed by Patricia Gonçalves and João Silva, under the direction of António Branco, at the NLX-Natural Language and Speech Group, partly in the scope of the SemanticShare project, funded by FCT-Fundação para a Ciência e Tecnologia.

Acknowledgments

This work was partly supported by FCT-Fundação para a Ciência e Tecnologia (Foundation for Science and Technology) under grant FCT/PTDC/PLP/81157/2006 for the SemanticShare project.

Citation

When mentioning this parser, please cite the following reference:

Silva, João, António Branco, Sérgio Castro and Ruben Reis. 2010. Out-of-the-Box Robust Parsing of Portuguese. In Proceedings of the 9th International Conference on the Computational Processing of Portuguese (PROPOR'10), pp. 75–85.

License

To use LX-Parser you must agree to its license.

Release

LX-Parser is made available as a standalone parser that you can download and run locally on your computer.

Required downloads

The parser model file, cintil.ser.gz
Stanford Parser (requires Java 5 or later). Note that the model was created with version 1.6.5 of the parser. More recent versions of the software seem to be unable to load the model.
LX-Tokenizer to tokenize input prior to parsing.

Instructions

Example command line:

java -Xmx500m -cp /path/to/stanford-parser.jar edu.stanford.nlp.parser.lexparser.LexicalizedParser -tokenized -sentences newline -outputFormat oneline -uwModel edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel cintil.ser.gz input.txt

A quick explanation of the options:

For some more complex sentences, the default heap size used by Java might not be enough. We increase the maximum heap size to 500 megabytes with the -Xmx500m option.
The path to the Stanford Parser JAR file is provided with the -cp option.
Next comes the fully qualified name of the Java class we wish to run (LexicalizedParser).
The input to the parser must already be tokenized (see LX-Tokenizer for details on tokenization decisions). We indicate this through the -tokenized option.
Each sentence in the input is separated by newline. We indicate this through the -sentences newline option.
The output format is one parse per line. We indicate this through the -outputFormat oneline option. NB: The parser always adds a ROOT node. You can remove it in a post-processing step.
A class (BaseUnknownWordModel, part of the Stanford Parser package) that implements a baseline word model is used to handle unknown words. It is chosen through the -uwModel option.
The final two arguments are the model file and the input file.
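
The command line and the ROOT-removal post-processing step described above could be scripted along the following lines. This is only a minimal sketch, not part of the LX-Parser distribution: the function names (build_command, strip_root, parse_file) are hypothetical, and it assumes that the parser writes its parses to standard output, one per line, each wrapped as "(ROOT ...)".

```python
import subprocess


def build_command(jar_path, model_path, input_path, max_heap="500m"):
    """Assemble the argument list for the example command line."""
    return [
        "java", f"-Xmx{max_heap}",
        "-cp", jar_path,
        "edu.stanford.nlp.parser.lexparser.LexicalizedParser",
        "-tokenized",
        "-sentences", "newline",
        "-outputFormat", "oneline",
        "-uwModel", "edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel",
        model_path,
        input_path,
    ]


def strip_root(parse):
    """Remove the outer (ROOT ...) node from a one-line parse, if present."""
    parse = parse.strip()
    prefix = "(ROOT "
    if parse.startswith(prefix) and parse.endswith(")"):
        return parse[len(prefix):-1].strip()
    return parse  # no ROOT wrapper found: leave the line unchanged


def parse_file(jar_path, model_path, input_path):
    """Run the parser and return one ROOT-less parse per non-empty output line.

    Assumption: parses are written to standard output.
    """
    result = subprocess.run(
        build_command(jar_path, model_path, input_path),
        capture_output=True, text=True, check=True,
    )
    return [strip_root(line) for line in result.stdout.splitlines() if line.strip()]
```

Calling parse_file("/path/to/stanford-parser.jar", "cintil.ser.gz", "input.txt") would then return a list of bracketed parses. The input file must still be tokenized beforehand, with one sentence per line.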

Web service

To be available soon

Contact us

Why LX-Parser?

LX because LX is the "code" name Lisboners like to use to refer to their hometown.
