LX-Parser

Developed at the University of Lisbon, Dept. of Informatics, by the NLX-Natural Language and Speech Group.


LX-Parser

LX-Parser (beta version) is a freely available online service for constituency parsing of Portuguese sentences. The service was developed and is maintained at the University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics.

LX-Parser performs a syntactic analysis of Portuguese sentences in terms of their constituency structure.

Parser

LX-Parser is built on the Stanford Parser, a statistical parser developed at Stanford University that is trained on a previously annotated corpus.
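To illustrate what "trained on an annotated corpus" can mean for a probabilistic grammar, the following minimal Python sketch estimates rule probabilities by relative frequency from a toy treebank. It is a simplified illustration only: the Stanford Parser's actual models are more elaborate, and the two bracketed trees, the function names and the bracketing format are assumptions made for this example, not material taken from the CINTIL Treebank.

```python
from collections import Counter


def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples.

    Leaves (words) are plain strings. The Penn-style bracketing is an
    assumption made for this sketch.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            label = tokens[pos + 1]
            pos += 2
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1  # skip the closing ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return read()


def rules(tree):
    """Yield one (lhs, rhs) rewrite rule per non-leaf node of the tree."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)


def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = Counter(r for text in treebank for r in rules(parse_tree(text)))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


# Hypothetical two-sentence treebank, for illustration only.
TOY_TREEBANK = [
    "(S (NP (N Maria)) (VP (V dorme)))",
    "(S (NP (D a) (N Maria)) (VP (V dorme)))",
]

grammar = estimate_pcfg(TOY_TREEBANK)
for (lhs, rhs), prob in sorted(grammar.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob:.2f}]")
```

In this toy treebank NP is expanded in two different ways, once each, so each NP rule receives probability 0.5, while the single S rule receives probability 1.0.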

A total of 22118 sentences from the CINTIL Treebank were used for training. This treebank is being developed and maintained at the University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics.

The parser uses probabilistic grammars. Under the Parseval metric it achieves an F-score of 89% (obtained through 10-fold cross-validation).
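As a rough illustration of how a Parseval-style score is computed, the sketch below compares the labeled constituents of a gold tree with those of a parser output and reports precision, recall and F-score. It is not the evaluation script used for LX-Parser: the two example trees are hypothetical, the Penn-style bracketing is assumed, and word-level tags are left out of the count, which is a common convention assumed here.

```python
from collections import Counter


def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples; words are strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            label = tokens[pos + 1]
            pos += 2
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1  # skip the closing ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return read()


def brackets(tree, start=0):
    """Return (end position, Counter of (label, start, end)) for phrase-level nodes.

    Nodes that dominate only words (word-level tags) are skipped.
    """
    label, children = tree
    spans = Counter()
    pos = start
    for child in children:
        if isinstance(child, str):
            pos += 1
        else:
            pos, child_spans = brackets(child, pos)
            spans += child_spans
    if not all(isinstance(c, str) for c in children):
        spans[(label, start, pos)] += 1
    return pos, spans


def parseval(gold_text, test_text):
    """Labeled bracket precision, recall and F-score for one sentence."""
    _, gold = brackets(parse_tree(gold_text))
    _, test = brackets(parse_tree(test_text))
    matched = sum((gold & test).values())
    precision = matched / sum(test.values()) if test else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    f_score = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f_score


# Hypothetical trees: the parser output attaches the object NP too high.
GOLD = "(S (NP (D o) (N gato)) (VP (V viu) (NP (D o) (N rato))))"
TEST = "(S (NP (D o) (N gato)) (VP (V viu)) (NP (D o) (N rato)))"

precision, recall, f_score = parseval(GOLD, TEST)
print(f"P={precision:.2f} R={recall:.2f} F={f_score:.2f}")
```

Here three of the four phrase-level brackets match in each direction (the VP span differs), so precision, recall and F-score are all 0.75. A corpus-level score would sum the matched, gold and test counts over all sentences before dividing.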

Tagset

Tag     Meaning
A       Adjective
AP      Adjective Phrase
ADV     Adverb
ADVP    Adverb Phrase
C       Complementizer
CL      Clitics
CP      Complementizer Phrase
CARD    Cardinal
CONJ    Conjunction
CONJP   Conjunction Phrase
D       Determiner
DEM     Demonstrative
N       Noun
NP      Noun Phrase
O       Ordinals
P       Preposition
PP      Preposition Phrase
PPA     Past Participles/Adjectives
POSS    Possessive
PRS     Personals
QNT     Predeterminer
REL     Relatives
S       Sentence
SNS     Sentence with null subject
V       Verb
VP      Verb Phrase
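The tagset can be used to read a parse tree label by label. The following small Python sketch looks up the meaning of every node label in a bracketed tree. The Penn-style bracketing, the example sentence and the names TAGSET and describe are assumptions made for illustration; the exact output format of the service is not specified here.

```python
import re

# Tag meanings copied from the tagset listed above.
TAGSET = {
    "A": "Adjective", "AP": "Adjective Phrase",
    "ADV": "Adverb", "ADVP": "Adverb Phrase",
    "C": "Complementizer", "CL": "Clitics", "CP": "Complementizer Phrase",
    "CARD": "Cardinal",
    "CONJ": "Conjunction", "CONJP": "Conjunction Phrase",
    "D": "Determiner", "DEM": "Demonstrative",
    "N": "Noun", "NP": "Noun Phrase",
    "O": "Ordinals",
    "P": "Preposition", "PP": "Preposition Phrase",
    "PPA": "Past Participles/Adjectives",
    "POSS": "Possessive", "PRS": "Personals",
    "QNT": "Predeterminer", "REL": "Relatives",
    "S": "Sentence", "SNS": "Sentence with null subject",
    "V": "Verb", "VP": "Verb Phrase",
}


def describe(bracketed):
    """Return (label, meaning) pairs for every node label, in left-to-right order."""
    # A node label is the token that immediately follows an opening bracket.
    labels = re.findall(r"\(\s*([^\s()]+)", bracketed)
    return [(label, TAGSET.get(label, "unknown tag")) for label in labels]


# Hypothetical parse of "o gato dorme" ("the cat sleeps").
example = "(S (NP (D o) (N gato)) (VP (V dorme)))"
described = describe(example)
for label, meaning in described:
    print(f"{label:5} {meaning}")
```

Labels that are not in the tagset are reported as "unknown tag" rather than raising an error, which keeps the sketch usable on trees that contain extra annotations.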

Annotation guidelines

The syntactic analyses produced by LX-Parser are similar to the analyses found in the treebank on which LX-Parser was trained. This treebank was designed along the principles described in the following handbook:

Branco, António, João Silva, Francisco Costa and Sérgio Castro, 2011. CINTIL TreeBank Handbook: Design options for the representation of syntactic constituency. Department of Informatics, University of Lisbon, Technical Reports series, no. di-fcul-tp-11-02.

Authorship

LX-Parser is being developed by Patrícia Gonçalves and João Silva, under the management of António Branco, at the NLX-Natural Language and Speech Group, partly within the scope of the SemanticShare project, funded by FCT-Fundação para a Ciência e Tecnologia.

Contact us

Contact us using the following email address: 'nlx' concatenated with 'at' concatenated with 'di.fc.ul.pt'.

Acknowledgments

This work was partly supported by FCT-Fundação para a Ciência e Tecnologia (Foundation for Science and Technology) under grant FCT/PTDC/PLP/81157/2006 for the SemanticShare project.
The system uses the PHPSyntaxTree Visualizer and the Stanford Parser.

Why LX-Parser?

LX because LX is the "code" name Lisboners like to use to refer to their hometown.