Monday, November 9, 2009

Communicative fragments by example

Communicative fragments (CF) are presented as overlapping template sequences that cover the text. They consist of fixed parts (e.g. words) and placeholders which may be filled by certain expressions. A limerick example:

There was an Old Man in a pew,
Whose waistcoat was spotted with blue;
But he tore it in pieces,
To give to his nieces,--
That cheerful Old Man in a pew.

The resulting fragments will be:

there was X[NP] in Y[NP] -> Clause
//variables are uppercase
//sample placeholder constraints are in brackets
//NP=Noun Phrase
//substituted template forms a clause

an X[NP, Starts with a vowel] -> NP
old X[NP] -> NP
man -> NP
//one-word fragment
X[NP] in Y[NP] -> NP
//no one has promised that CF will form a tree
//cf. the first fragment

a X[NP, Starts with a consonant] -> NP
pew -> NP
X[NP] whose Y[NP] Z[Clause, finite] -> NP
//note a non-projective construct
was X[Participle] -> Clause
//a rule for passive
spotted with X[Adj] -> Participle
but X[Clause] -> Clause
X[NP] tore Y[NP] -> Clause
X[=tear] in pieces -> Clause
//any form of 'tear'
X[Clause] to Y[Clause, infinitive] -> Clause
X[=give] to Y[NP] -> Clause, finite
his X[NP] -> NP
that X[NP] -> NP
cheerful X[NP] -> NP

That's a way of parsing: syntax is treated as a set of CF patterns, each containing at least one lexical entry that triggers the rule. It should also be much easier to extract such a set from a corpus than to induce a typical generative grammar with movements.
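To make the "lexical entry triggers the rule" idea concrete, here is a minimal Python sketch. The names `Part`, `Fragment`, `build_index` and `candidates` are hypothetical, invented for illustration; the sketch only encodes a handful of the fragments listed above and looks up which of them are triggered by the words of a sentence. It does not check placeholder constraints or build any structure.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Part:
    """One template component: either a fixed word or a placeholder."""
    word: Optional[str] = None        # fixed lexical item, e.g. "was"
    var: Optional[str] = None         # placeholder name, e.g. "X"
    constraint: Optional[str] = None  # placeholder constraint, e.g. "NP"


@dataclass(frozen=True)
class Fragment:
    parts: tuple  # sequence of Part objects
    result: str   # category the filled template forms, e.g. "Clause"

    def triggers(self):
        """Fixed words of the template; any of them can trigger the rule."""
        return [p.word for p in self.parts if p.word is not None]


def w(word):
    return Part(word=word)


def slot(var, constraint):
    return Part(var=var, constraint=constraint)


# A few fragments from the limerick, transcribed by hand (assumed encoding).
FRAGMENTS = [
    Fragment((w("there"), w("was"), slot("X", "NP"), w("in"), slot("Y", "NP")), "Clause"),
    Fragment((w("old"), slot("X", "NP")), "NP"),
    Fragment((w("man"),), "NP"),
    Fragment((slot("X", "NP"), w("in"), slot("Y", "NP")), "NP"),
    Fragment((w("pew"),), "NP"),
]


def build_index(fragments):
    """Map every fixed word to the fragments it can trigger."""
    index = defaultdict(list)
    for frag in fragments:
        for word in frag.triggers():
            index[word].append(frag)
    return index


def candidates(tokens, index):
    """Fragments whose fixed words all occur somewhere in the tokens."""
    present = set(tokens)
    found = []
    for tok in tokens:
        for frag in index.get(tok, []):
            if frag not in found and all(t in present for t in frag.triggers()):
                found.append(frag)
    return found


INDEX = build_index(FRAGMENTS)
tokens = "there was an old man in a pew".split()
for frag in candidates(tokens, INDEX):
    print(frag.triggers(), "->", frag.result)
```

Note that "in" triggers two overlapping fragments here, which matches the remark above that CF need not form a tree.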

The templates don't specify whether anything can occur between their components, so the assumption is that anything can. Hence free word order languages are supported, but English syntax rules are weakened considerably, allowing ungrammatical sentences. So tight parser integration is needed to constrain the fragments' freedom.
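The over-generation problem can be sketched in a few lines of Python. The function `match_with_gaps` is hypothetical: it matches the fixed words of a template in order while allowing any material in between. The optional `max_gap` limit is just one assumed, very crude stand-in for the kind of constraint a parser might impose; it is not part of the CF description itself.

```python
def match_with_gaps(fixed_words, tokens, max_gap=None, start=0, prev=None):
    """Find the fixed words in order, with arbitrary material in between.

    Returns the list of matched token positions, or None if there is no match.
    max_gap=None means anything may intervene; an integer limits the number
    of tokens allowed between two consecutive fixed words (an assumption).
    """
    if not fixed_words:
        return []
    for i in range(start, len(tokens)):
        # Stop once the gap since the previous fixed word is too large.
        if prev is not None and max_gap is not None and i - prev - 1 > max_gap:
            break
        if tokens[i] == fixed_words[0]:
            rest = match_with_gaps(fixed_words[1:], tokens, max_gap, i + 1, i)
            if rest is not None:
                return [i] + rest
    return None


template = ["there", "was", "in"]  # fixed parts of: there was X[NP] in Y[NP]
good = "there was an old man in a pew".split()
bad = "there pew a was old an man in".split()  # scrambled, ungrammatical

print(match_with_gaps(template, good))             # [0, 1, 5]
print(match_with_gaps(template, bad))              # [0, 3, 7]
print(match_with_gaps(template, bad, max_gap=0))   # None
```

With unrestricted gaps both the original line and the scrambled one match the template, which is exactly the weakening described above; even a simple locality limit rejects the scrambled version, at the cost of also rejecting long but legitimate fillers.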
