Monday, October 31, 2011

Great was their and my amazement!


Let's start with a simple noun phrase: my amazement. My parser analyzes it as follows. The word my creates a frame with type ME and contributes possessor attribute to possessive construction. The word amazement creates a frame with type AMAZE and contributes head attribute to the same possessive construction. Now that both head and possessor attributes are defined, the construction links them with arg1 attribute:

A.type=ME
B.type=AMAZE
B.arg1=A


It's quite similar with their amazement:



A.type=THEY
B.type=AMAZE
B.arg1=A




Now let's parse their and my amazement. The semantics I want to get is this:

A.type=THEY
B.type=ME
C.member1=A
C.conj=and
C.member2=B
D.type=AMAZE
D.arg1=C

I want their, my and amazement to preserve their functionality. This means that after their and my there should be a possessive construction whose possessor points to the conjunction frame C. So it's the word and which should do this, there are no other candidates.

Currently and puts the parser into a special state. In this state, every subsequent contribution is intercepted and analyzed whether it can be a conjunct. A contribution can be the right operand of coordination if there was a similar contribution to the left of it (similar in the same sense as with ellipsis). When a similar contribution is found, the parser takes both left and right contributions and merges them. In this particular case, the left contribution had possessive.possessor=A, the right one  possessive.possessor=B. While merging, a new frame C is introduced, a composite over A and B. Now the possessor attribute of the merged possessive construction points to C instead of A or B. The updated contribution is applied to the parsing state, so that then, when amazement comes, the result will be precisely as wanted. That's it.

This is the core idea. In my parser, conjuncts don't connect constituents or any static structure. They connect contributions, i.e. the changes made to the parsing state. Merging two changes results in one merged change. It's quite powerful and allows for some interesting things like non-constituent coordination. But that's another story.

Tuesday, October 25, 2011

NLP using graphs

Could natural language parsing be a task on graphs? Maybe, at least partly.

Now, during parsing, every word contributes to some constructions. Some of these contributions are mutually exclusive. For example, in Russian a noun typically can't be nominative and accusative at the same time. So this noun's contributions for nom and acc constructions are incompatible.

Different words also may have contradicting contributions. Two nouns in the same clause can't both be accusative even if their forms are compatible with such an alternative (i.e. they both contribute to acc construction).

So here's the idea. Consider a graph whose vertices are formed by each construction contribution for each word. And where those contributions contradict each other in any way, there's an edge. The task is then to find a maximal subset of vertices which are not connected to each other.

There are, of course, other constraints. The resulting graph should make sense from the semantic viewpoint. The subset should be constructed incrementally and conservatively: if the parser can proceed without reanalysis of the already built structures, it should do so. Finally, this graph task has a very limited scope, e.g. it doesn't apply in different clauses.

That's just an idea. Currently my parser doesn't track individual contributions per construction, it just unifies them eagerly. But the test data suggests that might change.