Wednesday, December 10, 2008

Time flies like an arrow: an ambiguity resolution example

Suppose we have to analyze the well-known ambiguous phrase 'Time flies like an arrow'. Each of its first three words can be the sentence predicate. Presented as function compositions, these alternatives look as follows (in LISP notation: a function and its argument are separated by a space and enclosed together in parentheses):

(1) 'Time' is a verb: ((like (an arrow)) (time flies))
(2) 'Flies' is a verb: ((like (an arrow)) (flies time))
(3) 'Like' is a verb: ((like (time flies)) (an arrow))
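To make the notation concrete, here is a minimal Python sketch, assuming each application is stored as a 2-tuple (function, argument) with atoms as plain strings. It prints the three alternatives back in the LISP-style notation used above. The names `Term`, `to_lisp` and `VARIANTS` are illustrative only.

```python
from typing import Dict, Tuple, Union

# An application is a 2-tuple (function, argument); atoms are plain strings.
Term = Union[str, Tuple["Term", "Term"]]


def to_lisp(term: Term) -> str:
    """Render a term in the notation used above: (function argument)."""
    if isinstance(term, str):
        return term
    func, arg = term
    return f"({to_lisp(func)} {to_lisp(arg)})"


AN_ARROW: Term = ("an", "arrow")
VARIANTS: Dict[int, Term] = {
    1: (("like", AN_ARROW), ("time", "flies")),  # 'time' is the verb
    2: (("like", AN_ARROW), ("flies", "time")),  # 'flies' is the verb
    3: (("like", ("time", "flies")), AN_ARROW),  # 'like' is the verb
}

for n, term in VARIANTS.items():
    print(f"({n})", to_lisp(term))
```

Because the nesting alone fixes which word is the function at each step, the tree needs no extra annotation to tell the three readings apart.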

All of these applicative representations are self-disambiguating. Let's list the lexical ambiguities of the words: 'time' can be either a singular noun or a verb; 'flies', a plural noun or a third-person singular finite verb; 'like', a verb or an adverb.
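These lexical ambiguities can be written down as a small lookup table. This is a sketch with hypothetical tag names such as `noun_sg` and `verb_3sg`; it only shows that, taken word by word, the readings multiply, and it is the applicative structure that has to prune them.

```python
from itertools import product

# Lexical readings listed in the text; the tag names are hypothetical.
LEXICON = {
    "time": ("noun_sg", "verb"),
    "flies": ("noun_pl", "verb_3sg"),
    "like": ("verb", "adverb"),
}

# Without any structure, every combination of readings is possible: 2 * 2 * 2.
combinations = list(product(*LEXICON.values()))
print(len(combinations))  # 8
```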

The phrase '(an arrow)' occurs in all three alternatives, and its result is always the same: it denotes some object. The object may itself be ambiguous, but the context provides no further information, and for now it is enough to know that it's an inanimate noun.

We see 'time' applied to 'flies' in the first and third variants. The result is ambiguous: it's either an order to time some flies (e.g. to measure how long their flight takes) or a description of some flies (maybe ones that fly through time from the future to the past).

If we swap the function and argument roles, we get '(flies time)' from the second variant. Here the situation is simpler: 'flies' as a noun has no actants and therefore can't be a function, so only its verb reading remains. That verb can't be applied to a verb in any way, so 'time' must be a noun here, and '(flies time)' is a clause.

As for the '(like (an arrow))' construction, 'like' here can of course be an adverb describing the similarity of something to an arrow, but it can also be a verb. It can't be the predicate of an ordinary narrative sentence, since then it would require a subject, which the parser would pass to it as the first argument. And '(an arrow)' can't be that subject, since it's third-person singular and would require the verb to end in '-s'. But 'like' can still be an imperative or an infinitive. Neither of these can then be applied to the clause '(flies time)', which disambiguates the second variant completely, leaving the familiar proverb about time flying too fast.
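The elimination for the second variant can be sketched as a tiny rule table in Python. This is an illustrative assumption, not the parser described here: each reading is a category plus the lexical choices that produced it, `combine` applies every function reading to every argument reading, and the hypothetical `RULES` table encodes only the constraints stated above (a noun has no actants, 'flies' takes a noun subject, finite 'like' can't agree with a third-person singular subject, and an imperative or infinitive 'like' with its object filled can't take a clause).

```python
from itertools import product
from typing import Dict, List, Tuple

# A reading: (category, lexical choices that produced it). Tags are hypothetical.
Reading = Tuple[str, Tuple[str, ...]]


def word(name: str, cats: List[str]) -> List[Reading]:
    """All lexical readings of one word."""
    return [(c, (f"{name}={c}",)) for c in cats]


def combine(funcs: List[Reading], args: List[Reading],
            rules: Dict[Tuple[str, str], str]) -> List[Reading]:
    """Apply every function reading to every argument reading the rules allow."""
    out = []
    for (fc, ft), (ac, at) in product(funcs, args):
        result = rules.get((fc, ac))
        if result is not None:
            out.append((result, ft + at))
    return out


RULES = {
    # No rule has a noun in the function slot: nouns have no actants.
    ("verb_3sg", "noun_sg"): "clause",         # (flies time): 'time' is the subject
    ("adverb", "np_3sg"): "clause_modifier",   # (like (an arrow)) as an adverbial
    ("verb_imperative", "np_3sg"): "vp_done",  # object filled, no subject slot
    ("verb_infinitive", "np_3sg"): "vp_done",
    # ("verb_finite", "np_3sg") is absent: 'like' lacks the 3sg '-s'.
    ("clause_modifier", "clause"): "clause",   # an adverbial modifies a clause
}

flies_time = combine(word("flies", ["noun_pl", "verb_3sg"]),
                     word("time", ["noun_sg", "verb"]), RULES)
like_arrow = combine(word("like", ["adverb", "verb_finite",
                                   "verb_imperative", "verb_infinitive"]),
                     [("np_3sg", ())], RULES)
variant2 = combine(like_arrow, flies_time, RULES)
print(variant2)
# [('clause', ('like=adverb', 'flies=verb_3sg', 'time=noun_sg'))]
```

Only one reading survives: adverbial 'like', verbal 'flies' and nominal 'time'.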

The verb 'like' also takes only one direct object, which is filled by '(an arrow)', so the resulting construction can't be applied to '(time flies)'. As we remember, that is either an imperative clause or a noun phrase. This leaves only the adverbial reading of 'like'. An adverbial can modify only a clause, so '(time flies)' must really be an imperative clause. Congratulations, we've disambiguated (1): we have to measure the flight time of some flies, and do it precisely, like an arrow, whatever that might mean.
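The same kind of sketch covers the first variant, again with a hypothetical rule table. Here '(time flies)' keeps two readings, an imperative clause and a noun phrase (the noun-phrase rule, with 'time' used attributively, is an assumption standing in for the "description of some flies" reading), and applying '(like (an arrow))' to it leaves only the adverbial 'like' modifying the imperative clause.

```python
from itertools import product
from typing import Dict, List, Tuple

# A reading: (category, lexical choices that produced it). Tags are hypothetical.
Reading = Tuple[str, Tuple[str, ...]]


def word(name: str, cats: List[str]) -> List[Reading]:
    """All lexical readings of one word."""
    return [(c, (f"{name}={c}",)) for c in cats]


def combine(funcs: List[Reading], args: List[Reading],
            rules: Dict[Tuple[str, str], str]) -> List[Reading]:
    """Apply every function reading to every argument reading the rules allow."""
    out = []
    for (fc, ft), (ac, at) in product(funcs, args):
        result = rules.get((fc, ac))
        if result is not None:
            out.append((result, ft + at))
    return out


RULES = {
    ("noun_sg", "noun_pl"): "np_pl",           # assumed: 'time flies' as a kind of flies
    ("verb", "noun_pl"): "imperative_clause",  # an order to time the flies
    ("adverb", "np_3sg"): "clause_modifier",
    ("verb_imperative", "np_3sg"): "vp_done",  # object filled, nothing left to take
    ("verb_infinitive", "np_3sg"): "vp_done",
    # An adverbial modifies only clauses, so there is no rule for "np_pl".
    ("clause_modifier", "imperative_clause"): "imperative_clause",
}

time_flies = combine(word("time", ["noun_sg", "verb"]),
                     word("flies", ["noun_pl", "verb_3sg"]), RULES)
like_arrow = combine(word("like", ["adverb", "verb_finite",
                                   "verb_imperative", "verb_infinitive"]),
                     [("np_3sg", ())], RULES)
variant1 = combine(like_arrow, time_flies, RULES)
print(time_flies)
print(variant1)
# [('imperative_clause', ('like=adverb', 'time=verb', 'flies=noun_pl'))]
```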

Only the third variant is left. Here 'like' is applied to '(time flies)', and the result is then applied to '(an arrow)'. The imperative reading of '(time flies)' can't be an argument of either the verb or the noun variant of 'like', and the noun 'like' has no actants anyway. This leaves us with the verb 'like' consuming the noun phrase '(time flies)', which can be its subject when the verb is finite, or its object when the verb is imperative or infinitive. The attempt to consume '(an arrow)' as well then rules out the less obvious imperative and infinitive readings, so this interpretation is unambiguous too. Now we know that all those future-to-past flies are fond of some arrow.
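For the third variant, the sketch applies 'like' to '(time flies)' first and then to '(an arrow)'. The rule table is again a hypothetical encoding of the constraints in the text: a finite 'like' takes the plural noun phrase as its subject and still needs an object, while an imperative or infinitive 'like' takes it as its object and has no slot left. The noun 'like' gets no rules because it has no actants, and the adverbial reading (not discussed above) falls out at the second step because an adverbial modifies only clauses.

```python
from itertools import product
from typing import Dict, List, Tuple

# A reading: (category, lexical choices that produced it). Tags are hypothetical.
Reading = Tuple[str, Tuple[str, ...]]


def word(name: str, cats: List[str]) -> List[Reading]:
    """All lexical readings of one word."""
    return [(c, (f"{name}={c}",)) for c in cats]


def combine(funcs: List[Reading], args: List[Reading],
            rules: Dict[Tuple[str, str], str]) -> List[Reading]:
    """Apply every function reading to every argument reading the rules allow."""
    out = []
    for (fc, ft), (ac, at) in product(funcs, args):
        result = rules.get((fc, ac))
        if result is not None:
            out.append((result, ft + at))
    return out


RULES = {
    ("noun_sg", "noun_pl"): "np_pl",               # 'time flies' as a noun phrase
    ("verb", "noun_pl"): "imperative_clause",      # 'time flies' as an order
    # No rule accepts "imperative_clause" as an argument of any 'like'.
    ("adverb", "np_pl"): "clause_modifier",        # dies at the next step
    ("verb_finite", "np_pl"): "vt_with_subject",   # plural subject, no '-s' needed
    ("verb_imperative", "np_pl"): "vp_done",       # 'time flies' as the object
    ("verb_infinitive", "np_pl"): "vp_done",
    ("vt_with_subject", "np_3sg"): "clause",       # '(an arrow)' fills the object
}

time_flies = combine(word("time", ["noun_sg", "verb"]),
                     word("flies", ["noun_pl", "verb_3sg"]), RULES)
like_tf = combine(word("like", ["noun", "adverb", "verb_finite",
                                "verb_imperative", "verb_infinitive"]),
                  time_flies, RULES)
variant3 = combine(like_tf, [("np_3sg", ())], RULES)
print(variant3)
# [('clause', ('like=verb_finite', 'time=noun_sg', 'flies=noun_pl'))]
```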


Ilya said...

Hello All!

I can't really understand the need for disambiguation. As far as I understand, it was already done in the process of generating the three function compositions. Every word-aka-function in a composition is connected with some disambiguating dictionary entry; otherwise how could we manage to build these three compositions and not even consider others, such as ((arrow (an time)) (flies like))?

Peter said...

The idea is that the described structure is precisely the result of the parser. When I write "arrow", it's just a string or a LISP atom, nothing more. The evaluator will then interpret these strings/atoms as it likes; different evaluators may assign different meanings to the same values.
When a new composition is added to the existing tree, all possible evaluators (at least the syntax-head and semantic ones) are run, and they help select only those compositions that produce a correct result. But then their results are forgotten (or rather cached), which leaves us with a very simple tree.
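This arrangement can be sketched in Python as follows. Everything here is hypothetical: two toy evaluators stand in for the syntax-head and semantic ones, `propose` keeps a new composition only if every evaluator accepts it, and `functools.lru_cache` holds on to the evaluators' results while the tree itself stays a bare nest of string tuples. The toy syntax check rejects 'arrow' as a function, which is one way a composition like ((arrow (an time)) (flies like)) would never make it into the tree.

```python
from functools import lru_cache
from typing import Optional, Tuple, Union

# The tree is just nested (function, argument) tuples of strings.
Term = Union[str, Tuple["Term", "Term"]]

# Hypothetical toy knowledge, just enough for this example.
NO_ACTANTS = {"arrow"}  # words that can never act as a function
TAKES_AN = {"arrow"}    # nouns that 'an' may combine with


def head(term: Term) -> str:
    """The leftmost atom, i.e. the head of the function chain."""
    while not isinstance(term, str):
        term = term[0]
    return term


@lru_cache(maxsize=None)
def syntax_ok(term: Term) -> bool:
    """Toy syntax-head evaluator: an actant-less word cannot be a function."""
    if isinstance(term, str):
        return True
    func, arg = term
    return syntax_ok(func) and syntax_ok(arg) and head(func) not in NO_ACTANTS


@lru_cache(maxsize=None)
def semantics_ok(term: Term) -> bool:
    """Toy semantic evaluator: 'an' only combines with listed nouns."""
    if isinstance(term, str):
        return True
    func, arg = term
    if func == "an":
        return arg in TAKES_AN
    return semantics_ok(func) and semantics_ok(arg)


EVALUATORS = (syntax_ok, semantics_ok)


def propose(func: Term, arg: Term) -> Optional[Term]:
    """Keep the composition only if every evaluator accepts it.

    The evaluators' results stay in their caches; the tree keeps the bare tuple.
    """
    term = (func, arg)
    if all(evaluate(term) for evaluate in EVALUATORS):
        return term
    return None


print(propose("an", "arrow"))            # ('an', 'arrow')
print(propose("arrow", ("an", "time")))  # None
```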

Ilya said...

Is it a good idea to forget all information except a simple tree, and then recover it again in the process of "ambiguity resolution"?
The only use I can imagine is letting the user input sentences directly in LISP form. Then we do need to disambiguate the input.

You don't even suggest writing the standard form of the words in the composition. If I process the word "flies", I have to repeat the morphological analysis if I want to find out that this is the verb "fly".

Peter said...

To me it really seems a good idea. It simplifies the output a lot. Keeping more info would be premature optimization ;).

I don't want to write the standard form because there can be several of them, cluttering the output. Recalculation can easily be replaced by caching, which keeps everything you want to remember without changing the tree structure.
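Caching instead of storing can be sketched with `functools.lru_cache`. In this hypothetical example the tree keeps plain strings, and a toy morphological analyzer (its single "-ies" rule is an assumption, not a real analyzer) runs once per atom, after which repeated lookups are served from the cache. It returns two analyses for "flies", which illustrates why writing a single standard form into the tree would not be enough.

```python
from functools import lru_cache
from typing import Iterator, Tuple, Union

Term = Union[str, Tuple["Term", "Term"]]


@lru_cache(maxsize=None)
def analyze(atom: str) -> Tuple[Tuple[str, str], ...]:
    """Toy morphology (hypothetical rule): all (standard form, tag) analyses."""
    if atom.endswith("ies"):
        stem = atom[:-3] + "y"
        return ((stem, "noun_pl"), (stem, "verb_3sg"))  # several analyses
    return ((atom, "base"),)


def atoms(term: Term) -> Iterator[str]:
    """Yield the leaves of a tree from left to right."""
    if isinstance(term, str):
        yield term
    else:
        for part in term:
            yield from atoms(part)


tree = (("like", ("an", "arrow")), ("flies", "time"))  # stays plain strings

analyses = {a: analyze(a) for a in atoms(tree)}  # computed once per atom
again = analyze("flies")                         # served from the cache
info = analyze.cache_info()
print(analyses["flies"], info.hits, info.misses)
# (('fly', 'noun_pl'), ('fly', 'verb_3sg')) 1 5
```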

Morphological analysis can be done by the parser, producing something like '(3dperson fly)', though this would greatly increase the number of competing compositions. I need to think more about that.