Wednesday, March 30, 2011

New model

What happens when you hear a sentence? Nobody knows for sure, but there are some guesses on the market. Many people think you build a logical formula and do something with it (perhaps store it and use it for inference). However, I find huge logical expressions terribly hard to analyze, and since I surely need some analysis to translate correctly, this idea doesn't suit me.

So I prefer to think in terms of objects and relations between them. Luckily, I don't have to invent anything, since some psychologists have similar ideas. They argue that during sentence comprehension people don't operate with abstract logical symbols, but mentally simulate the input, and this is what we call meaning. You hear "The eagle in the sky", you imagine an eagle in the blue sky with a few clouds and airplanes, and it's natural that the eagle in this picture has wide-spread wings, and not, say, folded ones.

Many experiments seem to support this theory. But actually that doesn't matter: I don't care that much what really happens in the brain. What I care about are ideas that may be useful for parsing. And I find the idea of mental simulations quite useful. The words don't have to determine the meaning; they act like cues which invoke the memories of previous situations where you met them. These memories are put together and a new situation is simulated. The situation may involve objects or actions that are not mentioned at all, but they'll be highly accessible in the following discourse, without complex logical inference. The important things that the hearer learned through simulation are stored in memory for future retrieval, also as parts of other simulations.

This is a completely informal theory. That's great because I may treat it in any way useful to me. In particular, I assume the simulation engine to be a black box that communicates with the parser and generator via frames. Frames are just objects, with attributes and values. Values may be strings or other frames. The parser creates cues like "Frame A has attribute 'actor' pointing to frame B" or "Frame C is described as 'man'". And the notation is:

The parser gives such cues to the simulator as soon as it encounters new words. It also receives feedback which may help it choose between several competing parses. But it doesn't know what the simulator does internally; it only sees those frames and attributes. That allows me to mock the simulator in any way I want. In the end, my main object of interest is parsing, so I leave a well-behaving simulator to someone else.
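To make the parser/simulator boundary more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names Frame, Cue and MockSimulator are hypothetical, the 'type' attribute is just a stand-in for "is described as", and the feedback is reduced to a yes/no answer about whether a cue contradicts what was said before. A real simulator would do far more than this.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# A value is either a plain string or another frame.
Value = Union[str, "Frame"]


@dataclass(eq=False)  # frames are compared by identity, not by content
class Frame:
    """An object with attributes and values (hypothetical representation)."""
    name: str
    attributes: Dict[str, Value] = field(default_factory=dict)


@dataclass
class Cue:
    """One statement from the parser: frame.attribute = value."""
    frame: Frame
    attribute: str
    value: Value


class MockSimulator:
    """A stand-in black box: it records cues and gives trivial feedback."""

    def __init__(self) -> None:
        self.cues: List[Cue] = []

    def accept(self, cue: Cue) -> bool:
        """Apply the cue; return False if it contradicts an earlier one."""
        current = cue.frame.attributes.get(cue.attribute)
        if current is not None and current != cue.value:
            return False  # feedback the parser could use to drop this parse
        cue.frame.attributes[cue.attribute] = cue.value
        self.cues.append(cue)
        return True


a, b, c = Frame("A"), Frame("B"), Frame("C")
simulator = MockSimulator()
simulator.accept(Cue(a, "actor", b))     # frame A has attribute 'actor' pointing to frame B
simulator.accept(Cue(c, "type", "man"))  # frame C is described as 'man'
```

The parser here only ever touches frames and attributes, so the mock can be swapped for anything smarter without the parser noticing.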

The target language generator has access both to the sequence of cues fed by the parser to the simulator and to the simulator itself. The cue sequence is needed to provide as close a translation as possible, so that what was said in the source will be more or less the same as what was generated. The simulator is needed to ensure that the information inferred from the cues in both languages is similar as well. The simulation results may also be used when the generator needs something that is not explicitly specified in the source but is obligatory in the result, for example, pronoun gender when translating from Finnish to Russian.

So far, the model is quite simple. A discourse is a sequence of (numbered, #1,#2,...) situations. Each situation has a sequence of constraints on frames (referred to by variables: A,B,C,...) and their attribute values. Situations are also variables, and also have properties that may be assigned.
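This structure is small enough to sketch in a few lines of Python. Again, this is only an illustration under my own assumptions: the class names Situation and Discourse are hypothetical, constraints are kept as plain (variable, attribute, value) triples, and the 'time' property is invented for the example. The one-line-per-constraint text form produced by render is likewise just a convenient way to print things.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (frame variable, attribute, value); the value is a string or a variable name.
Constraint = Tuple[str, str, str]


@dataclass
class Situation:
    """A numbered situation with its own properties and frame constraints."""
    number: int
    constraints: List[Constraint] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)

    def constrain(self, variable: str, attribute: str, value: str) -> None:
        self.constraints.append((variable, attribute, value))


@dataclass
class Discourse:
    """A sequence of situations, numbered from 1 in order of creation."""
    situations: List[Situation] = field(default_factory=list)

    def new_situation(self) -> Situation:
        situation = Situation(number=len(self.situations) + 1)
        self.situations.append(situation)
        return situation

    def render(self) -> str:
        lines: List[str] = []
        for s in self.situations:
            lines.append(f"----- Situation #{s.number} -----")
            # Situations are variables too, so their properties print the same way.
            lines += [f"#{s.number}.{key}={val}" for key, val in s.properties.items()]
            lines += [f"{var}.{attr}={val}" for var, attr, val in s.constraints]
        return "\n".join(lines)


discourse = Discourse()
first = discourse.new_situation()
first.constrain("A", "actor", "B")
first.constrain("C", "type", "man")
second = discourse.new_situation()
second.properties["time"] = "today"  # hypothetical property on the situation itself
print(discourse.render())
```

Keeping constraints as an ordered list rather than a dictionary preserves the order in which the parser produced them, which is what the generator needs for a close translation.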

For example, let's consider the first part of Kharms' Sonnet:

An amazing thing happened to me today, I suddenly forgot....

It's currently analyzed as:

----- Situation #1 -----
----- Situation #2 -----
----- Situation #3 -----