Monday, July 26, 2010

Frames, constructs and the log

Some time has passed, and my ideas about the internal representation of natural language have changed. Here are the new ones.

Every word in the input is taken as a whole, and all the information about it is stored in a single frame. The frame may have different aspects: morphological, syntactic, semantic, discourse, etc. The aspect names are called roles, and the objects corresponding to these roles are called constructs.

Lexical ambiguities are handled frame-internally. If a word is syntactically ambiguous, its frame has several possible syntactic constructs. A polysemous word may have several semantic constructs. The frame should eventually choose one construct for each role, and the chosen constructs must not contradict each other. Here's how a frame for the famously ambiguous bank could look:

Once frames with embedded constructs appear in the model, the constructs remain activated for some time and may perform various actions. They may add constructs to their own frame or to another frame. They may establish links to constructs of other roles in the same frame. Or they may create connections to other frames.
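
To make the frame idea more tangible, here is a minimal Python sketch of a frame that keeps several candidate constructs per role and later commits to one of them. Everything in it is an assumption made for illustration: the class and method names, the role names syn and sem, and the particular constructs listed for bank are hypothetical. It also does not check that the chosen constructs are compatible with each other, which a real model would need.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Frame:
    """One word's frame: each role maps to its candidate constructs."""
    word: str
    candidates: dict[str, list[str]] = field(default_factory=dict)
    chosen: dict[str, str] = field(default_factory=dict)

    def add_construct(self, role: str, construct: str) -> None:
        """Add a candidate construct under a role, ignoring duplicates."""
        options = self.candidates.setdefault(role, [])
        if construct not in options:
            options.append(construct)

    def choose(self, role: str, construct: str) -> None:
        """Commit to one of the candidate constructs for a role."""
        if construct not in self.candidates.get(role, []):
            raise ValueError(f"{construct!r} is not a candidate for {role!r}")
        self.chosen[role] = construct

    def is_resolved(self) -> bool:
        """True once every role that has candidates also has a choice."""
        return set(self.chosen) == set(self.candidates)


# Hypothetical constructs for the ambiguous word "bank".
bank = Frame("bank")
bank.add_construct("syn", "noun")
bank.add_construct("syn", "verb")
bank.add_construct("sem", "FINANCIAL_INSTITUTION")
bank.add_construct("sem", "RIVER_SIDE")

# Later context (say, "the bank of the river") settles both roles.
bank.choose("syn", "noun")
bank.choose("sem", "RIVER_SIDE")
```

Keeping the candidates next to the choice means the ambiguity stays local to the frame until something resolves it.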

Establishing a connection between frames means creating a new frame whose child frames are the ones being connected. The new frame may also have several aspects. For example, in the usual John loves Mary, two extra frames are created to link the predicate with the subject and the object respectively. Each of these frames hosts two constructs: one representing the syntactic relation, the other the semantic one (in experiencer, state and theme terms):

As I've already stated, it is not only the final structure that is important, but also the sequence in which it was constructed. For this, a simple program-like log can be used:

frame: syn=noun, sem=JOHN
frame: syn=verb;transitive, sem=love
frame ^2 ^1: syn=subject+predicate, sem=experiencer+state
frame: syn=noun, sem=MARY
frame ^3 ^1: syn=predicate+object, sem=state+theme

Each line here describes a new frame. The frame's children are referred to as ^x, where x is how many lines back the child frame was introduced: ^1 means the previous line, ^2 the one before that. When a frame is created, it has no aspects; their subsequent assignments are reflected in the log.
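
The ^x convention can be checked mechanically. Below is a rough Python sketch that replays a log in this format and rebuilds the frames together with their children. The function and class names are hypothetical, and the sketch assumes exactly one frame per line, with aspects written as comma-separated role=construct pairs. The log lines restate the John loves Mary example, using the experiencer, state and theme role names from the prose.

```python
from __future__ import annotations

from dataclasses import dataclass, field

LOG = """\
frame: syn=noun, sem=JOHN
frame: syn=verb;transitive, sem=love
frame ^2 ^1: syn=subject+predicate, sem=experiencer+state
frame: syn=noun, sem=MARY
frame ^3 ^1: syn=predicate+object, sem=state+theme
"""


@dataclass
class Frame:
    line: int  # 1-based position in the log, kept only for display
    aspects: dict[str, str] = field(default_factory=dict)
    children: list[Frame] = field(default_factory=list)


def replay(log: str) -> list[Frame]:
    """Rebuild frames from the log, in the order they were created."""
    frames: list[Frame] = []
    for line in log.splitlines():
        if not line.strip():
            continue  # assumption: blank lines do not count as frames
        head, _, body = line.partition(":")
        frame = Frame(line=len(frames) + 1)  # created with no aspects
        # Tokens after the keyword "frame" are back-references such as ^2.
        for ref in head.split()[1:]:
            back = int(ref.lstrip("^"))
            if not 1 <= back <= len(frames):
                raise ValueError(f"{ref} points outside the log")
            # The new frame is not in the list yet, so ^1 is frames[-1].
            frame.children.append(frames[-back])
        # Aspect assignments follow the colon.
        for assignment in body.split(","):
            role, _, construct = assignment.strip().partition("=")
            if role:
                frame.aspects[role] = construct
        frames.append(frame)
    return frames


for frame in replay(LOG):
    print(frame.aspects["syn"], [child.line for child in frame.children])
# noun []
# verb;transitive []
# subject+predicate [1, 2]
# noun []
# predicate+object [2, 4]
```

Because the references are relative, a line can be read without knowing its absolute position in the log, only what came shortly before it.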

This log reflects the dynamics of the parsing process and can thus be used in applications where the order of operations is important, such as machine translation.