Saturday, February 13, 2010

Charts and clouds

As Doug Arnold pointed out, attacking trees and looking for a replacement isn't new. It has already been done, so I didn't actually invent anything revolutionary. The construction cloud is actually a well known structure called chart, as in chart parsing. The chart is designed for keeping all possible parse variants in a compact way to ease backtracking and reparse. And my cloud does exactly this and in a very similar fashion.

One difference is what kind of information is stored in the chart. In the classical approach, it's a bunch of records like "there may be an NP constituent from word 3 to word 7". No internal structure of these constituents is stored. If someone needs the complete tree, it can be computed on demand, since all of the daughter phrases are also present in the chart. And you're lucky if there's only one variant of such subtree.

Cloud is a collection of records like "constructions X, Y and Z may form a construction T". Since constructions don't tend to have a deep hierarchy, each of them stores its children. We don't have to compute the subtree on demand. But the subtree rarely covers the whole sentence. Therefore the task is to find a subset of all constructions such that it covers the input and its constructions don't contradict each other.

Another difference is that usually chart is tightly connected with the parser. A typical chart parser computes all the results in parallel, storing every possible intermediate constituent in the chart. On the other hand, I'm into single-variant serial parsing with error recovery. Of course, nothing prevents from implementing that kind of parser with phrase-structure-based chart: just drive one variant instead of all. One of the reasons I prefer dependency-like constructions over constituents is that free word order languages appear to be easier to describe in terms of constructions.

2 comments:

Muhammad Alzaidi said...

Hi Peter,
I am really happy to share my idea with you here.. what you said is really interesting. I think you are talking from a purely computational perspective. let me talk from less computation perspective and more theoretical perspective.. within minimalism for instance "sets" are used as a formal method of describing structure if this implies something,, it implies that tree seems to be observed that it is not that satisfactory although sometimes it is working perfectly.. however, people nowaday it seems to me that they use trees as a way of representing structures and to make a well-presented argument....(;)

Peter Gromov said...

Actually, charts are most often used for tree representation. Though those trees are not very much like the ones depicted in Minimalism. In my opinion, trees are overused in linguistics. Too many of the examples seem to be analyzable using the trees, and this is misleading. But if you consider a real text, for example, your or my message here, you'll find it quite hard to fit it in a tree structure (especially counting all the punctuation and parentheticals).