Wednesday, December 30, 2009

Separate sentence meaning

Many natural language semantic formalisms divide meaning into predicates and their arguments. The names differ, but the representation of John loves Mary comes out as something like LOVES(John, Mary). Almost everyone seems to agree that this captures the sentence's meaning quite well, even if the details of the representation vary significantly.
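For concreteness, here is a minimal sketch (in Python; the class and field names are mine, not any particular formalism's) of what such a predicate-argument representation might look like as a data structure:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Pred:
        name: str     # the predicate, e.g. "LOVES"
        args: tuple   # its arguments in order, e.g. ("John", "Mary")

    # "John loves Mary" in predicate-argument form
    john_loves_mary = Pred("LOVES", ("John", "Mary"))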

What's wrong with it? Imagine this idea being uttered in different contexts (capitals indicate logical stress):

(1) Who loves Mary? JOHN loves Mary.
(2) Whom does John love? John loves MARY.
(3) Does John like Mary? John LOVES Mary.
(4) Alice loves Bob, JOHN loves MARY.
(5) Why is John so happy? John LOVES MARY.
(6) John loves Mary. //the very beginning of a text
(7) Do you know who loves Mary? It's JOHN!

After hearing any of these seven examples, the listener knows that LOVES(John, Mary). But does that mean each example contains a sentence with that meaning? Actually, only (6) has exactly that meaning; in the other examples it is split between the two clauses in various ways.

A natural definition of sentence semantics would be the difference between the listener's knowledge before and after hearing the sentence. By that definition, the meanings of John loves Mary are completely different in each case, because we hear the clause with different background knowledge:

(1) We know LOVES(X, Mary). X := John
(2) We know LOVES(John, X). X := Mary
(3) We know X(John, Mary) and even wonder if X=LIKES. But X := LOVES
(4) We know LOVES(X, Y). X := John, Y := Mary.
(5) We know X(John). X := λy LOVES(y, Mary).
(6) We know nothing. LOVES(John, Mary).
(7) We know LOVES(X, Mary). X := John. //same as (1)

We now see six very different semantics for just one sentence, pronounced with different intonation. Only (6) is the canonical case, where we have no background at all (although the listener probably knows John and Mary). So it appears that the traditional logical approach describes only the sentences that start a text or discourse. But those are a tiny fraction of all the sentences in the language! What's the point of analyzing only a small part of the material?
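To make the "meaning = difference in knowledge" idea concrete, here is a minimal sketch in Python (the function and names are made up for the illustration, and it only covers the simple argument-gap cases above, not (3) or (5)). The contribution of the very same clause depends entirely on which open facts the listener already holds:

    # A fact is a (predicate, args) pair; single capital letters mark gaps.
    def is_var(term):
        return isinstance(term, str) and len(term) == 1 and term.isupper()

    def contribution(known, fact):
        """What the listener learns from the fact, given what is already known:
        if the fact fills gaps in an already assumed open fact, the contribution
        is just the variable binding; otherwise the whole fact is new."""
        pred, args = fact
        for known_pred, known_args in known:
            if known_pred == pred and len(known_args) == len(args):
                if all(k == a or is_var(k) for k, a in zip(known_args, args)):
                    return {k: a for k, a in zip(known_args, args) if is_var(k)}
        return fact

    fact = ("LOVES", ("John", "Mary"))
    print(contribution(set(), fact))                        # (6): the whole fact is new
    print(contribution({("LOVES", ("X", "Mary"))}, fact))   # (1): {'X': 'John'}
    print(contribution({("LOVES", ("John", "X"))}, fact))   # (2): {'X': 'Mary'}
    print(contribution({("LOVES", ("X", "Y"))}, fact))      # (4): {'X': 'John', 'Y': 'Mary'}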

So, to describe a sentence's meaning, you should always pay attention to what the reader or listener knew before perceiving it. Otherwise you just can't call it the sentence's meaning. Isn't that obvious? Fortunately, modern dynamic semantics approaches seem to understand the problem. It's just a pity that it went unappreciated for so long.

Saturday, December 26, 2009

Natural language programming

Programming languages (PL) are for computers; natural languages (NL) are for humans. Computers execute programs, we execute texts. So perhaps it would be useful for NL processing to look for more inspiration in PL processing? I don't mean syntax, which seems much more complicated in NL; I'm talking about semantics and pragmatics.

In NL, semantics is the literal meaning of what's written, and pragmatics is how a human will actually understand it in discourse: what the response will be, which changes the text will cause in the world. There's something similar in PL. Semantics is about how each construction (assignment, addition, loop, etc.) is executed; it's usually found in the language specification and can be formalized. Pragmatics, OTOH, is what the program does: whether it's quicksort, a tic-tac-toe game, or Linux. PL pragmatics explores how the separate statement meanings work together to get the job done.

PL semantics is clearly defined and expressible in terms of the same PL (assuming it's Turing-complete); there are plenty of interpreters and compilers proving this. PL pragmatics is different: it can't be fully captured by means of any PL. Generally speaking, you can't tell what a program does by looking at it; you can't even tell whether it will ever stop. Nevertheless, there are static code analysis tools that do capture some pragmatic aspects of a program, and they actually help to find bugs.
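To illustrate the first claim, here's a minimal sketch (Python; the toy constructions are chosen just for the example) of an interpreter for a tiny expression language, written in the same kind of language it interprets. It pins down the semantics of each construction, but says nothing about the pragmatics of any particular program in that toy language:

    # A tiny language: numbers, variable names, ("+", a, b), ("let", name, value, body)
    def evaluate(expr, env):
        if isinstance(expr, (int, float)):   # literal: denotes itself
            return expr
        if isinstance(expr, str):            # variable: look it up
            return env[expr]
        if expr[0] == "+":                   # addition: evaluate both sides and add
            return evaluate(expr[1], env) + evaluate(expr[2], env)
        if expr[0] == "let":                 # binding: extend the environment for the body
            _, name, value, body = expr
            return evaluate(body, {**env, name: evaluate(value, env)})
        raise ValueError("unknown construction: %r" % (expr[0],))

    # ("let", "x", 2, ("+", "x", 3)) means: let x = 2 in x + 3
    print(evaluate(("let", "x", 2, ("+", "x", 3)), {}))   # 5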

So, if we believe Church and Turing, there are two pieces of news for NLP. The good news is that NL semantics can be fully defined in terms of that same NL, by human beings. The bad news is that you can write tons of papers analyzing the hidden meanings and ideas in Hamlet and never arrive at a complete description. That's pragmatics.

Sunday, December 6, 2009

Information structure in music

The Vocal-Master once told me that the more popular and catchier a melody is, the more repetitions one can find in it. Take Memory, My Way, or Dancing Queen as an example: almost every phrase is a repetition of one of the previous phrases with some variation. Similarity helps in memorizing the song; variation keeps it from getting boring.

In fact, all the melodies I can think of contain repetition and variation in every phrase. What strikes me is how much this resembles natural language information structure. Repetition is a kind of Topic (old, given, assumed information), and variation resembles Focus (the new, highlighted parts of the sentence). One difference is that music has no devices for expressing reference other than repetition, while a sentence's Topic need not have appeared in the discourse before: it may be an indirect or anaphoric reference (A phone rang. A nice female voice asked who she was talking to), or a real-world entity that 'everyone knows' (The president visited England). But I still see a great deal of similarity. Both music and discourse develop in time, introducing new themes and information. What was new in one sentence becomes given in another; what was once a variation can later be repeated.

Music theory, of course, knows a lot of ways of developing a melody. Music even has phrases and sentences, questions and exclamations. Moreover, music and language are processed by the same brain systems, and there are theories of syntactic processing in music. It may even have semantics! It seems I'm the only one who didn't suspect this language-music relation until recently.