Wednesday, December 30, 2009

Separate sentence meaning

Many natural language semantic formalisms divide meaning into predicates and their arguments. The names differ, but the representation of John loves Mary is in any case something like LOVES(John, Mary). Almost everyone seems to agree that this captures the sentence's meaning quite well, though the details of representation may vary significantly.

What's wrong with it? Imagine this idea being uttered in different contexts (capitals indicate logical stress):

(1) Who loves Mary? JOHN loves Mary.
(2) Whom does John love? John loves MARY.
(3) Does John like Mary? John LOVES Mary.
(4) Alice loves Bob, JOHN loves MARY.
(5) Why is John so happy? John LOVES MARY.
(6) John loves Mary. //the very beginning of a text
(7) Do you know who loves Mary? It's John!

After hearing any of these 7 examples, the listener knows that LOVES(John, Mary). But does that mean that each example contains a sentence with that meaning? Actually, only (6) has exactly that meaning; in the other examples the information is split across the two clauses in various ways.

A natural definition of sentence semantics would be the difference between the listener's knowledge before and after hearing the sentence. Under this definition, the meanings of John loves Mary are completely different, because we hear this clause with different background knowledge:

(1) We know LOVES(X, Mary). X := John
(2) We know LOVES(John, X). X := Mary
(3) We know X(John, Mary) and even wonder if X=LIKES. But X := LOVES
(4) We know LOVES(X, Y). X := John, Y := Mary.
(5) We know X(John). X := λy LOVES(y, Mary).
(6) We know nothing. LOVES(John, Mary).
(7) We know LOVES(X, Mary). X := John. //same as (1)

We now see 6 very different semantics for just one sentence, pronounced with different intonations. And only (6) is the canonical case, where we have no background at all (although the listener probably knows John and Mary). So it appears that the traditional logical approach describes only the sentences that start a text or discourse. But those are very few compared to all the sentences in the language! What's the point of analyzing only a small fraction of the material?
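
To make this concrete, here is a toy sketch in Python (entirely my own illustration, with made-up fact tuples): an utterance is modelled as an update on the listener's context, so the same clause contributes different updates against different backgrounds, even when the resulting knowledge coincides.

# Toy model: a context is a set of fact tuples, possibly with unresolved
# variables; an utterance is a function that updates the context.

def update_1(ctx):
    """(1) 'Who loves Mary?' The context already holds LOVES(X, Mary);
    hearing 'JOHN loves Mary' merely binds X := John."""
    return (ctx - {("LOVES", "X", "Mary")}) | {("LOVES", "John", "Mary")}

def update_6(ctx):
    """(6) Discourse-initial: the whole proposition is new."""
    return ctx | {("LOVES", "John", "Mary")}

before_1 = {("LOVES", "X", "Mary")}   # someone loves Mary, we don't know who
before_6 = set()                      # no background at all

print(update_1(before_1))  # {('LOVES', 'John', 'Mary')}
print(update_6(before_6))  # {('LOVES', 'John', 'Mary')}
# The resulting knowledge is the same, but the updates -- the meanings -- differ.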

So, to describe a sentence's meaning, you should always pay attention to what the reader/listener knew before perceiving it. Otherwise you just can't call it the sentence meaning. Isn't that obvious? Fortunately, there are modern dynamic semantics approaches that seem to understand the problem. It's just a pity that this wasn't widely appreciated for such a long time.

Saturday, December 26, 2009

Natural language programming

Programming languages (PL) are for computers, natural languages (NL) are for humans. Computers execute programs; we execute texts. So perhaps it would be useful for NL processing to look for more inspiration in PL processing? I don't mean syntax, which seems much more complicated in NL. I'm talking about semantics and pragmatics.

In NL, semantics is the literal meaning of what's written, and pragmatics is how a human will actually understand it in discourse, what the response will be, and what changes the text will cause in the world. There's something similar in PL. Semantics is about how each construction (assignment, addition, loop, etc.) is executed; it's usually found in the language specification and can be formalized. Pragmatics, on the other hand, is what the program does: whether it's quicksort, a tic-tac-toe game or Linux. PL pragmatics explores how the separate statement meanings work together to get the job done.

PL semantics is clearly defined and expressible in terms of the same PL (assuming it's Turing-complete); there are plenty of interpreters and compilers proving this. PL pragmatics is different: it can't be captured by means of any PL. Generally speaking, you can't tell by looking at a program what it does. You can't even tell whether it will ever stop. Nevertheless, there are static code analysis tools that do capture some pragmatic aspects of a program, and they actually help to find some bugs.
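
As a minimal illustration of the semantics half (my own sketch, not anything from a language specification), here is a tiny interpreter for a toy expression language, written in Python, i.e. the semantics of a language captured inside a language:

def interpret(expr):
    """Evaluate a nested tuple expression like ('+', 1, ('*', 2, 3))."""
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    l, r = interpret(left), interpret(right)
    return l + r if op == '+' else l * r

print(interpret(('+', 1, ('*', 2, 3))))  # 7

The pragmatics half is exactly where this stops: by Rice's theorem, no analogous function can decide for arbitrary programs what they compute, or whether they halt.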

So, if we believe Church and Turing, there are two pieces of news for NLP. The good one is that NL semantics can be fully defined in terms of that NL, by human beings. The bad one is that you can write tons of papers analyzing the hidden meanings and ideas in Hamlet and never come up with a complete description. That's pragmatics.

Sunday, December 6, 2009

Information structure in music

The Vocal-Master once told me that the more popular and catchier a melody is, the more repetitions one can find in it. Take Memory, My Way or Dancing Queen as an example: almost every phrase is a repetition of one of the previous phrases with some variation. Similarity helps in memorizing the song; variation keeps it from getting boring.

In fact, all the melodies I can think of contain repetition and variation in every phrase. But what strikes me is how much this resembles natural language information structure. Repetition is a kind of Topic (old, given, assumed information), and variation resembles Focus (new, highlighted parts of the sentence). One difference is that music has no devices for expressing reference other than repetition, while a sentence Topic need not have appeared in the discourse before: it may be an indirect or anaphoric reference (A phone rang. A nice female voice asked who she's talking to), or a real-world entity that 'everyone knows' (The president visited England). But I still see a great deal of similarity. Both music and discourse develop in time, introducing new themes and information. What was new in one sentence becomes given in another; we can repeat what once was a variation.

Music theory, of course, knows many ways of developing a melody. Music even has phrases and sentences, questions and exclamations. Moreover, music and language are processed by the same brain systems, and there are theories of syntactic processing in music. It may even have semantics! It seems I'm the only one who didn't suspect this language-music relation until recently.

Sunday, November 15, 2009

Construction grammar

The subject of this post, Construction Grammar (CxG), is exactly what I had in mind. In short:

  • There's no language-specific module in the brain; the language ability arises from general cognitive capacities.
  • The basic notion in the language is a construction (a template text fragment). No one has defined it precisely, but everyone uses it.
  • Both the speaker and the hearer operate in terms of constructions, not words, unlike in other prominent theories.
  • Children learn specific constructions first, abstracting them into more general rules afterwards.
  • Lexical items which are often used in similar contexts tend to grammaticalize into new constructions as the language evolves.

The construction approach seems very promising to me, though I see at least two weaknesses in it, to which I haven't found an adequate answer so far:

  • Why does learning new languages become so much harder after puberty? Is it a degradation of general cognitive abilities? What's different about second language learners? Why do Japanese speakers drop the third person singular -s in English ('he like driving')?

  • CxG accounts only for positive data and doesn't explain why exactly the ungrammatical samples are so (or not so) ungrammatical. The vague explanation that one gets used to a specific way of expressing an idea, and that all the other ways are less habitual and hence more difficult to produce/analyze, doesn't satisfy me very much. The alternatives may be equally difficult (or easy) from a processing perspective. E.g. adjectives in English could just as well come after the noun with no clear processing disadvantage. The semantics can also be perfectly clear, as in 'he go cinema'.

    Another explanation could be Occam's Razor. In all the ungrammatical examples I've seen, there is a grammatical counterpart. So the brain could reason: why should the same meaning be produced in this strange way when there's another, well-established one? And it marks the 'strange way' as ungrammatical.


  • The question remains how to create a parser based on constructions, and how to induce those constructions from a corpus. And, as usual, what such a parser should produce as its result. (A naive induction sketch follows this list.)
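
To illustrate the induction part (a naive sketch of my own, over an invented three-sentence corpus): align sentences that share most of their words and abstract the single differing position into a slot.

from itertools import combinations

corpus = [
    "he likes driving",
    "he likes swimming",
    "she likes driving",
]

def induce(corpus):
    constructions = set()
    for a, b in combinations(corpus, 2):
        wa, wb = a.split(), b.split()
        if len(wa) == len(wb):
            # keep shared words, abstract the differing position into a slot X
            template = [x if x == y else "X" for x, y in zip(wa, wb)]
            if template.count("X") == 1:
                constructions.add(" ".join(template))
    return constructions

print(induce(corpus))   # {'he likes X', 'X likes driving'}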

    Monday, November 9, 2009

    Communicative fragments by example

    Communicative fragments (CF) are presented as overlapping template sequences that cover the text. They consist of fixed parts (e.g. words) and placeholders which may be filled by certain expressions. A limerick example:

    There was an Old Man in a pew,
    Whose waistcoat was spotted with blue;
    But he tore it in pieces,
    To give to his nieces,--
    That cheerful Old Man in a pew.


    The resulting fragments will be:

    there was X[NP] in Y[NP] -> Clause
    //variables are uppercase
    //sample placeholder constraints are in brackets
    //NP=Noun Phrase
    //substituted template forms a clause

    an X[NP, Starts with a vowel] -> NP
    old X[NP] -> NP
    man -> NP
    //one-word fragment
    X[NP] in Y[NP] -> NP
    //no one has promised that CF will form a tree
    //cf. the first fragment

    a X[NP, Starts with a consonant] -> NP
    pew -> NP
    X[NP] whose Y[NP] Z[Clause, finite] -> NP
    //V=Verb
    //note a non-projective construct
    was X[Participle] -> Clause
    //a rule for passive
    spotted with X[Adj] -> Participle
    but X[Clause] -> Clause
    X[NP] tore Y[NP] -> Clause
    X[=tear] in pieces -> Clause
    //any form of 'tear'
    X[Clause] to Y[Clause, infinitive] -> Clause
X[=give] to Y[NP] -> Clause, finite
    his X[NP] -> NP
    that X[NP] -> NP
    cheerful X[NP] -> NP

That's a way of parsing: consider syntax as a set of CF patterns, where every pattern contains at least one lexical entry that triggers the rule. It should also be much easier to extract such a set from a corpus than to induce a typical generative grammar with movements.

It isn't specified whether anything can occur between template components; I assume that anything can. Hence free word order languages are supported, but the English syntax rules are weakened very much, allowing ungrammatical sentences. So there needs to be tight parser integration constraining the fragments' freedom.
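
Here is a very rough sketch (my own, with invented patterns and no category checking) of lexically triggered CF matching: fixed words stay literal, while the slots X and Y absorb arbitrary word spans, which also shows how much freedom the fragments get without a constraining parser.

import re

# Each fragment keeps at least one fixed word that triggers it; X and Y are slots.
fragments = {
    "there was X in Y": "Clause",
    "an X":             "NP",
    "old X":            "NP",
    "X in Y":           "NP",
}

def to_regex(template):
    # Fixed words become literals, single uppercase letters become wildcards.
    parts = [r"(.+)" if len(w) == 1 and w.isupper() else re.escape(w)
             for w in template.split()]
    return re.compile(r"\b" + r"\s+".join(parts), re.IGNORECASE)

text = "There was an Old Man in a pew"
for template, category in fragments.items():
    match = to_regex(template).search(text)
    if match:
        print("%-18s -> %-7s slots: %s" % (template, category, match.groups()))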

    Friday, November 6, 2009

    Forget about the typical student job

An announcement on the Psychology department notice board.

    Monday, November 2, 2009

    All syntax problems...

    ...can be solved by another level of constituency. Except, of course, for the problem of too many layers of constituency.

At least, in Minimalism.

    Sunday, October 25, 2009

    A solution for John

It appears that I was too pessimistic when I claimed that I can't assemble the meaning of 'John has two sisters' from the word meanings. Blackburn & Bos have taught me:

    John: λv (v JOHN)
    has: λsubj λobj (subj (λs (obj s id)))
    two: λn λy λv (v (∃S (|S|=2 & ∀x∊S (n x y))))
    sisters: λx λy SISTER(x,y)

    Let's evaluate:

    (two sisters) = λy λv (v (∃S (|S|=2 & ∀x∊S SISTER(x,y))))
    (has John) = λobj (obj JOHN id)
    ((has John) (two sisters)):
       ∃S (|S|=2 & ∀x∊S SISTER(x,JOHN))
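
Just to convince myself that the reductions go through, here is the same derivation transcribed into Python lambdas (my own encoding; the logical formula is built as a plain string):

identity = lambda x: x

john    = lambda v: v("JOHN")
has     = lambda subj: lambda obj: subj(lambda s: obj(s)(identity))
two     = lambda n: lambda y: lambda v: v(
              "exists S (|S|=2 & forall x in S. %s)" % n("x")(y))
sisters = lambda x: lambda y: "SISTER(%s,%s)" % (x, y)

two_sisters = two(sisters)
print(has(john)(two_sisters))
# exists S (|S|=2 & forall x in S. SISTER(x,JOHN))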

    Note that this semantics also doesn't assume that John has only 2 sisters, so it really looks like an appropriate one.

So, the applicative text structure is restored to its rightful place. Although I don't much like this kind of solution, because the semantics of 'has' and 'two' depend too much on the fact that the SISTER predicate takes 2 arguments. I'd have to change them both to express 'John has two dogs', thus making 'two' (and every other numeric quantifier) semantically ambiguous. But it's still better than nothing.

    Friday, October 23, 2009

    Asking right questions

When I said that the semantics of 'John has two sisters' was |{ x | SISTER(x, JOHN) }|=2, I wasn't quite correct. In fact there's nothing in the text that prevents John from having 5 or 42 sisters. It's the Maxim of Quantity which may limit the sister count to 2. Being not an absolute rule, this maxim can easily be flouted, and in the right context the sentence could actually mean that John has more than 2 sisters.

Things get even more interesting if we add just one word: John has two beautiful sisters. There just isn't a default meaning here! John may have exactly two sisters, both beautiful, or he may have two beautiful sisters and another three who are not so beautiful.

The question is what a computer should do in such situations. Should it apply pragmatic knowledge and disambiguate everything immediately after syntactic analysis, using the whole context? Or should it maintain an intermediate semantic representation and hand it to some pragmatics module that could infer everything from the semantics? I clearly prefer modularization, i.e. the latter possibility. Of course I don't presuppose any sequential order; the modules may run in parallel and interact.

If we separate semantics from pragmatics, the representation problem arises again, even harder now. The semantic structure should be very generic: it should be interpretable in all the ways that were possible with the original text (minus the resolved lexical/syntactic ambiguities), and at the same time there should be no way of understanding it in any other way. If we just replace = with >= in the meaning of John has two sisters, the pragmatics module still won't be able to apply the Maxim of Quantity: such a meaning could just as well have been produced from John has at least two sisters, which is unambiguous with respect to the sister count. So it should still be some kind of =2, but in a form open to interpretation. What format could that be? I don't know. Yet.

    Thursday, October 22, 2009

    Shallow vs. structural analysis

Let's look at a very simple sentence, namely 'John has two sisters'. I'm now interested in its semantics, or, more precisely, its computer representation. The truth condition is actually very simple: it says that the number of those who happen to be sisters of John equals 2:

    |{ x | SISTER(x, JOHN) }|=2

    (let the uppercase letters denote some semantic meaning here).

A question arises: how can we assemble this semantics from the meanings of the sentence components? The constituent structure for this sentence would be:

[S [NP John] [VP has [QP two sisters]]]

    The dependency structure:

    John <- has -> two -> sisters

    The beloved one, applicative structure:

    (has John (two sisters))

    Lexical Functional Grammar-style:

[ PRED 'has'
  SUBJ [ PRED 'John' ]
  OBJ  [ PRED 'sisters'
         SPEC [ NUM 2 ] ] ]
In all of these variants, 'has' takes two arguments: 'John' and the combined 'two sisters'. So it appears that we should combine the word meanings in this order, getting something like f(HAS, JOHN, g(2, SISTER)). And this formula should somehow be equivalent to |{x | SISTER(x, JOHN)}|=2. The question is, what are f and g? I see no direct structural answer. The best variant I've come up with is to change the structure, replacing it with another one which contains only one predicate:

    HAS_N_SISTERS(Who,N)

    which would translate to

    |{ x | SISTER(x, Who) }|=N

This can be generalized a bit (take sibling instead of sister), but not further. A similar sentence, 'John has two dogs', would have a different semantics, e.g. |{x | DOG(x) & BELONGS(x, JOHN)}|=2. A two-place 'sister'-like 'dog' predicate would be funny.

So it seems that none of the structures I know of are of any use with this sentence. That's one of the reasons I prefer shallow parsing based on patterns with wildcards: it appears to map better onto semantics. A sad and probable consequence is that the applicative structure, beautiful as it is, will remain unapplied.
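
To make the 'patterns with wildcards' idea concrete, here is a deliberately crude sketch (my own; the regular expression and the noun-specific formulas are invented for illustration):

import re

def shallow_semantics(sentence):
    # Match the surface template "X has NUM N(plural)" directly,
    # skipping any intermediate constituent or dependency structure.
    m = re.match(r"(\w+) has (\d+|two|three) (\w+?)s\b", sentence)
    if not m:
        return None
    who, num, noun = m.groups()
    num = {"two": 2, "three": 3}.get(num, num)
    # The relational noun decides the shape of the formula; this is exactly
    # the part that resists a uniform compositional treatment.
    if noun == "sister":
        return "|{ x | SISTER(x, %s) }| = %s" % (who.upper(), num)
    return "|{ x | %s(x) & BELONGS(x, %s) }| = %s" % (noun.upper(), who.upper(), num)

print(shallow_semantics("John has two sisters"))  # |{ x | SISTER(x, JOHN) }| = 2
print(shallow_semantics("John has two dogs"))     # |{ x | DOG(x) & BELONGS(x, JOHN) }| = 2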

    Wednesday, October 21, 2009

    What Science Underlies Natural Language Engineering?

    Quote:

    A superficial look at the papers presented in our main conferences reveals that the vast majority of them are engineering papers, discussing engineering solutions to practical problems. Virtually none addresses fundamental issues in linguistics.


    Couldn't have said it better.

    Tuesday, October 20, 2009

    Linear Unit Grammar

Predominant natural language syntax theories usually deal with one sentence, carefully selected to be well-formed, so-called 'grammatical'. But live language seems to be different and more complex, especially the spoken variety. It contains lots of repetitions, false starts, hesitations, reformulations, speaker changes and so on.

I therefore like meeting theories that aim to describe spoken discourse 'as is' instead of labelling it as incorrect. One of those theories is Linear Unit Grammar by John Sinclair and Anna Mauranen.

The main notion here is the 'chunk'. It's a fuzzy pre-theoretical concept with no precise definition. Basically it's a linear fragment of the input signal (text, speech, etc.) which people tend to comprehend all at once. It may be unfinished. Usually it's formed of closely connected words, like a noun phrase with adjectives (a small dog) or a verb with its obligatory arguments (to love Mary). Relative clauses, of course, form separate chunks. Moreover, auxiliary words (like 'that' in 'the dog that loves Mary') are separated from everything else and go into single-word chunks. The chunks are very small; their size rarely exceeds 5 words.

Here's a sample chunking I made from a spoken English corpus. The analyzed fragment is:

    so an example of this s- s- second rule i mean the second rule probably is the easiest one to look at cuz you have the, the f- the the four-six-seven type of relationship

I divide it according to my intuition only; whenever I have doubts, I put a chunk boundary. The result is:

    1. so
    2. an example of this
    3. s- s-
    4. second rule
    5. i mean
    6. the second rule
    7. probably
    8. is
    9. the easiest one
    10. to look at
    11. cuz
    12. you have the
    13. the f-
    14. the
    15. the four-six-seven type
    16. of relationship

Sinclair & Mauranen classify the chunks into organizational fragments (1, 5, 11) and message fragments (the others). These groups are further divided into subgroups according to organizational function or message completeness. There's a non-deterministic algorithm that translates any kind of text into a well-formed one. In this example it would produce something like 'an example of this second rule is probably the easiest one to look at cuz you have the four-six-seven type of relationship'.

That's a bit surprising! How can anyone claim that 'grammatical sentences are unnatural, let's analyze the real ones' and then analyze spoken discourse by first making it grammatical? The answer is that the authors don't in fact aim to compete with the major syntactic theories; they strive to co-exist with them, at least in the beginning. The described algorithm may be just a first step in complex language analysis. The authors also suggest chunking could help in second language teaching and learning.

What I personally like about Linear Unit Grammar is precisely the chunking. It's so simple! And, in contrast to Gasparov's approach, where the text is divided into communicative fragments, the LUG chunks are contiguous and non-overlapping. Therefore LUG chunking can be done with simple regular expressions or Markov processes. A great part of the syntax lies inside chunks, so there's no need to analyze it the same way as 'bigger' structures like clause subordination. NLTK seems to provide chunking out of the box, so I guess I gotta try it.
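
Something like the following should do for a first experiment (a sketch only; I haven't run it, and it assumes the standard NLTK tokenizer and tagger models are installed):

import nltk

# A regular-expression chunker over POS tags: contiguous, non-overlapping
# chunks, roughly the granularity LUG seems to need.
grammar = r"""
  NP: {<DT>?<JJ.*>*<NN.*>+}   # determiner + adjectives + noun(s)
  VP: {<TO>?<VB.*>+<IN>?}     # verb group, e.g. 'to look at'
"""
chunker = nltk.RegexpParser(grammar)

sentence = "the second rule is probably the easiest one to look at"
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
print(chunker.parse(tagged))   # prints a shallow tree of chunks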

    Saturday, June 6, 2009

    Rhyming as an error correction method

In ancient times there was no written language and every story was transmitted orally. Of course, when you retell a story you change it in some way, because you don't remember the exact wording; it's only the meaning that matters. Hence after several generations the story could change beyond recognition. Even if a person tries to remember the words themselves, human memory isn't ideal, so there will still be some changes, which may even render some text fragments meaningless.

It's different with poetry. In addition to the plot, there's one more thing that should be preserved in retelling: the rhyme. And with this additional restriction, it's in fact much harder to alter the story to a large extent. Maybe that's why so much epic literature is preserved in poetic form? Just speculation.

    Sunday, March 22, 2009

    Topic-Focus articulation

As is well known, every sentence can be divided into two logical parts. One (the Topic) is what the sentence is about; the other (the Focus) is what the sentence actually says about its Topic. The Topic is usually something given or presupposed to be given; the Focus is something new or emphasized. To me the Topic seems like data from the knowledge base, whilst the Focus seems like an operation on this data, a function on the Topic. Since the Topic-Focus opposition is one of the most important distinctions in text structure for me, I can't omit it from the functional representation.

We know one thing that can separate anything from anything in functional composition: lambda abstraction. So we'll use it to separate Topic from Focus as well.

Let's introduce a special auxiliary function (actually it's more like a macro) SENT, which takes both Topic and Focus as its arguments and then applies the function-Focus to the argument-Topic, probably marking them in some special problem-dependent way. Thus every sentence will look like a call of the SENT function with Focus and Topic arguments.

    Example sentence:

    Father loves son.

    Its core functional structure:

    (loves father son)

By default, the Topic is 'father' and the Focus is 'loves son'.

    (SENT (λx loves x son) father)

We can stress FATHER, promoting it to Focus and demoting 'loves son' to Topic:

    (SENT (λx x father) (λy loves y son))

    Here we have two abstractions. If we suppose that SENT does nothing besides applying its first argument (Focus) to its second argument (Topic), we'll get

    ((λx x father) (λy loves y son))
    <= replace x =>
    ((λy loves y son) father)
    <= replace y =>
    (loves father son)

    We can stress SON:

    (SENT (λx x son) (λy loves father y))
which can be replaced by the shorter
    (SENT (λx x son) (loves father))

    We can stress LOVES:

    (SENT loves father son)

Here the Topic consists of two entities rather than one. This is perfectly normal: we'll just apply 'loves' to 'father', and then the result to 'son'. We'll get ((loves father) son), which by the definition of currying is equal to the core structure.
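
As a quick sanity check (my own encoding, with 'loves' building a string so the result is visible), the two-argument cases indeed reduce to the same core structure:

loves = lambda x: lambda y: "loves(%s, %s)" % (x, y)
SENT  = lambda focus: lambda topic: focus(topic)  # SENT just applies Focus to Topic

# Default: Focus 'loves son', Topic 'father'
print(SENT(lambda x: loves(x)("son"))("father"))               # loves(father, son)
# Stressed FATHER: Focus (λx x father), Topic (λy loves y son)
print(SENT(lambda x: x("father"))(lambda y: loves(y)("son")))  # loves(father, son)
# Stressed SON: Focus (λx x son), Topic (loves father)
print(SENT(lambda x: x("son"))(loves("father")))               # loves(father, son)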

    I don't know yet how to represent multiple parallel contrastive sequences, like 'Father loves son, while mother hates him'. But such an approach feels promising to me.

    Lambda abstraction in text structure

Just like Chomsky, we'll introduce exactly one transformation. In lambda calculus it's known as lambda abstraction. Let's say (λx E1 E2) is an application of the function (λx E1) to E2, and the result of this application is E3, which is E1 with all occurrences of x replaced with E2. Abstraction is the reverse operation: having a complex expression E3 in which E2 occurs, we replace the occurrences of E2 with x and prepend λx. Of course, any variable name could be used in place of x.

This abstraction may be used when several functions are applied to one argument. The most frequent example is predicate coordination:

    They stopped and looked across the river.

Here we have both 'stopped' and 'looked' applied to 'they'. Plain functional composition can't apply two functions to one argument simultaneously, so we'll need something that can. Let's suppose that the 'and' function does it. It receives several component functions, merges them somehow and produces a single function as a result. This function will be a Composite, meaning that its action on an argument is basically to apply all the components to that argument. Here the merged functions will be 'stopped' and '(λx looked x across (the river))'. The second one will receive the argument and put it into the correct position in its argument structure. So the result will be

    ((and stopped (λx looked x across (the river))) they)
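
A minimal sketch of such a Composite in Python (my own illustration; the components just build strings):

def and_(*components):
    # Merge several functions into one that applies each of them to the argument.
    return lambda arg: tuple(f(arg) for f in components)

stopped = lambda subj: "stopped(%s)" % subj
looked  = lambda x: "looked(%s, across, the_river)" % x   # (λx looked x across (the river))

print(and_(stopped, looked)("they"))
# ('stopped(they)', 'looked(they, across, the_river)')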