Construction grammar

Construction grammar (CxG) is exactly what I had in mind. In short:

  • There's no language-specific unit in the brain; the language ability is due to common cognitive units.
  • The basic notion in language is a construction (a template text fragment). No one has defined it precisely, but everyone uses it.
  • Both the speaker and the hearer operate in terms of constructions, not words, unlike in other prominent theories.
  • Children learn specific constructions first, abstracting them into more general rules afterwards.
  • Lexical items which are often used in similar contexts tend to grammaticalize into new constructions as language evolves.

    The constructional approach seems very promising to me, though I see at least two weaknesses in it for which I haven't found an adequate answer so far:

  • Why does learning new languages become much harder after puberty? Is it a degradation of common cognitive abilities? What's different about second-language learners? Why do Japanese speakers drop the third-person singular -s in English ('he like driving')?

  • CxG accounts only for positive data and doesn't explain why exactly ungrammatical samples are (or are not) perceived as ungrammatical. The vague explanation, that one gets used to a specific way of expressing an idea while all other ways are less habitual and hence harder to produce or analyze, doesn't satisfy me much. The alternatives may be equally difficult (or easy) from a processing perspective. E.g. adjectives in English could have come after the noun with no clear performance disadvantage. The semantics can also be clear, as in 'he go cinema'.

    Another explanation could be Occam's razor. In all the ungrammatical examples I've seen, there is a grammatical counterpart. So the brain could reason: why should the same meaning be produced in this strange way when there's another, well-established one? It would then mark the 'strange way' as ungrammatical.

  • The question remains of how to build a parser based on constructions, how to induce those constructions from a corpus, and, as usual, what that parser should produce as its result.

    Communicative fragments by example

    Communicative fragments (CF) are presented as overlapping template sequences that cover the text. They consist of fixed parts (e.g. words) and placeholders which may be filled by certain expressions. A limerick example:

    There was an Old Man in a pew,
    Whose waistcoat was spotted with blue;
    But he tore it in pieces,
    To give to his nieces,--
    That cheerful Old Man in a pew.

    The resulting fragments will be:

    there was X[NP] in Y[NP] -> Clause
    //variables are uppercase
    //sample placeholder constraints are in brackets
    //NP=Noun Phrase
    //substituted template forms a clause

    an X[NP, Starts with a vowel] -> NP
    old X[NP] -> NP
    man -> NP
    //one-word fragment
    X[NP] in Y[NP] -> NP
    //no one has promised that CF will form a tree
    //cf. the first fragment

    a X[NP, Starts with a consonant] -> NP
    pew -> NP
    X[NP] whose Y[NP] Z[Clause, finite] -> NP
    //note a non-projective construct
    was X[Participle] -> Clause
    //a rule for passive
    spotted with X[Adj] -> Participle
    but X[Clause] -> Clause
    X[NP] tore Y[NP] -> Clause
    X[=tear] in pieces -> Clause
    //any form of 'tear'
    X[Clause] to Y[Clause, infinitive] -> Clause
    X[=give] to Y[NP] -> Clause, finite
    his X[NP] -> NP
    that X[NP] -> NP
    cheerful X[NP] -> NP

    That's one way of parsing: treating syntax as a set of CF patterns in which every pattern contains at least one lexical entry that triggers the rule. It should also be much easier to extract such a set from a corpus than to induce a typical generative grammar with movements.
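
    To make the "lexical entry triggers the rule" idea more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the names Slot, Fragment, parse_fragment, build_index and triggered are hypothetical, the rule strings reuse the notation from the list above, and slot constraints such as NP are stored but never checked. Fixed words are only required to appear in the same order as in the template.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Slot:
    name: str                     # variable name, e.g. "X"
    constraints: Tuple[str, ...]  # e.g. ("NP", "Starts with a vowel")


@dataclass(frozen=True)
class Fragment:
    parts: Tuple[Union[str, Slot], ...]  # fixed words and slots, left to right
    result: str                          # category formed by the filled template

    @property
    def fixed_words(self) -> List[str]:
        return [p for p in self.parts if isinstance(p, str)]


# A slot looks like X[NP, ...]; any other whitespace-separated item is a fixed word.
_PART = re.compile(r"([A-Z])\[([^\]]*)\]|(\S+)")


def parse_fragment(rule: str) -> Fragment:
    """Parse a rule such as 'there was X[NP] in Y[NP] -> Clause'."""
    lhs, rhs = rule.split("->")
    parts: List[Union[str, Slot]] = []
    for name, constraints, word in _PART.findall(lhs):
        if word:
            parts.append(word)
        else:
            parts.append(Slot(name, tuple(c.strip() for c in constraints.split(","))))
    return Fragment(tuple(parts), rhs.strip())


def build_index(fragments: List[Fragment]) -> Dict[str, List[Fragment]]:
    """Map every fixed word to the fragments it can trigger."""
    index: Dict[str, List[Fragment]] = {}
    for fragment in fragments:
        for word in fragment.fixed_words:
            index.setdefault(word, []).append(fragment)
    return index


def occurs_in_order(words: List[str], tokens: List[str]) -> bool:
    """True if `words` is a subsequence of `tokens` (anything may intervene)."""
    remaining = iter(tokens)
    return all(word in remaining for word in words)


def triggered(index: Dict[str, List[Fragment]], tokens: List[str]) -> List[Fragment]:
    """Fragments triggered by some token whose fixed words all occur in order."""
    found: List[Fragment] = []
    for token in tokens:
        for fragment in index.get(token, []):
            if fragment not in found and occurs_in_order(fragment.fixed_words, tokens):
                found.append(fragment)
    return found


RULES = [
    "there was X[NP] in Y[NP] -> Clause",
    "an X[NP, Starts with a vowel] -> NP",
    "old X[NP] -> NP",
    "man -> NP",
    "X[NP] in Y[NP] -> NP",
    "his X[NP] -> NP",
    "X[NP] tore Y[NP] -> Clause",
]

if __name__ == "__main__":
    index = build_index([parse_fragment(rule) for rule in RULES])
    for fragment in triggered(index, "there was an old man in a pew".split()):
        print(fragment.fixed_words, "->", fragment.result)
```

    The sketch only finds candidate fragments; filling the slots and deciding which overlapping candidates actually cover the text is left open, as it is in the post.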

    It isn't specified whether anything can occur between template components; I assume that anything can. Hence free word order languages are supported, but English syntax rules are weakened so much that ungrammatical sentences are allowed. So there needs to be tight integration with the parser to constrain the fragments' freedom.
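
    One conceivable way to constrain that freedom is to limit how many tokens may stand between consecutive fixed words of a fragment. The sketch below is my own assumption, not part of the CF description: match_positions and its max_gap parameter are hypothetical, and a single global limit is a simplification (a realistic version would need a separate limit for each pair of fixed words, depending on whether a placeholder sits between them).

```python
from typing import List, Optional


def match_positions(
    fixed: List[str], tokens: List[str], max_gap: Optional[int] = None
) -> Optional[List[int]]:
    """Find the fixed words of a fragment in `tokens`, in order.

    Returns the token indices of the fixed words, or None if there is no match.
    `max_gap` (hypothetical parameter) is the maximum number of tokens allowed
    between two consecutive fixed words; None means "anything can intervene".
    """

    def search(k: int, start: int, prev: Optional[int]) -> Optional[List[int]]:
        if k == len(fixed):
            return []
        for i in range(start, len(tokens)):
            # Stop once the distance from the previous fixed word exceeds the limit.
            if prev is not None and max_gap is not None and i - prev - 1 > max_gap:
                break
            if tokens[i] == fixed[k]:
                rest = search(k + 1, i + 1, i)
                if rest is not None:
                    return [i] + rest
        return None

    return search(0, 0, None)


if __name__ == "__main__":
    fixed = ["in", "pieces"]  # fixed part of 'X[=tear] in pieces'
    good = "he tore it in pieces".split()
    bad = "in he tore it pieces".split()
    print(match_positions(fixed, good, max_gap=0))  # adjacent fixed words
    print(match_positions(fixed, bad))              # unlimited gaps
    print(match_positions(fixed, bad, max_gap=0))   # adjacency required
```

    With unlimited gaps the scrambled string still matches the fragment; requiring adjacency rejects it while keeping the grammatical order.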

    All syntax problems...

    ...can be solved by another level of constituency. Except, of course, for the problem of too many layers of constituency.

    At least, in Minimalism: