Jerry Ball, Air Force Research Laboratory
Multiword Expressions and Constructions Facilitate, Not Hinder, Understanding
According to Sag et al. (2002), multiword expressions are a “pain in the neck” for natural language processing. Viewed from the perspective of language processing systems with a strong notion of semantic compositionality, in which the meaning of an utterance is composed from the meanings of its individual words in accordance with its syntactic structure, this might appear to be true. However, from a constructional perspective in which multiword expressions and constructions of varying size are directly paired with meanings, multiword expressions and constructions provide a context that facilitates, rather than hinders, understanding. Once the multiword expression or construction is recognized, the meaning follows. Trying to determine the meaning of high-frequency words like “take” and “have” in isolation from the constructions in which they occur (e.g. “take a hike”, “take five”, “take place”, “take a leak”, “take your time”, “have a blast”, “don’t have a cow”, “have at it”) is somewhat akin to trying to determine the meaning of an unambiguous word from the meanings of the letters that make it up. Although “take” does have a base meaning (i.e. “to get possession of”), that meaning is not directly relevant to many of the constructions in which it occurs. The most frequently occurring words are the most ambiguous because they often function as elements of multiword expressions and constructions whose meanings are directly associated with these larger linguistic units. Multiword expressions exist because it isn’t possible to have a separate word for each concept that can be expressed. Words are the prototype for associating linguistic form and meaning. But words may themselves be composed of multiple morphemes, and the meaning of a multi-morphemic word may not be derivable compositionally from the meanings of the individual morphemes.
Very often, the meaning of the word is more specific than the meanings of the morphemes of which it is composed (e.g. “blackbird”, “bluebird”). These same considerations apply to multiword expressions (e.g. “black ice”, “black listed”), idioms (e.g. “kicked the bucket”), and constructions (e.g. “Subj gave Object a kick”), making a compositional approach to meaning determination infeasible in many cases. Multiword expressions and constructions are directly retrievable from memory and minimize the amount of processing required to determine meaning. There really isn’t time to process spoken input one word at a time. The larger the linguistic unit, the less likely it is to be ambiguous. The larger the multiword expression and, for constructions, the more lexically specific, the better the possible match to the spoken input and the lower the likelihood of confusion and errors due to noise and degraded input. Of course, it is not possible to store all possible utterances, but given the powerful associative memory with which humans are endowed, storage of large quantities of previously experienced multiword expressions and constructions is expected and serves to facilitate understanding.
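The idea that larger stored units are retrieved directly, and preferred over word-by-word composition, can be illustrated with a minimal sketch. This is not the author's model: the lexicon, its glosses, and the function name `interpret` are hypothetical, and greedy longest-match lookup is only a simple stand-in for retrieval from associative memory.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: each key is a sequence of word forms stored as a
# unit, each value an illustrative meaning gloss (assumptions, not source data).
LEXICON: Dict[Tuple[str, ...], str] = {
    ("take",): "get possession of",
    ("take", "place"): "occur",
    ("take", "a", "hike"): "go away",
    ("kicked", "the", "bucket"): "died",
    ("black",): "the color black",
    ("ice",): "frozen water",
    ("black", "ice"): "thin transparent ice on a road",
}

# Length of the longest stored unit, used to bound the lookup window.
MAX_LEN = max(len(unit) for unit in LEXICON)


def interpret(words: List[str]) -> List[Tuple[str, str]]:
    """Pair each recognized unit with its stored meaning.

    At every position the largest stored unit is tried first, so a
    multiword expression wins over the base meanings of its parts.
    """
    result: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first, then shrink.
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            unit = tuple(words[i:i + n])
            if unit in LEXICON:
                result.append((" ".join(unit), LEXICON[unit]))
                i += n
                break
        else:
            # No stored unit starts here; mark the word as unknown.
            result.append((words[i], "<unknown>"))
            i += 1
    return result
```

With this toy lexicon, `interpret(["take", "place"])` retrieves the stored meaning of “take place” as a whole, and the base meaning of “take” is consulted only when no larger unit matches, as in `interpret(["take", "ice"])`.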
Sag, I., Baldwin, T., Bond, F., Copestake, A., and Flickinger, D. (2002). Multiword Expressions: A Pain in the Neck for NLP. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2002). Mexico City, Mexico, pp. 1-15.