The Interpretation of Indirect Speech Acts

In a previous post, I raised the problem of indirect speech acts – utterances in which the literal interpretation differs from the speaker’s intended meaning.

These are interesting for two reasons:

  1. They’re an example of people successfully communicating and interpreting nonliteral meaning, and thus would seem to involve the use of some inference mechanism.
  2. As technology grows more interactive and responsive (e.g. robot assistants, smart homes, self-driving cars, etc.), it becomes increasingly important for such technology to correctly infer the intentions of those it interacts with; thus, identifying the mechanism at play is important for building more natural, humanly intelligent machines.

I also discussed several hypotheses for why people speak indirectly. One particularly compelling theory is the obstacle hypothesis (Gibbs, 1986), which argues that people formulate indirect requests to address the possible grounds for refusal of the request. In this way, indirect requests can actually be more efficient, from a communicative standpoint.

Unfortunately, this does not address the question of interpretation: given some utterance, how does a listener determine:

  1. Is it an indirect request?
  2. If it is an indirect request, what is the intended result of the request? That is, what does the speaker want me to do?

There are a couple of existing proposals about how this inference mechanism works, which I’ll discuss below.


Standard Pragmatics Model

The standard pragmatics model uses the framework of relevance theory to explain how people infer the intended interpretation of indirect speech acts. Relevance theory (Sperber & Wilson, 1995) argues that listeners derive the correct interpretation of an utterance using several assumptions about how conversations usually go. This harkens back to the cooperative principle of conversation (Grice, 1975); simply put, the cooperative principle holds that – by and large – speakers and listeners say things with the intention of being as clear and relevant as possible. Grice postulated four “maxims” of this principle, but the most relevant[1] maxim is the maxim of relevance, which basically argues that people generally don’t say things unless they think they’re relevant[2] to the conversation.

Using this assumption, listeners can infer that something is an indirect speech act because the literal interpretation would be irrelevant to the conversation. As described in Searle (1975), the listener can derive the intended interpretation by assessing the scope of possible relevant utterances. The example Searle uses is not a request, but rather a refusal. Consider the following exchange:

Speaker X: Let’s go to the movies tonight.

Speaker Y: I have to study for an exam.

How does X determine that Y is rejecting his proposal to go to the movies? Searle describes a series of steps (from X’s point of view), which I’ve abridged below.

  1. In response to my proposal, Y said he must study for an exam (an assertion).
  2. I assume that Y is “cooperating” in the conversation, and a standard cooperative response to a proposal is one of: {acceptance, rejection, counter-proposal, further discussion}
  3. Y’s utterance was not in this set of standard responses, but since Y is being cooperative, he must mean something more than he says – i.e. he has made an indirect speech act.

In these first three steps[3], X has successfully determined that Y made an indirect speech act. Recall that this was the first of the two problems posed above: is the utterance an indirect speech act at all?

Next, X follows a series of additional steps to derive Y’s actual intention:

  1. Studying for exams takes a lot of time, as does going to the movies; thus, Y probably cannot do both.
  2. In order to accept a proposal, a person must be able to perform the act implicated by the proposal.
  3. Therefore, Y has said something with the consequence that he cannot accept the proposal; therefore, Y has rejected my proposal.

In these steps, X has successfully determined Y’s intended meaning – the second of the two problems posed above[4].
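
To make the two-stage structure concrete, here is a minimal Python sketch of the abridged steps above. It is only an illustration, not Searle’s own formalization: the `Utterance` class, the act labels, and the tiny `TIME_CONSUMING` knowledge base are all hypothetical names I’ve made up for the example, and the sketch returns a single answer rather than a graded one.

```python
from dataclasses import dataclass
from typing import Optional

# Standard cooperative responses to a proposal (step 2 of the first list).
STANDARD_RESPONSES = {"acceptance", "rejection", "counter-proposal", "further discussion"}

# Toy background knowledge, invented for illustration: activities that
# each take up most of an evening, so that doing two of them is unlikely.
TIME_CONSUMING = {"study for an exam", "go to the movies"}


@dataclass
class Utterance:
    text: str
    literal_act: str                         # hypothetical label, e.g. "assertion"
    asserted_activity: Optional[str] = None  # what the speaker says they must do


def is_indirect(response: Utterance) -> bool:
    """Stage 1: the literal act is not a standard response to a proposal,
    so (assuming cooperation) the speaker must mean something more."""
    return response.literal_act not in STANDARD_RESPONSES


def infer_intent(proposed_activity: str, response: Utterance) -> str:
    """Stage 2: use background knowledge to derive the intended act."""
    if not is_indirect(response):
        return response.literal_act
    # If both activities are time-consuming, the speaker probably cannot do
    # both, so they cannot accept the proposal.
    conflict = (
        response.asserted_activity in TIME_CONSUMING
        and proposed_activity in TIME_CONSUMING
    )
    return "rejection" if conflict else "unknown"


y = Utterance("I have to study for an exam.", "assertion", "study for an exam")
print(is_indirect(y))
print(infer_intent("go to the movies", y))
```

Note that the literal act is always classified first, and the intended act is only derived afterwards. That ordering is exactly what gives the model its sequential character.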

It seems that such a model solves the two problems we started out with. Does that mean we call it a day?

No. Searle’s theory looks quite nice, but theories need to be tested. Broadly speaking, science involves at least the following steps:

  1. Identify a phenomenon.
  2. Develop a theory to explain that phenomenon.
  3. Refine the theory to a model, which makes testable predictions (i.e. hypotheses).
  4. Test those predictions through experiments, observational studies, computational simulations, etc.
  5. Compare your results to the predictions, and refine the model further.
  6. Rinse and repeat.

So what predictions does Searle’s model make? One very clear prediction involves timing. His model is a sequential process, involving multiple stages of inference – people first assess the literal interpretation, and then derive the intended, nonliteral interpretation. We also know from neuroscience studies that processing non-literal meanings can be more cognitively taxing than processing literal meanings[5].

Prediction: According to this model, neural measures of processing difficulty should show differences between processing of direct and indirect speech acts towards the end of a sentence, since it is only at the end of a sentence that a listener can assess its relevance and derive the indirect meaning.

Direct Access Model

Another model of indirect speech act interpretation is called the direct access model (Gibbs 1994; Gibbs 2002). In contrast to the relevance model presented above[6], the direct access model argues that often, the listener does not need to perform a sequence of steps to derive the intended interpretation from the literal interpretation. Thus, instead of first assessing the literal interpretation, then assessing its relevance, and then deriving the nonliteral interpretation, a listener can use contextual information to set certain expectations about a speaker’s intentions.

Part of Gibbs’s argument against the standard pragmatics model is that the very notion of literalness is extremely hard to define. If we define “literal” as the compositional meaning of a sentence (the combined meanings of the individual words, absent any contextual information), the literal interpretation of a sentence becomes both highly vague and not very useful. Since all language is heard (or read) within some surrounding context, why should the mind be wired to first process a literal interpretation – sans context – and then integrate context to derive the final interpretation?

Instead, the direct access model posits that “people do not automatically analyze the complete context-free, or literal meanings, of entire utterances before deriving their figurative meanings” (Gibbs, 2002). Gibbs argues that this does not suggest people never interpret literal meanings – nor is it at odds with research suggesting that figurative language is sometimes processed later (or slower) than literal language. It seems natural that poetic metaphors, for example, will take longer to understand than the literal interpretation of the same sentence.

Early support for the direct access model comes from an experimental study done by Gibbs (1979), in which participants were presented with potential indirect requests – such as “Must you open the window?” – with either a story context preceding the request, or as simply an isolated sentence. The experimental manipulation was that the story context could support either an indirect request interpretation (“Do not open the window”) or a literal meaning (“Is it necessary that you open the window?”). After reading the sentence, participants were asked to make a paraphrase judgment for the sentence. In the story context, participants took longer to understand the literal meaning than the indirect meaning; in the isolated context, the opposite effect was observed. Since processing indirect requests can sometimes take a shorter time than processing the literal interpretation – in the appropriate context – Gibbs argues that the listener “need not construct the literal interpretation before deriving the conveyed request”.

In other words, context[7] plays some sort of integral role in a listener’s interpretation of indirect requests; given the right context, it seems that listeners do not need to parse a literal interpretation before deriving the nonliteral, indirect meaning.

However, it is important to recognize that this is simply Gibbs’s interpretation of the data; while the data does show that indirect requests can be processed more quickly than literal meanings in certain situations, that doesn’t necessarily imply that the literal meaning is not computed before deriving the indirect request interpretation. Thus, further testing – and more specific predictions – are needed.

Prediction: According to this model, neural measures of processing difficulty should show differences between processing of direct and indirect speech acts early on, towards the beginning of an utterance.

Experimental Investigation

Fortunately, one experiment (Coulson & Lovett, 2010) tested the predictions of these respective models. Participants were presented with narrative scenarios, each one followed by a target utterance (e.g. “My soup is too cold to eat”); different conditions biased the narrative scenarios such that the target utterance would be interpreted as either a literal statement, or as an indirect request. Each word of the target utterance was presented on the screen independently (for about 200 ms), so researchers could compare the sentence stage (e.g. word 2 vs. word 5) to other measures (see below). After the target utterance, participants were presented with a “comprehension probe”, which continued the narrative in either an expected or unexpected way – depending on the bias of the original narrative.

While participants were reading these stories and performing the task, data about their neural activity was being collected from electrodes placed on their scalp. This methodology, called electroencephalography (EEG), is very useful for gathering information about patterns of activity across time, since the measurements are temporally very precise; although EEG has poor spatial resolution (i.e. it is poor at telling you where something is happening in the brain), it is very good at telling you what the overall patterns of activity are at any given slice of time. This makes it ideal for experiments involving language processing, which happens very fast.


Recall our two predictions from the respective models:

  1. Standard pragmatics model: neural activity for the indirect and literal conditions should be similar until the end of the utterance, when the indirect interpretation is derived from the literal meaning.
  2. Direct access model: neural activity for the indirect and literal conditions should be different early on, because context allows the listener to bypass the literal interpretation.
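
The contrast between these two predictions can be sketched in a few lines of Python. This is a toy illustration, not the analysis used in the study: the function names, the fixed threshold, and the per-word numbers are all invented, and treating “diverges only at the final word” as the signature of the standard pragmatics model is my own crude simplification.

```python
from typing import Optional, Sequence


def first_divergent_word(
    literal: Sequence[float], indirect: Sequence[float], threshold: float
) -> Optional[int]:
    """Return the 1-based position of the first word at which the two
    conditions differ by more than `threshold`, or None if they never do."""
    for position, (lit, ind) in enumerate(zip(literal, indirect), start=1):
        if abs(lit - ind) > threshold:
            return position
    return None


def favoured_model(position: Optional[int], n_words: int) -> str:
    """Map the divergence point onto the model whose prediction it matches."""
    if position is None:
        return "neither"
    if position == n_words:
        return "standard pragmatics"  # difference only at the end of the utterance
    return "direct access"            # difference appears earlier


# Invented per-word measurements for a seven-word utterance (arbitrary units).
literal_condition = [0.1, 0.2, 0.1, 0.2, 0.1, 0.2, 0.1]
late_divergence = [0.1, 0.2, 0.1, 0.2, 0.1, 0.2, 1.5]
early_divergence = [0.1, 1.4, 1.2, 1.3, 1.1, 1.2, 1.5]

for name, indirect_condition in [("late", late_divergence), ("early", early_divergence)]:
    pos = first_divergent_word(literal_condition, indirect_condition, threshold=0.5)
    print(name, pos, favoured_model(pos, n_words=7))
```

A real analysis would use statistical tests over many trials and participants rather than a fixed threshold, but the logic of the comparison is the same: where in the utterance do the two conditions first come apart?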

The EEG data were time-locked to the presentation of each word in the target utterance, so researchers could look at brain activity at the onset of specific words. An analysis of the activation patterns revealed significant differences in neural activity by word 2 (e.g. soup in “My soup is too cold to eat”), depending on the condition. That is, the story context supporting either the literal interpretation or the indirect interpretation resulted in different activation patterns by the second word in the sentence.

Like Gibbs’s study above, this supports the direct access model. Processing indirect requests looks different in the brain early on in the sentence’s presentation; thus, it seems unlikely that a listener first constructs the complete literal interpretation, and then derives the indirect meaning.


The proposals presented above were two theories that attempted to explain how people understand indirect speech acts. One supported a sequential processing model, and the other supported a model in which the indirect meaning can be accessed before the literal meaning, depending on contextual information. Two studies, using either reaction time (Gibbs 1979) or EEG data about neural activity (Coulson & Lovett, 2010), support the latter view, called the direct access model. Concisely, context sets expectations about speaker intentions.

While this addresses our initial questions, it unfortunately does not fully answer them. Adjudicating between these two models is only somewhat helpful. The direct access model – though more plausible than the standard pragmatics model, based on evidence from psycholinguistics and neuroscience – still does not explain how context is integrated to set expectations. And even Searle’s model, which had a clear set of steps, does not offer a mechanistic explanation for the final step: how does the listener connect background knowledge with schematized knowledge about speech acts to derive the final interpretation (e.g. “Therefore, Y has said something to indicate he cannot accept the proposal”)?

My stance on the matter is that this question is actually orthogonal to the two models presented here. These models make clear (and testable) predictions about the timing of indirect speech act interpretation, but do not offer detailed explanations for the inference mechanism – the question of how. A detailed explanation is needed if we wish to build a computational model of how humans interpret indirect speech acts. Fortunately, we are now in a position to ask more specific questions:

  1. What, specifically, are the contextual factors that influence the interpretation of an utterance as either a direct or indirect utterance?
  2. What are the different “domains” of these contextual factors – e.g. the setting, previous utterances in the conversation, the relationship to the speaker, etc. – and is each domain equally important in the analysis?
  3. Given an utterance and some operationalization of context, what is a computational account of the inference mechanism required to infer the meaning of the utterance?
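
To give a sense of what an answer to the third question might look like, here is one possible shape, sketched in Python. This is purely my own illustration and not something either model proposes: it treats interpretation as Bayesian inference, with context supplying a prior over speaker intentions. The intent labels, the `posterior` function, and every probability below are invented placeholders.

```python
from typing import Dict


def posterior(
    prior: Dict[str, float],
    likelihood: Dict[str, Dict[str, float]],
    utterance: str,
) -> Dict[str, float]:
    """P(intent | utterance, context) is proportional to
    P(utterance | intent) * P(intent | context)."""
    unnormalized = {
        intent: prior[intent] * likelihood[intent].get(utterance, 0.0)
        for intent in prior
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("utterance has zero probability under every intent")
    return {intent: p / total for intent, p in unnormalized.items()}


UTTERANCE = "My soup is too cold to eat"

# Invented: how likely a speaker with each intent is to choose this wording.
likelihood = {
    "indirect request": {UTTERANCE: 0.1},
    "literal statement": {UTTERANCE: 0.3},
}

# Invented priors standing in for two different story contexts.
request_biasing_context = {"indirect request": 0.8, "literal statement": 0.2}
literal_biasing_context = {"indirect request": 0.1, "literal statement": 0.9}

print(posterior(request_biasing_context, likelihood, UTTERANCE))
print(posterior(literal_biasing_context, likelihood, UTTERANCE))
```

The same utterance comes out as most likely a request under one prior and most likely a plain statement under the other, which is one way of cashing out the idea that context sets expectations. It leaves the first two questions untouched, of course: where the prior comes from is exactly what still needs explaining.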


Levinson, S. (1983). Pragmatics. Cambridge Textbooks in Linguistics.

Gibbs, R. W. (1979). Contextual effects in understanding indirect requests. Discourse Processes, 2(1), 1–10.

Gibbs, R. W. (1986). What makes some indirect speech acts conventional? Journal of Memory and Language, 25(2), 181–196.

Gibbs, R. W. (1994). The poetics of mind: Figurative thought, language, and understanding. New York: Cambridge University Press.

Gibbs, R. W. (2002). A new look at literal meaning in understanding what is said and implicated. Journal of Pragmatics, 34(4), 457–486.

Sperber, D., & Wilson, D. (1995). Relevance: Communication and cognition (2nd ed.). Oxford/Cambridge: Blackwell Publishers, pp. 2–9.

Grice, P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and semantics 3: Speech acts (pp. 41–58). New York: Academic Press.

Searle, J. (1975). Indirect speech acts. In Speech Acts: An Essay in the Philosophy of Language (pp. 6–6).

[1] Ha.

[2] There are all sorts of hairy issues that arise when trying to define “relevance” more precisely, as Grice himself admits. Any given utterance might be considered irrelevant under a certain dimension, but could be highly relevant under another dimension. How do speakers/listeners decide which dimensions are most relevant to assess a given utterance’s relevance? However, it is a rather tidy theory – it’s been used to explain many phenomena in pragmatics – and there is something quite intuitive (and almost tautological) about the argument that people don’t say something unless they think it’s relevant.

[3] Abridged from Searle’s five for the sake of brevity. Note also that I’ve omitted Searle’s classification of the different steps (partly because I’ve merged some of them). In Searle’s model, different steps involve different mechanisms; certain steps involve assessing factual background information, others involve inference performed on this information, and still others involve having schematized knowledge about speech acts in general (e.g. having an internal model of the cooperative principle).

[4] Note that Searle explicitly mentions that this conclusion is inherently probabilistic (which I think is correct). Thus, a computational implementation of Searle’s model would likely assign more weight to the REJECTION interpretation, but not entirely eliminate the ACCEPTANCE and COUNTER-PROPOSAL interpretations. After all, as Searle points out, Y could make a follow-up statement: “I have to study for an exam, but let’s go to the movies anyway.”

[5] Caveat: this is not always the case, as we shall see later on. Gibbs (2002) describes how figurative or nonliteral meanings of utterances can sometimes be processed more quickly than the literal uses in the same context; the conclusion, essentially, is that context is king.

[6] Also sometimes called the Standard Pragmatics model.

[7] Note that “context” is another one of these intrinsically hairy words, which is incredibly hard both to operationalize and to define precisely. In experiments, context is usually operationalized as a story giving some sort of background information or “setting a scene”. But “context” can also refer to previous utterances in a discourse (as opposed to just the milieu or environment), the relationship between the speaker and the listener, and more.
