When will machines understand our intentions?

Imagine you’re at a new friend’s house for dinner, and the house is stiflingly hot. You feel uncomfortable turning on the AC yourself, so instead, you casually remark: “Boy, it sure is warm in here!” Your friend will probably infer your intentions, and will turn on the AC or open a window.

Now imagine that instead of speaking to your friend, you’re speaking to Alexa, of Amazon Echo fame. Will Alexa be able to understand that by saying “it sure is warm in here”, you actually mean “turn on the air-conditioning”? Unless you’ve hard-coded the meaning of this utterance into your device, Alexa will likely not understand.

How is it that your friend can so easily infer your intended meaning, while a product developed by a $250 billion company cannot?

Current Approaches

Considerable time and money have been invested in developing machines that can understand our intentions, with at least two distinct approaches: idiomatic and inferential (Wilske and Kruijff, 2006). Consider the case described above, in which you tell Alexa “it sure is warm in here”:

The idiomatic approach involves simply hard-coding the intended meaning of this utterance. Somewhere in Alexa’s code, the words “it sure is warm in here” would be directly translated into an action, like “TURN ON AIR-CONDITIONING”. This approach is easier to implement, but unfortunately does not generalize very well. There are many subtle variations on “it sure is warm in here”, so you’d have to hard-code each of these – and even trickier, there are many different contexts in which that same sentence might mean something different! Perhaps you’re expressing your delight with a toasty fireplace, or perhaps the air-conditioning is broken, and what you really intend is for Alexa to remind you to get it fixed.
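To make this concrete, here is a minimal sketch of what an idiomatic lookup might look like. Everything in it is hypothetical: the phrase table, the action name, and the `idiomatic_interpret` function are invented for illustration and say nothing about how Alexa is actually built.

```python
# Hypothetical phrase table: each exact phrase is hard-coded to one action.
IDIOM_TABLE = {
    "it sure is warm in here": "TURN_ON_AIR_CONDITIONING",
}


def idiomatic_interpret(utterance):
    """Return the hard-coded action for an utterance, or None if it is not listed."""
    # Normalize case and trailing punctuation, then look up the exact phrase.
    key = utterance.lower().strip(" .!?")
    return IDIOM_TABLE.get(key)


print(idiomatic_interpret("It sure is warm in here!"))    # TURN_ON_AIR_CONDITIONING
print(idiomatic_interpret("Boy, it's hot in this room"))  # None
```

The second call shows the generalization problem: a close paraphrase falls straight through the table, and nothing in the lookup can tell a stuffy room from a pleasant fireplace.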

The inferential approach involves using the sentence, as well as some other information, to make an inference about what you mean. Inference means going beyond the information literally given to reach a conclusion that the information only implies. In general, humans are adept at making inferences (particularly in the social realm) but machines struggle with this task. This approach is more difficult than the idiomatic approach, but would theoretically yield better results, which could generalize across sentences, speakers, and situations.
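As a toy illustration of the inferential idea, the sketch below combines the sentence with a few pieces of context before choosing an action. The `Context` fields, the hand-written rules, the temperature threshold, and the action names are all assumptions made up for this example; a real system would need far richer context and would more likely learn these relationships than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Toy description of the situation; every field is an illustrative assumption."""
    room_temp_c: float
    ac_working: bool
    speaker_enjoys_heat: bool = False


def inferential_interpret(utterance, ctx):
    """Guess the intended action from the sentence plus context, using toy rules."""
    # Crude keyword check standing in for real language understanding.
    mentions_heat = any(word in utterance.lower() for word in ("warm", "hot"))
    if not mentions_heat or ctx.speaker_enjoys_heat:
        return "NO_ACTION"
    if ctx.room_temp_c < 26:  # arbitrary comfort threshold, chosen for the example
        return "NO_ACTION"
    if not ctx.ac_working:
        return "REMIND_TO_FIX_AIR_CONDITIONING"
    return "TURN_ON_AIR_CONDITIONING"


remark = "It sure is warm in here"
print(inferential_interpret(remark, Context(30, ac_working=True)))
print(inferential_interpret(remark, Context(30, ac_working=False)))
print(inferential_interpret(remark, Context(30, ac_working=True, speaker_enjoys_heat=True)))
```

The same sentence yields three different actions depending on the context, which is the behavior a fixed phrase-to-action mapping cannot provide.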

At this point, the question becomes: how can we get machines to make correct inferences about our intentions?

Our Research

One approach to answering this question is to ask another, related question: how do people make these inferences? For example, what factors influence your friend to interpret “it sure is warm in here” as a request to turn on the air-conditioning?

There are at least two hypotheses about how this works, roughly corresponding to the two approaches described above:

  1. Memorization: as with the idiomatic approach, people memorize that “it sure is warm in here” actually means “please turn on the AC”.
  2. Theory of Mind: as with the inferential approach above, people use contextual information – including what they think the speaker believes or wants – to infer the meaning.

Previous research (Gibbs, 1979; Gibbs, 1980; Coulson and Lovett, 2010; Ackeren et al., 2012) found that, as expected, context matters; “it sure is warm in here” means something different when you’re walking through a hot desert than when you’re sitting in a friend’s home. However, this doesn’t necessarily rule out the memorization or “idiomatic” strategy. It is still possible – though unlikely – that people memorize the meaning of these requests according to different contexts, just as they memorize idioms like “once in a blue moon”.

At the Language and Cognition Lab at UC San Diego, we investigated whether people need to attend to an individual speaker’s beliefs about the world in order to interpret an utterance as a request rather than a plain statement.

To test this, we presented participants with eight different passages, each describing an interaction with somebody else. The passages ended with a potential request, such as “it’s really cold in here”; in each scenario, there was an obstacle to fulfilling this request, such as a broken heater. Passages were manipulated so that the speaker either did or did not know about this obstacle. Participants were then asked to select a paraphrase of the statement, which was either a more direct request, such as “could you turn on the heater?”, or a statement, such as “it’s really cold; too bad the heater is broken”.

If the memorization hypothesis is correct, the speaker’s knowledge should not affect the participant’s interpretation, and the statements should be interpreted as requests in both conditions. If the Theory of Mind hypothesis is correct, a participant will interpret something as a request only if the speaker does not know about the obstacle. After all, why would you make a request if you know the other person couldn’t fulfill it?
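The two sets of predictions can be laid out side by side in a few lines of code. This is only an idealized sketch: the `predicts_request` function and the hypothesis labels are invented here, and the predictions are written as all-or-nothing, whereas real participants' choices would differ in likelihood rather than flip outright.

```python
def predicts_request(hypothesis, speaker_knows_obstacle):
    """Idealized prediction: is the remark read as a request in this condition?"""
    if hypothesis == "memorization":
        # The phrase always maps to a request, whatever the speaker knows.
        return True
    if hypothesis == "theory_of_mind":
        # A request only makes sense if the speaker is unaware of the obstacle.
        return not speaker_knows_obstacle
    raise ValueError(f"unknown hypothesis: {hypothesis}")


for hypothesis in ("memorization", "theory_of_mind"):
    for speaker_knows in (True, False):
        reading = "request" if predicts_request(hypothesis, speaker_knows) else "statement"
        print(f"{hypothesis:15} speaker knows obstacle={speaker_knows!s:5} -> {reading}")
```

The hypotheses disagree only in the condition where the speaker knows about the obstacle, so that is the condition that can tell them apart.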


Thus far, our results support the Theory of Mind hypothesis. When the speaker knew about the obstacle, participants were less likely to interpret the remark as a request; when the speaker did not know, they were more likely to do so.

These results have important implications for researchers developing language interfaces like Siri, Google Home, and Amazon Echo. Our findings suggest that if machines are ever to understand people’s intentions through language, they will need to be capable of the same types of inferences that people regularly make – which means that they need to understand different contexts, including where they are, who they are speaking to, and what the speaker might want or believe.

One day, perhaps, robots will be able to understand a passive-aggressive remark – “the kitchen’s really dirty…” – as a request. We can only hope that robots don’t learn the art of passive-aggression themselves.


Wilske, S., & Kruijff, G.-J. (2006). Service Robots dealing with indirect speech acts. ReVision.

Gibbs, R. (1980). Your Wish Is My Command: Convention and Context in Interpreting Indirect Speech Acts.

Gibbs, R. W. (1979). Contextual effects in understanding indirect requests. Discourse Processes, 2(1), 1–10. http://doi.org/10.1080/01638537909544450

Coulson, S., & Lovett, C. (2010). Comprehension of non-conventional indirect requests: An event-related brain potential study. Italian Journal of Linguistics, 22(1), 107–124.

Ackeren, M., Casasanto, D., Bekkering, H., Hagoort, P., & Rueschemeyer, S.-A. (2012). Pragmatics in Action: Indirect Requests Engage Theory of Mind Areas and the Cortical Motor Network. Human Brain Mapping, 33(10), 2322–2333. http://doi.org/10.1002/hbm.21365
