Tutorial

In this section, we will build an NLU assistant for home automation tasks. It will be able to understand queries about lights and thermostats. More precisely, our assistant will contain three intents:

  • turnLightOn
  • turnLightOff
  • setTemperature

The first two intents turn the lights on and off in a specific room. Each of them has a single slot, the room. The third intent lets you control the temperature of a specific room. It has two slots: roomTemperature and room.

The first step is to create an appropriate dataset for this task.

Training Data

Check the Training Dataset Format section for more details about the format used to describe the training data.

In this tutorial, we will create our dataset using the YAML format, and create a dataset.yaml file with the following content:

# turnLightOn intent
---
type: intent
name: turnLightOn
slots:
  - name: room
    entity: room
utterances:
  - Turn on the lights in the [room](kitchen)
  - give me some light in the [room](bathroom) please
  - Can you light up the [room](living room) ?
  - switch the [room](bedroom)'s lights on please

# turnLightOff intent
---
type: intent
name: turnLightOff
slots:
  - name: room
    entity: room
utterances:
  - Turn off the lights in the [room](entrance)
  - turn the [room](bathroom)'s light out please
  - switch off the light in the [room](kitchen), will you?
  - Switch the [room](bedroom)'s lights off please

# setTemperature intent
---
type: intent
name: setTemperature
slots:
  - name: room
    entity: room
  - name: roomTemperature
    entity: snips/temperature
utterances:
  - Set the temperature to [roomTemperature](19 degrees) in the [room](bedroom)
  - please set the [room](living room)'s temperature to [roomTemperature](twenty two degrees celsius)
  - I want [roomTemperature](75 degrees fahrenheit) in the [room](bathroom) please
  - Can you increase the temperature to [roomTemperature](22 degrees) ?

# room entity
---
type: entity
name: room
automatically_extensible: no
values:
- bedroom
- [living room, main room, lounge]
- [garden, yard, backyard]

Here, we put all the intents and entities in the same file, but we could also have split them into dedicated files.

The setTemperature intent references a roomTemperature slot which relies on the snips/temperature entity. This is a builtin entity, which allows temperature values to be resolved properly.

The room entity makes use of synonyms by defining lists like [living room, main room, lounge]. In this case, main room and lounge will point to living room, the first item of the list, which is the reference value.

In addition, this entity is marked as not automatically extensible, which means that the NLU will only output values that we have defined and will not try to match other values.
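To make the synonym and extensibility rules more concrete, here is a minimal plain-Python sketch that mimics how the values list above could be resolved. It does not use Snips NLU: build_lookup and resolve are hypothetical helper names, the case-insensitive matching is an assumption, and the real engine's matching is more elaborate (for instance, it also knows the values annotated in the training utterances).

```python
# Illustrative only: mimics synonym resolution for a non-extensible entity.
# The helper names are hypothetical and not part of the Snips NLU API.

# Same values as in the room entity of dataset.yaml.
ROOM_VALUES = [
    "bedroom",
    ["living room", "main room", "lounge"],
    ["garden", "yard", "backyard"],
]


def build_lookup(values):
    """Map every value and synonym (lowercased) to its reference value."""
    lookup = {}
    for entry in values:
        # A bare string is a reference value without synonyms.
        variants = [entry] if isinstance(entry, str) else entry
        reference = variants[0]  # the first item is the reference value
        for variant in variants:
            lookup[variant.lower()] = reference
    return lookup


def resolve(raw_value, lookup):
    """Return the reference value, or None for a value that was not declared."""
    return lookup.get(raw_value.lower())


ROOM_LOOKUP = build_lookup(ROOM_VALUES)
print(resolve("lounge", ROOM_LOOKUP))  # living room
print(resolve("garage", ROOM_LOOKUP))  # None
```

Returning None for "garage" is how this sketch models a non-extensible entity: a value that was never declared is simply not matched.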

We are now ready to generate our dataset using the CLI:

snips-nlu generate-dataset en dataset.yaml > dataset.json

Note

We used en as the language here, but other languages are supported. Check the Supported languages section to learn more.
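Before training, it can help to check what ended up in the generated file. The sketch below assumes that the JSON dataset exposes top-level "intents" and "entities" mappings, with each intent holding a list of utterances. summarize_dataset and summarize_file are hypothetical helpers, not part of Snips NLU, so adapt them if your file is organized differently.

```python
import json


def summarize_dataset(dataset):
    """Return ({intent name: utterance count}, sorted entity names).

    Assumes top-level "intents" and "entities" mappings; this is an
    assumption about the generated JSON layout.
    """
    counts = {
        name: len(intent.get("utterances", []))
        for name, intent in dataset.get("intents", {}).items()
    }
    entities = sorted(dataset.get("entities", {}))
    return counts, entities


def summarize_file(path):
    """Load a JSON dataset file and summarize it."""
    with open(path, encoding="utf8") as f:
        return summarize_dataset(json.load(f))
```

Calling summarize_file("dataset.json") on the dataset of this tutorial should report four utterances for each of the three intents.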

Now that we have our dataset ready, let’s move to the next step which is to create an NLU engine.

The Snips NLU Engine

The main API of Snips NLU is an object called a SnipsNLUEngine. This engine is the one you will train and use for parsing.

The simplest way to create an NLU engine is the following:

from snips_nlu import SnipsNLUEngine

default_engine = SnipsNLUEngine()

In this example the engine was created with default parameters which, in many cases, will be sufficient.

However, in some cases you may need to tune the engine and provide a customized configuration. Typically, different languages may require different sets of features. You can check the NLUEngineConfig to get more details about what can be configured.

We have built a list of default configurations, one per supported language, that include some language-specific enhancements. In this tutorial we will use the English one.

import io
import json

from snips_nlu import SnipsNLUEngine
from snips_nlu.default_configs import CONFIG_EN

engine = SnipsNLUEngine(config=CONFIG_EN)

At this point, we can try to parse something:

engine.parse("Please give me some lights in the entrance !")

This raises a NotTrained error, because the engine has not yet been trained on the dataset that we created.

Training the engine

In order to use the engine we created, we need to train it (or fit it) on the dataset we generated earlier:

with io.open("dataset.json") as f:
    dataset = json.load(f)

engine.fit(dataset)

Note that, by default, training of the NLU engine is non-deterministic: training and testing multiple times on the same data may produce different outputs.

Reproducible trainings can be achieved by passing a random seed to the engine:

seed = 42
engine = SnipsNLUEngine(config=CONFIG_EN, random_state=seed)
engine.fit(dataset)

Note

Due to a scikit-learn bug that was fixed in version 0.21, deterministic behavior cannot be guaranteed with Python versions below 3.5, because scikit-learn>=0.21 is only available for Python 3.5 and later.

Parsing

We are now ready to parse:

parsing = engine.parse("Hey, lights on in the lounge !")
print(json.dumps(parsing, indent=2))

You should get the following output (with a slightly different probability value):

{
  "input": "Hey, lights on in the lounge !",
  "intent": {
    "intentName": "turnLightOn",
    "probability": 0.4879843917522865
  },
  "slots": [
    {
      "range": {
        "start": 22,
        "end": 28
      },
      "rawValue": "lounge",
      "value": {
        "kind": "Custom",
        "value": "living room"
      },
      "entity": "room",
      "slotName": "room"
    }
  ]
}

Notice that the lounge slot value points to living room as defined earlier in the entity synonyms of the dataset.
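In application code, you will often want to turn this result into something simpler to work with. The sketch below relies only on the result structure shown above; slots_to_dict and ranges_match_raw_values are hypothetical helpers, not part of the Snips NLU API.

```python
def slots_to_dict(parsing):
    """Map each slot name to its resolved value.

    If the same slot name appears several times, the last one wins.
    """
    return {
        slot["slotName"]: slot["value"]["value"]
        for slot in parsing["slots"]
    }


def ranges_match_raw_values(parsing):
    """Check that each slot range indexes its rawValue in the input text."""
    text = parsing["input"]
    return all(
        text[slot["range"]["start"]:slot["range"]["end"]] == slot["rawValue"]
        for slot in parsing["slots"]
    )


# Copy of the result shown above, without the intent part.
parsing = {
    "input": "Hey, lights on in the lounge !",
    "slots": [
        {
            "range": {"start": 22, "end": 28},
            "rawValue": "lounge",
            "value": {"kind": "Custom", "value": "living room"},
            "entity": "room",
            "slotName": "room",
        }
    ],
}

print(slots_to_dict(parsing))            # {'room': 'living room'}
print(ranges_match_raw_values(parsing))  # True
```

The second helper shows that range uses a start index that is inclusive and an end index that is exclusive, like a Python slice.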

Now, let’s say the intent is already known and provided by the context of the application, but the slots must still be extracted. A second parsing API lets you extract the slots while providing the intent:

parsing = engine.get_slots("Hey, lights on in the lounge !", "turnLightOn")
print(json.dumps(parsing, indent=2))

This will give you only the extracted slots:

[
  {
    "range": {
      "start": 22,
      "end": 28
    },
    "rawValue": "lounge",
    "value": {
      "kind": "Custom",
      "value": "living room"
    },
    "entity": "room",
    "slotName": "room"
  }
]

Finally, there is another method that lets you run only the intent classification and get the list of intents along with their scores:

intents = engine.get_intents("Hey, lights on in the lounge !")
print(json.dumps(intents, indent=2))

This should give you something like below:

[
  {
    "intentName": "turnLightOn",
    "probability": 0.6363648460343694
  },
  {
    "intentName": null,
    "probability": 0.2580088944934134
  },
  {
    "intentName": "turnLightOff",
    "probability": 0.22791834836267366
  },
  {
    "intentName": "setTemperature",
    "probability": 0.181781583254962
  }
]

You will notice that the second intent is null. This intent is what we call the None intent and is explained in the next section.

Important

Even though the term "probability" is used here, the values should rather be considered as confidence scores as they do not sum to 1.0.
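A quick check on the scores listed above illustrates this, along with one possible way of using them. The best_intent helper and its min_score threshold are hypothetical: the value 0.5 is an arbitrary illustration, not a recommendation from Snips NLU.

```python
# Scores copied from the get_intents output above (JSON null becomes None).
intents = [
    {"intentName": "turnLightOn", "probability": 0.6363648460343694},
    {"intentName": None, "probability": 0.2580088944934134},
    {"intentName": "turnLightOff", "probability": 0.22791834836267366},
    {"intentName": "setTemperature", "probability": 0.181781583254962},
]

total = sum(item["probability"] for item in intents)
print(total > 1.0)  # True: the scores do not form a probability distribution


def best_intent(ranked, min_score=0.5):
    """Return the top intent name if its score reaches min_score, else None."""
    top = max(ranked, key=lambda item: item["probability"])
    if top["probability"] < min_score:
        return None
    return top["intentName"]


print(best_intent(intents))  # turnLightOn
```

Since the scores are not normalized, a threshold like min_score is best tuned on your own data rather than interpreted as a probability.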

The None intent

On top of the intents that you have declared in your dataset, the NLU engine generates an implicit intent to cover utterances that do not correspond to any of your intents. We refer to it as the None intent.

The NLU engine is trained to recognize when the input corresponds to the None intent. Here is the kind of output you should get if you try parsing "foo bar" with the engine we previously created:

In Python:

{
  "input": "foo bar",
  "intent": {
    "intentName": None,
    "probability": 0.552122
  },
  "slots": []
}

In JSON:

{
  "input": "foo bar",
  "intent": {
    "intentName": null,
    "probability": 0.552122
  },
  "slots": []
}

The None intent is represented by a None value in Python, which translates to a null value in JSON.
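In practice, this means that application code should test for None before dispatching on the intent name. The following sketch uses a hand-written result dictionary in place of a real engine call; is_none_intent and reply_for are hypothetical helpers, and the reply strings are placeholders.

```python
import json


def is_none_intent(parsing):
    """Return True when the engine did not recognize any declared intent."""
    return parsing["intent"]["intentName"] is None


def reply_for(parsing):
    """Pick a placeholder reply depending on the recognized intent."""
    if is_none_intent(parsing):
        return "Sorry, I did not understand that."
    return "Handling intent: {}".format(parsing["intent"]["intentName"])


# Hand-written copy of the "foo bar" result shown above.
parsing = {
    "input": "foo bar",
    "intent": {"intentName": None, "probability": 0.552122},
    "slots": [],
}

print(reply_for(parsing))             # Sorry, I did not understand that.
print(json.dumps(parsing["intent"]))  # {"intentName": null, "probability": 0.552122}
```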

Intent Filters

In some cases, you may have some extra information regarding the context in which the parsing occurs, and you may already know that some intents won’t be triggered. To leverage that, you can use intent filters to restrict the parsing output to a given list of intents:

parsing = engine.parse("Hey, lights on in the lounge !",
                        intents=["turnLightOn", "turnLightOff"])

This can improve the accuracy of the predictions, as the NLU engine will exclude the other intents from the classification task.
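One way to organize this in an application is to derive the filter from the current context. The sketch below is only an illustration: the context names and the parse_in_context helper are invented for this example, and the engine argument can be any object exposing parse(text, intents=...) as shown above.

```python
# Hypothetical mapping from an application context to the intents that
# make sense in that context. The context names are invented.
INTENTS_BY_CONTEXT = {
    "lighting_panel": ["turnLightOn", "turnLightOff"],
    "thermostat_panel": ["setTemperature"],
}


def parse_in_context(engine, text, context):
    """Parse text, restricting the intents when the context is known."""
    allowed = INTENTS_BY_CONTEXT.get(context)
    if allowed is None:
        # Unknown context: fall back to an unfiltered parse.
        return engine.parse(text)
    return engine.parse(text, intents=allowed)
```

With the trained engine from this tutorial, parse_in_context(engine, "Hey, lights on in the lounge !", "lighting_panel") is equivalent to the filtered call shown above.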

Persisting

As a final step, we will persist the engine into a directory. This is useful in various contexts, for instance if you want to train on one machine and parse on another.

You can persist the engine with the following API:

engine.persist("path/to/directory")

And load it:

loaded_engine = SnipsNLUEngine.from_path("path/to/directory")

loaded_engine.parse("Turn lights on in the bathroom please")

Alternatively, you can persist/load the engine as a bytearray:

engine_bytes = engine.to_byte_array()
loaded_engine = SnipsNLUEngine.from_byte_array(engine_bytes)
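The bytearray form is convenient when the engine has to be stored as a single file or blob. Here is a minimal sketch built on the two calls above; save_engine_bytes and load_engine_bytes are hypothetical helpers, and the file name in the usage note is arbitrary.

```python
def save_engine_bytes(engine, path):
    """Write the serialized engine to a single file."""
    with open(path, "wb") as f:
        f.write(engine.to_byte_array())


def load_engine_bytes(engine_cls, path):
    """Rebuild an engine from a file written by save_engine_bytes."""
    with open(path, "rb") as f:
        return engine_cls.from_byte_array(bytearray(f.read()))
```

With the engine from this tutorial, this would be used as save_engine_bytes(engine, "engine.bin") followed by load_engine_bytes(SnipsNLUEngine, "engine.bin").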