API reference

This part of the documentation covers the most important interfaces of the Snips NLU package.

Resources

load_resources(name)

Load language-specific resources

Parameters:name (str) – Resource name as in snips-nlu download <name>. Can also be the name of a python package or a directory path.

Note

Language resources must be loaded before fitting or parsing
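
For instance, assuming the English resources have been downloaded with snips-nlu download en, they can be loaded as follows (a minimal sketch):

from snips_nlu import load_resources

# Load the English resources before fitting or parsing anything
load_resources("en")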

NLU engine

class SnipsNLUEngine(config=None, **shared)

Main class to use for intent parsing

A SnipsNLUEngine relies on a list of IntentParser objects to parse intents, calling them successively and using the first positive output.

With the default parameters, it will use the two following intent parsers in this order:

  • DeterministicIntentParser
  • ProbabilisticIntentParser

The logic behind this is to first use a conservative parser which has very good precision but modest recall, so that simple patterns are caught, and then fall back on a second, machine-learning based parser which is able to parse unseen utterances while ensuring good precision and recall.

The NLU engine can be configured by passing an NLUEngineConfig

config_type

alias of snips_nlu.pipeline.configs.nlu_engine.NLUEngineConfig

intent_parsers = None

list of IntentParser

fitted

Whether or not the NLU engine has already been fitted

fit(**kwargs)

Fit the NLU engine

Parameters:
  • dataset (dict) – A valid Snips dataset
  • force_retrain (bool, optional) – If False, will not retrain intent parsers when they are already fitted. Defaults to True.
Returns:

The same object, trained.

parse(**kwargs)

Performs intent parsing on the provided text by calling its intent parsers successively

Parameters:
  • text (str) – Input
  • intents (str or list of str) – If provided, reduces the scope of intent parsing to the provided list of intents
Returns:

The most likely intent along with the extracted slots. See parsing_result() for the output format.

Return type:

dict

Raises:
  • NotTrained – When the NLU engine is not fitted
  • TypeError – When input type is not unicode
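
As an end-to-end sketch, assuming a hypothetical dataset.json file containing a valid Snips dataset, and an utterance matching one of its intents:

import io
import json

from snips_nlu import SnipsNLUEngine, load_resources

load_resources("en")

with io.open("dataset.json", encoding="utf8") as f:
    dataset = json.load(f)

engine = SnipsNLUEngine().fit(dataset)

# parse() returns a dict in the parsing_result() format
parsing = engine.parse(u"Turn the lights on in the kitchen")
print(json.dumps(parsing, indent=2))
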
persist(path, *args, **kwargs)

Persist the NLU engine at the given directory path

Parameters:path (str) – the location at which the NLU engine must be persisted. This path must not exist when calling this function.
classmethod from_path(path, **shared)

Load a SnipsNLUEngine instance from a directory path

The data at the given path must have been generated using persist()

Parameters:path (str) – The path where the nlu engine is stored.
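
Continuing the sketch above, a possible persistence round trip (the trained_engine directory is hypothetical and must not exist yet):

from snips_nlu import SnipsNLUEngine

engine.persist("trained_engine")  # engine trained as in the sketch above

# Later, possibly in another process, reload the engine
loaded_engine = SnipsNLUEngine.from_path("trained_engine")
loaded_engine.parse(u"Turn the lights on in the kitchen")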

Intent Parser

class IntentParser(config, **shared)

Abstraction which performs intent parsing

A custom intent parser must inherit this class to be used in a SnipsNLUEngine
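
For illustration, here is a minimal sketch of what such a subclass could look like; the class name and its internals are hypothetical, and the persistence methods (persist(), from_path()) required by the abstraction are omitted:

from snips_nlu.intent_parser import IntentParser
from snips_nlu.result import empty_result

class KeywordIntentParser(IntentParser):
    """Hypothetical intent parser matching intents on exact keywords"""

    def fit(self, dataset, force_retrain=True):
        # Learn a mapping from keywords to intent names out of the
        # training utterances (details elided), then return self
        self.keywords = dict()
        return self

    def parse(self, text, intents=None):
        # Must return a dict in the parsing_result() format; this toy
        # implementation always returns an empty result
        return empty_result(text)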

fit(dataset, force_retrain)

Fit the intent parser with a valid Snips dataset

Parameters:
  • dataset (dict) – Valid Snips NLU dataset
  • force_retrain (bool) – Specify whether or not sub units of the intent parser that may be already trained should be retrained
parse(text, intents)

Performs intent parsing on the provided text

Parameters:
  • text (str) – Input
  • intents (str or list of str) – If provided, reduces the scope of intent parsing to the provided list of intents
Returns:

The most likely intent along with the extracted slots. See parsing_result() for the output format.

Return type:

dict

class DeterministicIntentParser(config=None, **shared)

Intent parser using pattern matching in a deterministic manner

This intent parser is very strict by nature, and tends to have a very good precision but a low recall. For this reason, it is best used first, before potentially falling back to another parser.

The deterministic intent parser can be configured by passing a DeterministicIntentParserConfig

config_type

alias of snips_nlu.pipeline.configs.intent_parser.DeterministicIntentParserConfig

patterns

Dictionary of patterns per intent

fitted

Whether or not the intent parser has already been trained

fit(**kwargs)

Fit the intent parser with a valid Snips dataset

parse(**kwargs)

Performs intent parsing on the provided text

Intent and slots are extracted simultaneously through pattern matching

Parameters:
  • text (str) – Input
  • intents (str or list of str) – If provided, reduces the scope of intent parsing to the provided list of intents
Returns:

The matched intent, if any, along with the extracted slots. See parsing_result() for the output format.

Return type:

dict

Raises:

NotTrained – When the intent parser is not fitted

persist(path, *args, **kwargs)

Persist the object at the given path

classmethod from_path(path, **shared)

Load a DeterministicIntentParser instance from a path

The data at the given path must have been generated using persist()

to_dict()

Returns a json-serializable dict

classmethod from_dict(unit_dict, **shared)

Creates a DeterministicIntentParser instance from a dict

The dict must have been generated with to_dict()
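
As an illustration, a possible serialization round trip (assuming dataset is a valid Snips dataset dict and the resources are loaded):

from snips_nlu.intent_parser import DeterministicIntentParser

parser = DeterministicIntentParser().fit(dataset)

# Serialize to a json-compatible dict, then rebuild an equivalent parser
parser_dict = parser.to_dict()
reloaded_parser = DeterministicIntentParser.from_dict(parser_dict)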

class ProbabilisticIntentParser(config=None, **shared)

Intent parser which consists of two steps: intent classification, then slot filling

The probabilistic intent parser can be configured by passing a ProbabilisticIntentParserConfig

config_type

alias of snips_nlu.pipeline.configs.intent_parser.ProbabilisticIntentParserConfig

fitted

Whether or not the intent parser has already been fitted

fit(**kwargs)

Fit the probabilistic intent parser

Parameters:
  • dataset (dict) – A valid Snips dataset
  • force_retrain (bool, optional) – If False, will not retrain intent classifier and slot fillers when they are already fitted. Defaults to True.
Returns:

The same instance, trained

Return type:

ProbabilisticIntentParser

parse(**kwargs)

Performs intent parsing on the provided text by first classifying the intent and then using the corresponding slot filler to extract slots

Parameters:
  • text (str) – Input
  • intents (str or list of str) – If provided, reduces the scope of intent parsing to the provided list of intents
Returns:

The most likely intent along with the extracted slots. See parsing_result() for the output format.

Return type:

dict

Raises:

NotTrained – When the intent parser is not fitted

persist(path, *args, **kwargs)

Persist the object at the given path

classmethod from_path(path, **shared)

Load a ProbabilisticIntentParser instance from a path

The data at the given path must have been generated using persist()

Intent Classifier

class IntentClassifier(config, **shared)

Abstraction which performs intent classification

A custom intent classifier must inherit this class to be used in a ProbabilisticIntentParser

fit(dataset)

Fit the intent classifier with a valid Snips dataset

get_intent(text, intents_filter)

Performs intent classification on the provided text

Parameters:
  • text (str) – Input
  • intents_filter (str or list of str) – When defined, it will find the most likely intent among this list; otherwise it will use the whole list of intents defined in the dataset
Returns:

The most likely intent along with its probability or None if no intent was found. See intent_classification_result() for the output format.

Return type:

dict or None

class LogRegIntentClassifier(config=None, **shared)

Intent classifier which uses a Logistic Regression underneath

The LogReg intent classifier can be configured by passing a LogRegIntentClassifierConfig

config_type

alias of snips_nlu.pipeline.configs.intent_classifier.LogRegIntentClassifierConfig

fitted

Whether or not the intent classifier has already been fitted

fit(**kwargs)

Fit the intent classifier with a valid Snips dataset

Returns:The same instance, trained
Return type:LogRegIntentClassifier
get_intent(*args, **kwargs)

Performs intent classification on the provided text

Parameters:
  • text (str) – Input
  • intents_filter (str or list of str) – When defined, it will find the most likely intent among this list; otherwise it will use the whole list of intents defined in the dataset
Returns:

The most likely intent along with its probability or None if no intent was found

Return type:

dict or None

Raises:

NotTrained – When the intent classifier is not fitted

persist(path, *args, **kwargs)

Persist the object at the given path

classmethod from_path(path, **shared)

Load a LogRegIntentClassifier instance from a path

The data at the given path must have been generated using persist()

classmethod from_dict(unit_dict, **shared)

Creates a LogRegIntentClassifier instance from a dict

The dict must have been generated with to_dict()

to_dict()

Returns a json-serializable dict

Slot Filler

class SlotFiller(config, **shared)

Abstraction which performs slot filling

A custom slot filler must inherit this class to be used in a ProbabilisticIntentParser

fit(dataset, intent)

Fit the slot filler with a valid Snips dataset

get_slots(text)

Performs slot extraction (slot filling) on the provided text

Returns:The list of extracted slots. See unresolved_slot() for the output format of a slot
Return type:list of dict

class CRFSlotFiller(config=None, **shared)

Slot filler which uses Linear-Chain Conditional Random Fields underneath

Check https://en.wikipedia.org/wiki/Conditional_random_field to learn more about CRFs

The CRF slot filler can be configured by passing a CRFSlotFillerConfig

config_type

alias of snips_nlu.pipeline.configs.slot_filler.CRFSlotFillerConfig

features

List of Feature used by the CRF

labels

List of CRF labels

These labels differ from the slot names as they contain an additional prefix which depends on the TaggingScheme that is used (BIO by default).

fitted

Whether or not the slot filler has already been fitted

fit(**kwargs)

Fit the slot filler

Parameters:
  • dataset (dict) – A valid Snips dataset
  • intent (str) – The specific intent of the dataset to train the slot filler on
Returns:

The same instance, trained

Return type:

CRFSlotFiller

get_slots(*args, **kwargs)

Extracts slots from the provided text

Returns:The list of extracted slots
Return type:list of dict
Raises:NotTrained – When the slot filler is not fitted
compute_features(tokens, drop_out=False)

Compute features on the provided tokens

The drop_out parameter allows activating drop out on features that have a positive drop out ratio. This should only be used during training.

get_sequence_probability(*args, **kwargs)

Gives the joint probability of a sequence of tokens and CRF labels

Parameters:
  • tokens (list of Token) – list of tokens
  • labels (list of str) – CRF labels with their tagging scheme prefix (“B-color”, “I-color”, “O”, etc)

Note

The absolute value returned here is generally not very useful, however it can be used to compare a sequence of labels relatively to another one.

log_weights(*args, **kwargs)

Returns logs for both the label-to-label and label-to-feature weights

persist(path, *args, **kwargs)

Persist the object at the given path

classmethod from_path(path, **shared)

Load a CRFSlotFiller instance from a path

The data at the given path must have been generated using persist()

Feature

class Feature(base_name, func, offset=0, drop_out=0)

CRF Feature which is used by CRFSlotFiller

base_name

str – Feature name (e.g. ‘is_digit’, ‘is_first’ etc)

func

function – The actual feature function for example:

def is_first(tokens, token_index):
    return "1" if token_index == 0 else None

offset

int, optional – Token offset to consider when computing the feature (e.g. -1 for computing the feature on the previous word)

drop_out

float, optional – Drop out to use when computing the feature during training

Note

The easiest way to add additional features to the existing ones is to create a CRFFeatureFactory
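
As an illustrative sketch, a hypothetical feature flagging capitalized tokens could be built as follows (assuming Feature is importable from snips_nlu.slot_filler.feature):

from snips_nlu.slot_filler.feature import Feature

def is_capitalized(tokens, token_index):
    # Feature functions return a string value, or None when the
    # feature does not apply to the considered token
    return "1" if tokens[token_index].value.istitle() else None

# Compute the feature on the previous token (offset=-1), with a 10%
# drop out during training; the values here are purely illustrative
feature = Feature("is_capitalized", is_capitalized, offset=-1, drop_out=0.1)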

Feature Factories

class CRFFeatureFactory(factory_config)

Abstraction to implement to build CRF features

A CRFFeatureFactory is initialized with a dict which describes the feature; it must contain the following three keys:

  • ‘factory_name’
  • ‘args’: the parameters of the feature, if any
  • ‘offsets’: the offsets to consider when using the feature in the CRF. An empty list corresponds to no feature.

In addition, a ‘drop_out’ to use during training can be specified.
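
For instance, a config for the NgramFactory documented below could look like this (the values are illustrative):

ngram_factory_config = {
    "factory_name": "ngram",   # registered name of the factory
    "args": {
        "n": 2,
        "use_stemming": False,
        "common_words_gazetteer_name": None
    },
    "offsets": [-2, -1, 0],    # compute the feature on these tokens
    "drop_out": 0.1            # optional, used at train time only
}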

fit(dataset, intent)

Fit the factory, if needed, with the provided dataset and intent

build_features(builtin_entity_parser, custom_entity_parser)

Build a list of Feature

class SingleFeatureFactory(factory_config)

A CRF feature factory which produces only one feature

class IsDigitFactory(factory_config)

Feature: is the considered token a digit?

class IsFirstFactory(factory_config)

Feature: is the considered token the first in the input?

class IsLastFactory(factory_config)

Feature: is the considered token the last in the input?

class PrefixFactory(factory_config)

Feature: a prefix of the considered token

This feature has one parameter, prefix_size, which specifies the size of the prefix

class SuffixFactory(factory_config)

Feature: a suffix of the considered token

This feature has one parameter, suffix_size, which specifies the size of the suffix

class LengthFactory(factory_config)

Feature: the length (characters) of the considered token

class NgramFactory(factory_config)

Feature: the n-gram consisting of the considered token and potentially the following ones

This feature has several parameters:

  • ‘n’ (int): Corresponds to the size of the n-gram. n=1 corresponds to a unigram, n=2 to a bigram, etc.
  • ‘use_stemming’ (bool): Whether or not to stem the n-gram
  • ‘common_words_gazetteer_name’ (str, optional): If defined, use a gazetteer of common words and replace out-of-corpus ngrams with the alias ‘rare_word’
class ShapeNgramFactory(factory_config)

Feature: the shape of the n-gram consisting of the considered token and potentially the following ones

This feature has one parameter, n, which corresponds to the size of the n-gram.

Possible types of shape are:

  • ‘xxx’ -> lowercased
  • ‘Xxx’ -> Capitalized
  • ‘XXX’ -> UPPERCASED
  • ‘xX’ -> None of the above
class WordClusterFactory(factory_config)

Feature: The cluster which the considered token belongs to, if any

This feature has several parameters:

  • ‘cluster_name’ (str): the name of the word cluster to use
  • ‘use_stemming’ (bool): whether or not to stem the token before looking for its cluster

Typical word clusters are Brown clusters, in which words are clustered into a binary tree, resulting in clusters of the form ‘100111001’. See https://en.wikipedia.org/wiki/Brown_clustering

class CustomEntityMatchFactory(factory_config)

Features: does the considered token belong to the values of one of the entities in the training dataset

This factory builds as many features as there are entities in the dataset, one per entity.

It has the following parameters:

  • ‘use_stemming’ (bool): whether or not to stem the token before looking for it among the (stemmed) entity values
  • ‘tagging_scheme_code’ (int): Represents a TaggingScheme. This allows giving more information about the match.
class BuiltinEntityMatchFactory(factory_config)

Features: is the considered token part of a builtin entity such as a date, a temperature etc

This factory builds as many features as there are builtin entities available in the considered language.

It has one parameter, tagging_scheme_code, which represents a TaggingScheme. This allows giving more information about the match.

get_feature_factory(factory_config)

Retrieve the CRFFeatureFactory corresponding to the provided config

Configurations

class NLUEngineConfig(intent_parsers_configs=None)

Configuration of a SnipsNLUEngine object

Parameters:intent_parsers_configs (list) – List of intent parser configs (ProcessingUnitConfig). The order in the list determines the order in which each parser will be called by the NLU engine.
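
As a sketch, the default parser ordering can be reproduced explicitly like this:

from snips_nlu import SnipsNLUEngine
from snips_nlu.pipeline.configs import (
    DeterministicIntentParserConfig, NLUEngineConfig,
    ProbabilisticIntentParserConfig)

# The deterministic parser is tried first, then the probabilistic one
config = NLUEngineConfig([
    DeterministicIntentParserConfig(),
    ProbabilisticIntentParserConfig()
])
engine = SnipsNLUEngine(config)
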
class DeterministicIntentParserConfig(max_queries=100, max_pattern_length=1000)

Configuration of a DeterministicIntentParser

Parameters:
  • max_queries (int, optional) – Maximum number of regex patterns per intent. 100 by default.
  • max_pattern_length (int, optional) – Maximum length of regex patterns.

This allows deactivating the usage of regular expressions when they are too long, in order to avoid explosions in time and memory

Note

In the future, an FST will be used instead of regexps, removing the need for all this

class ProbabilisticIntentParserConfig(intent_classifier_config=None, slot_filler_config=None)

Configuration of a ProbabilisticIntentParser object

Parameters:
  • intent_classifier_config (ProcessingUnitConfig) – The configuration of the underlying intent classifier, by default it uses a LogRegIntentClassifierConfig
  • slot_filler_config (ProcessingUnitConfig) – The configuration that will be used for the underlying slot fillers, by default it uses a CRFSlotFillerConfig
class LogRegIntentClassifierConfig(data_augmentation_config=None, featurizer_config=None, random_seed=None)

Configuration of a LogRegIntentClassifier

Parameters:
  • data_augmentation_config (IntentClassifierDataAugmentationConfig) – Defines the strategy of the underlying data augmentation
  • featurizer_config (FeaturizerConfig) – Configuration of the Featurizer used underneath
  • random_seed (int, optional) – Allows fixing the seed to have reproducible trainings
class CRFSlotFillerConfig(feature_factory_configs=None, tagging_scheme=None, crf_args=None, data_augmentation_config=None, random_seed=None)

Configuration of a CRFSlotFiller

Parameters:
  • feature_factory_configs (list, optional) – List of configurations that specify the list of CRFFeatureFactory to use with the CRF
  • tagging_scheme (TaggingScheme, optional) – Tagging scheme to use to enrich CRF labels (default=BIO)
  • crf_args (dict, optional) – Allows overwriting the parameters of the CRF defined in sklearn_crfsuite, see sklearn_crfsuite.CRF (default={“c1”: .1, “c2”: .1, “algorithm”: “lbfgs”})
  • data_augmentation_config (dict or SlotFillerDataAugmentationConfig, optional) – Specify how to augment data before training the CRF, see the corresponding config object for more details.
  • random_seed (int, optional) – Specify to make the CRF training deterministic and reproducible (default=None)
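
For example, a CRF slot filler config can be customized and nested into the parser and engine configs like this (the parameter values are illustrative):

from snips_nlu import SnipsNLUEngine
from snips_nlu.pipeline.configs import (
    CRFSlotFillerConfig, NLUEngineConfig, ProbabilisticIntentParserConfig)

# Fix the seed for reproducible CRF trainings and tweak regularization
slot_filler_config = CRFSlotFillerConfig(
    crf_args={"c1": 0.2, "c2": 0.2, "algorithm": "lbfgs"},
    random_seed=42)
parser_config = ProbabilisticIntentParserConfig(
    slot_filler_config=slot_filler_config)
engine = SnipsNLUEngine(NLUEngineConfig([parser_config]))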

Dataset

class Dataset(language, intents, entities)

Dataset used in the main NLU training API

Consists of intents and entities data. This object can be built either from text files (Dataset.from_files()) or from YAML files (Dataset.from_yaml_files()).

language

str – language of the intents

intents

list of Intent – intents data

entities

list of Entity – entities data

classmethod from_yaml_files(language, filenames)

Creates a Dataset from a language and a list of YAML files containing intents and entities data

Each file need not correspond to a single entity or intent; it can contain several entities and intents merged together.

A dataset can be defined with a YAML document following the schema illustrated in the example below:

# searchFlight Intent
---
type: intent
name: searchFlight
slots:
  - name: origin
    entity: city
  - name: destination
    entity: city
  - name: date
    entity: snips/datetime
utterances:
  - find me a flight from [origin](Paris) to [destination](New York)
  - I need a flight leaving [date](this weekend) to [destination](Berlin)
  - show me flights to go to [destination](new york) leaving [date](this evening)

# City Entity
---
type: entity
name: city
values:
  - london
  - [new york, big apple]
  - [paris, city of lights]
Raises:
  • DatasetFormatError – When one of the documents present in the YAML files has a wrong ‘type’ attribute, which is neither ‘entity’ nor ‘intent’
  • IntentFormatError – When the YAML document of an intent does not correspond to the expected intent format
  • EntityFormatError – When the YAML document of an entity does not correspond to the expected entity format
classmethod from_files(**kwargs)

Creates a Dataset from a language and a list of intent and entity files

Parameters:
  • language (str) – language of the assistant
  • filenames (list of str) – Intent and entity files. The assistant will associate each intent file to an intent, and each entity file to an entity. For instance, the intent file ‘intent_setTemperature.txt’ will correspond to the intent ‘setTemperature’, and the entity file ‘entity_room.txt’ will correspond to the entity ‘room’.

Deprecated since version 0.18.0: This will be removed in 0.19.0. Use from_yaml_files instead

json

Dataset data in json format
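
As a sketch, assuming a hypothetical flights.yaml file following the YAML schema above:

from snips_nlu import SnipsNLUEngine
from snips_nlu.dataset import Dataset

dataset = Dataset.from_yaml_files("en", ["flights.yaml"])

# The json property yields the dict format expected by fit()
engine = SnipsNLUEngine().fit(dataset.json)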

class Intent(intent_name, utterances, slot_mapping=None)

Intent data of a Dataset

intent_name

str – name of the intent

utterances

list of IntentUtterance – annotated intent utterances

slot_mapping

dict – mapping between slot names and entities

classmethod from_yaml(yaml_dict)

Build an Intent from its YAML definition dict

An intent can be defined with a YAML document following the schema illustrated in the example below:

# searchFlight Intent
---
type: intent
name: searchFlight
slots:
  - name: origin
    entity: city
  - name: destination
    entity: city
  - name: date
    entity: snips/datetime
utterances:
  - find me a flight from [origin](Paris) to [destination](New York)
  - I need a flight leaving [date](this weekend) to [destination](Berlin)
  - show me flights to go to [destination](new york) leaving [date](this evening)
Raises:IntentFormatError – When the YAML dict does not correspond to the expected intent format
classmethod from_file(**kwargs)

Build an Intent from a text file

Deprecated since version 0.18.0: This will be removed in 0.19.0. Use from_yaml instead

json

Intent data in json format

class Entity(name, utterances=None, automatically_extensible=True, use_synonyms=True, matching_strictness=1.0)

Entity data of a Dataset

This class can represent both custom and builtin entities. When the entity is a builtin one, only the name attribute is relevant.

name

str – name of the entity

utterances

list of EntityUtterance – entity utterances (only for custom entities)

automatically_extensible

bool – whether or not the entity can be extended to values not present in the data (only for custom entities)

use_synonyms

bool – whether or not to map entity values using synonyms (only for custom entities)

matching_strictness

float – controls the matching strictness of the entity (only for custom entities). Must be between 0.0 and 1.0.

classmethod from_yaml(yaml_dict)

Build an Entity from its YAML definition dict

An entity can be defined with a YAML document following the schema illustrated in the example below:

# City Entity
---
type: entity
name: city
automatically_extensible: false # default value is true
use_synonyms: false # default value is true
matching_strictness: 0.8 # default value is 1.0
values:
  - london
  - [new york, big apple]
  - [paris, city of lights]
Raises:EntityFormatError – When the YAML dict does not correspond to the expected entity format
classmethod from_file(**kwargs)

Build an Entity from a text file

Deprecated since version 0.18.0: This will be removed in 0.19.0. Use from_yaml instead

json

Returns the entity in json format

Result and output format

intent_classification_result(intent_name, probability)

Creates an intent classification result to be returned by IntentClassifier.get_intent()

Example

>>> intent_classification_result("GetWeather", 0.93)
{'intentName': 'GetWeather', 'probability': 0.93}
unresolved_slot(match_range, value, entity, slot_name)

Creates an internal slot yet to be resolved

Example

>>> import json
>>> slot = unresolved_slot([0, 8], "tomorrow", "snips/datetime",
...     "startDate")
>>> print(json.dumps(slot, indent=4, sort_keys=True))
{
    "entity": "snips/datetime",
    "range": {
        "end": 8,
        "start": 0
    },
    "slotName": "startDate",
    "value": "tomorrow"
}
custom_slot(internal_slot, resolved_value=None)

Creates a custom slot with resolved_value being the reference value of the slot

Example

>>> s = unresolved_slot([10, 19], "earl grey", "beverage", "beverage")
>>> import json
>>> print(json.dumps(custom_slot(s, "tea"), indent=4, sort_keys=True))
{
    "entity": "beverage",
    "range": {
        "end": 19,
        "start": 10
    },
    "rawValue": "earl grey",
    "slotName": "beverage",
    "value": {
        "kind": "Custom",
        "value": "tea"
    }
}
builtin_slot(internal_slot, resolved_value)

Creates a builtin slot with resolved_value being the resolved value of the slot

Example

>>> rng = [10, 32]
>>> raw_value = "twenty degrees celsius"
>>> entity = "snips/temperature"
>>> slot_name = "beverageTemperature"
>>> s = unresolved_slot(rng, raw_value, entity, slot_name)
>>> resolved = {
...     "kind": "Temperature",
...     "value": 20,
...     "unit": "celsius"
... }
>>> import json
>>> print(json.dumps(builtin_slot(s, resolved), indent=4))
{
    "range": {
        "start": 10,
        "end": 32
    },
    "rawValue": "twenty degrees celsius",
    "value": {
        "kind": "Temperature",
        "value": 20,
        "unit": "celsius"
    },
    "entity": "snips/temperature",
    "slotName": "beverageTemperature"
}
resolved_slot(match_range, raw_value, resolved_value, entity, slot_name)

Creates a resolved slot

Parameters:
  • match_range (dict) – Range of the slot within the sentence (ex: {“start”: 3, “end”: 10})
  • raw_value (str) – Slot value as it appears in the sentence
  • resolved_value (dict) – Resolved value of the slot
  • entity (str) – Entity which the slot belongs to
  • slot_name (str) – Slot type
Returns:

The resolved slot

Return type:

dict

Example

>>> resolved_value = {
...     "kind": "Temperature",
...     "value": 20,
...     "unit": "celsius"
... }
>>> slot = resolved_slot({"start": 10, "end": 19}, "earl grey",
... resolved_value, "beverage", "beverage")
>>> import json
>>> print(json.dumps(slot, indent=4, sort_keys=True))
{
    "entity": "beverage",
    "range": {
        "end": 19,
        "start": 10
    },
    "rawValue": "earl grey",
    "slotName": "beverage",
    "value": {
        "kind": "Temperature",
        "unit": "celsius",
        "value": 20
    }
}
parsing_result(input, intent, slots)

Creates the final output of SnipsNLUEngine.parse() or IntentParser.parse()

Example

>>> text = "Hello Bill!"
>>> intent_result = intent_classification_result("Greeting", 0.95)
>>> internal_slot = unresolved_slot([6, 10], "Bill", "name",
... "greetee")
>>> slots = [custom_slot(internal_slot, "William")]
>>> res = parsing_result(text, intent_result, slots)
>>> import json
>>> print(json.dumps(res, indent=4, sort_keys=True))
{
    "input": "Hello Bill!",
    "intent": {
        "intentName": "Greeting",
        "probability": 0.95
    },
    "slots": [
        {
            "entity": "name",
            "range": {
                "end": 10,
                "start": 6
            },
            "rawValue": "Bill",
            "slotName": "greetee",
            "value": {
                "kind": "Custom",
                "value": "William"
            }
        }
    ]
}
is_empty(result)

Check if a result is empty

Example

>>> res = empty_result("foo bar")
>>> is_empty(res)
True
empty_result(input)

Creates an empty parsing result of the same format as the one of parsing_result()

An empty result is typically returned by a SnipsNLUEngine or IntentParser when neither an intent nor slots were found.

Example

>>> empty_result("foo bar")
{
    "input": "foo bar",
    "intent": None,
    "slots": None
}