API reference
This part of the documentation covers the most important interfaces of the Snips NLU package.
Resources

NLU engine
class SnipsNLUEngine(config=None)

    Main class to use for intent parsing.

    A SnipsNLUEngine relies on a list of IntentParser objects to parse
    intents, calling them successively and using the first positive output.

    With the default parameters, it will use the following two intent
    parsers, in this order:

    - DeterministicIntentParser
    - ProbabilisticIntentParser
    The rationale is to first use a conservative parser with very good
    precision but modest recall, so that simple patterns are caught, and then
    to fall back on a second, machine-learning-based parser which can handle
    unseen utterances while keeping good precision and recall.
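The cascade described above can be sketched as follows. This is an illustrative sketch, not the actual snips_nlu implementation, and both parser functions are hypothetical stand-ins:

```python
# Illustrative sketch of the parser cascade: each parser is called in order
# and the first positive output wins.

def parse_with_cascade(parsers, text):
    """Return the first non-None result produced by the parsers."""
    for parser in parsers:
        result = parser(text)
        if result is not None:  # first positive output short-circuits
            return result
    return None

def strict_pattern_parser(text):
    # conservative: only matches an exact, known pattern (high precision)
    return {"intentName": "turnLightOn"} if text == "turn the light on" else None

def ml_fallback_parser(text):
    # permissive stand-in for the machine-learning parser (better recall)
    return {"intentName": "turnLightOn"} if "light" in text else None

parsers = [strict_pattern_parser, ml_fallback_parser]
parse_with_cascade(parsers, "turn the light on")     # caught by the strict parser
parse_with_cascade(parsers, "switch on that light")  # falls back to the ML parser
```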
    The NLU engine can be configured by passing a NLUEngineConfig.

    config_type
        alias of snips_nlu.pipeline.configs.nlu_engine.NLUEngineConfig

    intent_parsers = None
        list of IntentParser

    fitted
        Whether or not the NLU engine has already been fitted.

    to_dict()
        Returns a json-serializable dict.

    classmethod from_dict(unit_dict)
        Creates a SnipsNLUEngine instance from a dict.

        The dict must have been generated with to_dict().

        Raises: ValueError – when there is a mismatch with the model version.
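The to_dict()/from_dict() pair follows a common persistence pattern. Here is a minimal sketch of that pattern with a toy class (not the snips_nlu engine itself; the MODEL_VERSION constant is hypothetical):

```python
# Toy illustration of the to_dict()/from_dict() round-trip, including the
# documented ValueError on model-version mismatch.
import json

MODEL_VERSION = "0.1.0"  # hypothetical version constant

class ToyUnit:
    def __init__(self, config=None):
        self.config = config or {}

    def to_dict(self):
        # json-serializable dict, with a version for compatibility checks
        return {"model_version": MODEL_VERSION, "config": self.config}

    @classmethod
    def from_dict(cls, unit_dict):
        if unit_dict.get("model_version") != MODEL_VERSION:
            raise ValueError("Mismatch with the model version")
        return cls(config=unit_dict["config"])

unit = ToyUnit({"foo": "bar"})
payload = json.dumps(unit.to_dict())                # persist
restored = ToyUnit.from_dict(json.loads(payload))   # reload
```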
Intent Parser

class IntentParser(config)

    Abstraction which performs intent parsing.

    A custom intent parser must inherit this class to be used in a
    SnipsNLUEngine.
    fit(dataset, force_retrain)
        Fits the intent parser with a valid Snips dataset.

    fitted
        Whether or not the intent parser has already been trained.

    parse(text, intents)
        Performs intent parsing on the provided text.

        Returns: The most likely intent along with the extracted slots. See
        parsing_result() for the output format.

        Return type: dict
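A custom parser only has to implement this fit/parse interface. Below is a self-contained sketch: a simplified abstract class stands in for the real snips_nlu IntentParser, the keyword-matching logic is purely illustrative, and the toy dataset mimics the Snips dataset layout (intents → utterances → data → text):

```python
# Sketch of a custom intent parser implementing the fit/fitted/parse
# interface described above (stand-in ABC, not the snips_nlu class).
from abc import ABC, abstractmethod

class IntentParser(ABC):  # simplified stand-in for the snips_nlu abstraction
    @abstractmethod
    def fit(self, dataset, force_retrain): ...
    @abstractmethod
    def parse(self, text, intents=None): ...

class KeywordIntentParser(IntentParser):
    """Toy parser that maps keywords to intents."""
    def __init__(self, config=None):
        self.config = config or {}
        self.keywords = {}

    @property
    def fitted(self):
        return bool(self.keywords)

    def fit(self, dataset, force_retrain=True):
        # index one keyword per intent: the first word of the first utterance
        for intent_name, intent in dataset["intents"].items():
            first_text = intent["utterances"][0]["data"][0]["text"]
            self.keywords[first_text.split()[0].lower()] = intent_name
        return self

    def parse(self, text, intents=None):
        for word in text.lower().split():
            intent = self.keywords.get(word)
            if intent and (intents is None or intent in intents):
                return {"input": text,
                        "intent": {"intentName": intent},
                        "slots": []}
        return {"input": text, "intent": None, "slots": None}

dataset = {"intents": {"turnLightOn": {
    "utterances": [{"data": [{"text": "turn the light on"}]}]}}}
parser = KeywordIntentParser().fit(dataset)
parser.parse("please turn it on")  # matches the "turn" keyword
```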
class DeterministicIntentParser(config=None)

    Intent parser using pattern matching in a deterministic manner.

    This intent parser is very strict by nature and tends to have very good
    precision but low recall. For this reason, it is worth running it first,
    before potentially falling back to another parser.

    The deterministic intent parser can be configured by passing a
    DeterministicIntentParserConfig.

    config_type
        alias of snips_nlu.pipeline.configs.intent_parser.DeterministicIntentParserConfig

    patterns
        Dictionary of patterns per intent.

    fitted
        Whether or not the intent parser has already been trained.

    to_dict()
        Returns a json-serializable dict.

    classmethod from_dict(unit_dict)
        Creates a DeterministicIntentParser instance from a dict.

        The dict must have been generated with to_dict().
class ProbabilisticIntentParser(config=None)

    Intent parser which consists of two steps: intent classification, then
    slot filling.

    The probabilistic intent parser can be configured by passing a
    ProbabilisticIntentParserConfig.

    config_type
        alias of snips_nlu.pipeline.configs.intent_parser.ProbabilisticIntentParserConfig

    fitted
        Whether or not the intent parser has already been fitted.

    to_dict()
        Returns a json-serializable dict.

    classmethod from_dict(unit_dict)
        Creates a ProbabilisticIntentParser instance from a dict.

        The dict must have been generated with to_dict().
Intent Classifier

class IntentClassifier(config)

    Abstraction which performs intent classification.

    A custom intent classifier must inherit this class to be used in a
    ProbabilisticIntentParser.

    fit(dataset)
        Fits the intent classifier with a valid Snips dataset.

    get_intent(text, intents_filter)
        Performs intent classification on the provided text.

        Returns: The most likely intent along with its probability, or None
        if no intent was found. See intent_classification_result() for the
        output format.

        Return type: dict or None
class LogRegIntentClassifier(config=None)

    Intent classifier which uses a logistic regression underneath.

    The LogReg intent classifier can be configured by passing a
    LogRegIntentClassifierConfig.

    config_type
        alias of snips_nlu.pipeline.configs.intent_classifier.LogRegIntentClassifierConfig

    fitted
        Whether or not the intent classifier has already been fitted.

    get_intent(text, intents_filter=None)
        Performs intent classification on the provided text.

        Returns: The most likely intent along with its probability, or None
        if no intent was found.

        Return type: dict or None

        Raises: NotTrained – when the intent classifier is not fitted.

    to_dict()
        Returns a json-serializable dict.

    classmethod from_dict(unit_dict)
        Creates a LogRegIntentClassifier instance from a dict.

        The dict must have been generated with to_dict().
Slot Filler

class SlotFiller(config)

    Abstraction which performs slot filling.

    A custom slot filler must inherit this class to be used in a
    ProbabilisticIntentParser.

    fit(dataset, intent)
        Fits the slot filler with a valid Snips dataset.

    get_slots(text)
        Performs slot extraction (slot filling) on the provided text.

        Returns: The list of extracted slots. See unresolved_slot() for the
        output format of a slot.

        Return type: list of dict
class CRFSlotFiller(config=None)

    Slot filler which uses linear-chain Conditional Random Fields underneath.

    See https://en.wikipedia.org/wiki/Conditional_random_field to learn more
    about CRFs.

    The CRF slot filler can be configured by passing a CRFSlotFillerConfig.

    config_type
        alias of snips_nlu.pipeline.configs.slot_filler.CRFSlotFillerConfig
    labels
        List of CRF labels.

        These labels differ from the slot names as they contain an additional
        prefix which depends on the TaggingScheme that is used (BIO by
        default).
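Under the BIO scheme, the first token of a slot gets a "B-" prefix, subsequent tokens of the same slot get "I-", and tokens outside any slot are labeled "O". A minimal sketch of that prefixing (illustrative only, not the snips_nlu implementation):

```python
# Map token-level slot annotations to BIO-prefixed CRF labels.

def bio_labels(tokens, slots):
    """slots: {slot_name: (start_token, end_token)} token-index ranges."""
    labels = ["O"] * len(tokens)
    for slot_name, (start, end) in slots.items():
        labels[start] = "B-" + slot_name          # first token of the slot
        for i in range(start + 1, end):
            labels[i] = "I-" + slot_name          # tokens inside the slot
    return labels

tokens = ["make", "it", "pale", "blue"]
bio_labels(tokens, {"color": (2, 4)})
# ['O', 'O', 'B-color', 'I-color']
```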
    fitted
        Whether or not the slot filler has already been fitted.

    get_slots(text)
        Extracts slots from the provided text.

        Returns: The list of extracted slots.

        Return type: list of dict

        Raises: NotTrained – when the slot filler is not fitted.
    compute_features(tokens, drop_out=False)
        Computes features on the provided tokens.

        The drop_out parameter activates dropout on features that have a
        positive dropout ratio. This should only be used during training.

    get_sequence_probability(tokens, labels)
        Gives the joint probability of a sequence of tokens and CRF labels.

        Parameters:
        - tokens (list of Token) – list of tokens
        - labels (list of str) – CRF labels with their tagging scheme prefix
          ("B-color", "I-color", "O", etc.)

        Note: The absolute value returned here is generally not very useful;
        however, it can be used to compare one sequence of labels to another.

    log_weights()
        Returns logs for both the label-to-label and label-to-feature
        weights.
    to_dict()
        Returns a json-serializable dict.

    classmethod from_dict(unit_dict)
        Creates a CRFSlotFiller instance from a dict.

        The dict must have been generated with to_dict().
Feature

class Feature(base_name, func, offset=0, drop_out=0)

    CRF feature which is used by CRFSlotFiller.

    base_name
        str – Feature name (e.g. 'is_digit', 'is_first', etc.)

    func
        function – The actual feature function, for example:

        def is_first(tokens, token_index):
            return "1" if token_index == 0 else None

    offset
        int, optional – Token offset to consider when computing the feature
        (e.g. -1 for computing the feature on the previous word)

    drop_out
        float, optional – Dropout to use when computing the feature during
        training

    Note: The easiest way to add additional features to the existing ones is
    to create a CRFFeatureFactory.
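How the offset attribute shifts which token the feature function sees can be sketched as follows (the compute helper is hypothetical, not a snips_nlu function):

```python
# With offset=-1 the feature is computed on the previous token; offsets that
# fall outside the token list simply produce no value.

def is_first(tokens, token_index):
    return "1" if token_index == 0 else None

def compute(func, tokens, token_index, offset=0):
    shifted = token_index + offset
    if 0 <= shifted < len(tokens):   # out-of-range offsets yield no feature
        return func(tokens, shifted)
    return None

tokens = ["turn", "the", "light", "on"]
compute(is_first, tokens, 1, offset=-1)  # previous token is the first -> "1"
compute(is_first, tokens, 1, offset=0)   # "the" is not the first -> None
```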
Feature Factories

class CRFFeatureFactory(factory_config)

    Abstraction to implement in order to build CRF features.

    A CRFFeatureFactory is initialized with a dict which describes the
    feature; it must contain the following three keys:

    - 'factory_name'
    - 'args': the parameters of the feature, if any
    - 'offsets': the offsets to consider when using the feature in the CRF.
      An empty list corresponds to no feature.

    In addition, a 'drop_out' to use at train time can be specified.
    fit(dataset, intent)
        Fits the factory, if needed, with the provided dataset and intent.
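A factory configuration combining the keys above might look like this. The factory name and argument shown are illustrative, not guaranteed snips_nlu identifiers:

```python
# Hypothetical factory configuration with the three required keys
# ('factory_name', 'args', 'offsets') plus the optional 'drop_out'.
prefix_factory_config = {
    "factory_name": "prefix",      # which factory to build (illustrative name)
    "args": {"prefix_size": 2},    # feature parameters, if any
    "offsets": [-1, 0],            # compute on the previous and current token
    "drop_out": 0.1,               # optional, used at train time only
}
```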
class SingleFeatureFactory(factory_config)

    A CRF feature factory which produces only one feature.

class IsDigitFactory(factory_config)

    Feature: is the considered token a digit?

class IsFirstFactory(factory_config)

    Feature: is the considered token the first in the input?

class IsLastFactory(factory_config)

    Feature: is the considered token the last in the input?
class PrefixFactory(factory_config)

    Feature: a prefix of the considered token.

    This feature has one parameter, prefix_size, which specifies the size of
    the prefix.

class SuffixFactory(factory_config)

    Feature: a suffix of the considered token.

    This feature has one parameter, suffix_size, which specifies the size of
    the suffix.
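The prefix and suffix features reduce to fixed-size slices of the token, sketched below (illustrative only, not the snips_nlu implementation):

```python
# Fixed-size prefix and suffix of a token.

def prefix(token, prefix_size):
    return token[:prefix_size]

def suffix(token, suffix_size):
    return token[-suffix_size:]

prefix("lights", 2)  # 'li'
suffix("lights", 3)  # 'hts'
```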
class LengthFactory(factory_config)

    Feature: the length (in characters) of the considered token.

class NgramFactory(factory_config)

    Feature: the n-gram consisting of the considered token and potentially
    the following ones.

    This feature has several parameters:

    - 'n' (int): the size of the n-gram; n=1 corresponds to a unigram, n=2 to
      a bigram, etc.
    - 'use_stemming' (bool): whether or not to stem the n-gram
    - 'common_words_gazetteer_name' (str, optional): if defined, use a
      gazetteer of common words and replace out-of-corpus ngrams with the
      alias 'rare_word'
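The n-gram itself is the considered token joined with the (n − 1) following ones, or nothing when too few tokens remain. A minimal sketch (not the snips_nlu implementation):

```python
# n-gram starting at the considered token, built from the following tokens.

def ngram(tokens, token_index, n):
    if token_index + n > len(tokens):
        return None  # not enough following tokens to build the n-gram
    return " ".join(tokens[token_index:token_index + n])

tokens = ["turn", "the", "light", "on"]
ngram(tokens, 1, 2)  # 'the light'
ngram(tokens, 3, 2)  # None (no token after 'on')
```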
class ShapeNgramFactory(factory_config)

    Feature: the shape of the n-gram consisting of the considered token and
    potentially the following ones.

    This feature has one parameter, n, which corresponds to the size of the
    n-gram.

    Possible shapes are:

    - xxx: lowercased
    - Xxx: capitalized
    - XXX: uppercased
    - xX: anything else
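The token-to-shape mapping above can be sketched like this (an illustrative approximation, not the snips_nlu implementation):

```python
# Map a token to one of the four shapes described above.

def shape(token):
    if token.islower():
        return "xxx"   # lowercased
    if token.istitle():
        return "Xxx"   # capitalized
    if token.isupper():
        return "XXX"   # uppercased
    return "xX"        # anything else

[shape(t) for t in ["light", "Paris", "NASA", "iPhone"]]
# ['xxx', 'Xxx', 'XXX', 'xX']
```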
class WordClusterFactory(factory_config)

    Feature: the cluster the considered token belongs to, if any.

    This feature has several parameters:

    - 'cluster_name' (str): the name of the word clusters to use
    - 'use_stemming' (bool): whether or not to stem the token before looking
      up its cluster

    Typical word clusters are Brown clusters, in which words are clustered
    into a binary tree, resulting in cluster identifiers of the form
    '100111001'. See https://en.wikipedia.org/wiki/Brown_clustering
class EntityMatchFactory(factory_config)

    Features: does the considered token belong to the values of one of the
    entities in the training dataset?

    This factory builds as many features as there are entities in the
    dataset, one per entity.

    It has the following parameters:

    - 'use_stemming' (bool): whether or not to stem the token before looking
      for it among the (stemmed) entity values
    - 'tagging_scheme_code' (int): represents a TaggingScheme. This gives
      more information about the match.
class BuiltinEntityMatchFactory(factory_config)

    Features: is the considered token part of a builtin entity such as a
    date, a temperature, etc.?

    This factory builds as many features as there are builtin entities
    available in the considered language.

    It has one parameter, tagging_scheme_code, which represents a
    TaggingScheme. This gives more information about the match.

get_feature_factory(factory_config)

    Retrieves the CRFFeatureFactory corresponding to the provided config.
Configurations

class NLUEngineConfig(intent_parsers_configs=None)

    Configuration of a SnipsNLUEngine object.

    Parameters: intent_parsers_configs (list) – list of intent parser
    configs (ProcessingUnitConfig). The order in the list determines the
    order in which each parser will be called by the NLU engine.
class DeterministicIntentParserConfig(max_queries=50, max_entities=200)

    Configuration of a DeterministicIntentParser.

    These thresholds allow deactivating the use of regular expressions when
    they become too big, to avoid explosions in time and memory.

    Note: In the future, an FST will be used instead of regexps, removing
    the need for all this.
class ProbabilisticIntentParserConfig(intent_classifier_config=None, slot_filler_config=None)

    Configuration of a ProbabilisticIntentParser object.

    Parameters:

    - intent_classifier_config (ProcessingUnitConfig) – the configuration of
      the underlying intent classifier; by default it uses a
      LogRegIntentClassifierConfig
    - slot_filler_config (ProcessingUnitConfig) – the configuration that
      will be used for the underlying slot fillers; by default it uses a
      CRFSlotFillerConfig
class LogRegIntentClassifierConfig(data_augmentation_config=None, featurizer_config=None, random_seed=None)

    Configuration of a LogRegIntentClassifier.

    Parameters:

    - data_augmentation_config (IntentClassifierDataAugmentationConfig) –
      defines the strategy of the underlying data augmentation
    - featurizer_config (FeaturizerConfig) – configuration of the Featurizer
      used underneath
    - random_seed (int, optional) – fixes the seed to have reproducible
      trainings
class CRFSlotFillerConfig(feature_factory_configs=None, tagging_scheme=None, crf_args=None, data_augmentation_config=None, random_seed=None)

    Configuration of a CRFSlotFiller.

    Parameters:

    - feature_factory_configs (list, optional) – list of configurations that
      specify the CRFFeatureFactory to use with the CRF
    - tagging_scheme (TaggingScheme, optional) – tagging scheme used to
      enrich CRF labels (default=BIO)
    - crf_args (dict, optional) – overwrites the parameters of the CRF
      defined in sklearn_crfsuite; see sklearn_crfsuite.CRF
      (default={"c1": .1, "c2": .1, "algorithm": "lbfgs"})
    - data_augmentation_config (dict or SlotFillerDataAugmentationConfig,
      optional) – specifies how to augment data before training the CRF; see
      the corresponding config object for more details
    - random_seed (int, optional) – makes the CRF training deterministic and
      reproducible (default=None)
Result and output format

intent_classification_result(intent_name, probability)

    Creates an intent classification result to be returned by
    IntentClassifier.get_intent().

    Example:

    >>> intent_classification_result("GetWeather", 0.93)
    {
        "intentName": "GetWeather",
        "probability": 0.93
    }
unresolved_slot(match_range, value, entity, slot_name)

    Creates an internal slot yet to be resolved.

    Example:

    >>> unresolved_slot([0, 8], "tomorrow", "snips/datetime", "startDate")
    {
        "value": "tomorrow",
        "range": {
            "start": 0,
            "end": 8
        },
        "entity": "snips/datetime",
        "slotName": "startDate"
    }
custom_slot(internal_slot, resolved_value=None)

    Creates a custom slot with resolved_value being the reference value of
    the slot.

    Example:

    >>> s = unresolved_slot([10, 19], "earl grey", "beverage", "beverage")
    >>> custom_slot(s, "tea")
    {
        "rawValue": "earl grey",
        "value": {
            "kind": "Custom",
            "value": "tea"
        },
        "range": {
            "start": 10,
            "end": 19
        },
        "entity": "beverage",
        "slotName": "beverage"
    }
builtin_slot(internal_slot, resolved_value)

    Creates a builtin slot with resolved_value being the resolved value of
    the slot.

    Example:

    >>> rng = [10, 32]
    >>> raw_value = "twenty degrees celsius"
    >>> entity = "snips/temperature"
    >>> slot_name = "beverageTemperature"
    >>> s = unresolved_slot(rng, raw_value, entity, slot_name)
    >>> resolved = {
    ...     "kind": "Temperature",
    ...     "value": 20,
    ...     "unit": "celsius"
    ... }
    >>> builtin_slot(s, resolved)
    {
        "rawValue": "twenty degrees celsius",
        "value": {
            "kind": "Temperature",
            "value": 20,
            "unit": "celsius"
        },
        "range": {
            "start": 10,
            "end": 32
        },
        "entity": "snips/temperature",
        "slotName": "beverageTemperature"
    }
resolved_slot(match_range, raw_value, resolved_value, entity, slot_name)

    Creates a resolved slot.

    Returns: The resolved slot.

    Return type: dict

    Example:

    >>> resolved_value = {
    ...     "kind": "Temperature",
    ...     "value": 20,
    ...     "unit": "celsius"
    ... }
    >>> resolved_slot({"start": 10, "end": 19}, "earl grey",
    ...               resolved_value, "beverage", "beverage")
    {
        "rawValue": "earl grey",
        "value": {
            "kind": "Temperature",
            "value": 20,
            "unit": "celsius"
        },
        "range": {
            "start": 10,
            "end": 19
        },
        "entity": "beverage",
        "slotName": "beverage"
    }
parsing_result(input, intent, slots)

    Creates the final output of SnipsNLUEngine.parse() or
    IntentParser.parse().

    Example:

    >>> text = "Hello Bill!"
    >>> intent_result = intent_classification_result("Greeting", 0.95)
    >>> internal_slot = unresolved_slot([6, 10], "Bill", "name",
    ...                                 "greetee")
    >>> slots = [custom_slot(internal_slot, "William")]
    >>> parsing_result(text, intent_result, slots)
    {
        "input": "Hello Bill!",
        "intent": {
            "intentName": "Greeting",
            "probability": 0.95
        },
        "slots": [{
            "rawValue": "Bill",
            "value": {
                "kind": "Custom",
                "value": "William"
            },
            "range": {
                "start": 6,
                "end": 10
            },
            "entity": "name",
            "slotName": "greetee"
        }]
    }
is_empty(result)

    Checks if a result is empty.

    Example:

    >>> res = empty_result("foo bar")
    >>> is_empty(res)
    True
empty_result(input)

    Creates an empty parsing result in the same format as the one of
    parsing_result().

    An empty result is typically returned by a SnipsNLUEngine or an
    IntentParser when no intent and no slots were found.

    Example:

    >>> empty_result("foo bar")
    {
        "input": "foo bar",
        "intent": None,
        "slots": None
    }