Version: 2.0.0

Entities

This chapter explains what an entity is and how to create and manage entities on the platform.

What is an Entity?

An entity is a piece of key information in natural language, for example a name, geographical location, time or date.

note

Entities are independent of intents and skills. They are the basis for natural language processing.

Whether or not an intent is triggered, the agent can extract entities from user messages.

Entity extraction pulls the key information out of natural language text so that the agent can better understand and process it. It includes:

  • Extracting entity values or their synonyms from text.

    For example: New York City, NYC, The Big Apple

  • Normalising the synonyms into the standard entity value.

    For example:

    • User asks: "What is the weather like in NYC?"
    • "NYC" is normalised into "New York City".
    • The agent processes the message "What is the weather like in New York City?"
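The normalisation step above can be sketched in a few lines of Python. This is only an illustration, not the platform's implementation: the `SYNONYMS` table and the `normalise` function are hypothetical names, and whole-word, case-insensitive matching is an assumption.

```python
import re

# Hypothetical synonym table: standard entity value -> its synonyms.
SYNONYMS = {
    "New York City": ["NYC", "The Big Apple"],
}


def normalise(message: str, synonyms: dict) -> str:
    """Replace every synonym found in the message with its standard value."""
    for value, names in synonyms.items():
        # Try longer synonyms first so a short one never clips a longer one.
        for name in sorted(names, key=len, reverse=True):
            pattern = r"\b" + re.escape(name) + r"\b"
            # A function replacement avoids backslash handling in the value.
            message = re.sub(
                pattern, lambda _m, v=value: v, message, flags=re.IGNORECASE
            )
    return message


print(normalise("What is the weather like in NYC?", SYNONYMS))
# prints: What is the weather like in New York City?
```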

The platform has three types of entity: enumerate entity, regex entity and preset entity.

Enumerate entity

An enumerate entity is an entity whose values can be enumerated and normalised by lexical matching. In theory, the more synonyms you add, the easier it is to extract the entity.

  • Entity Name: The name of the key information extracted from user messages. Generally speaking, it stands for a specific domain.

    For example, city, order number, brand name etc.

  • Entity Value: The standard form of the entity. Each entity needs at least one entity value.

    For example, the entity values of "city" include "Beijing", "Shanghai", "Shenzhen", etc.

  • Synonyms: Alternative expressions that represent the entity value. A synonym found in a user message is normalised to the entity value.

    For example, synonyms of "New York City" include "NYC" and "The Big Apple". They can all be extracted and normalised into "New York City" by the agent.

Create and edit enumerate entities

  1. Click "Build - Resources - Entity" to enter the page.

  2. Click "CREATE", enter the name and hit Enter to create a new enumerate entity. You can edit or delete the entity by hovering over it.

  3. Click "Create Entity Value" to add the entity values and synonyms.

Import & Export

  1. Click the first button beside "CREATE" in the left column to import enumerated entities.

caution

When importing multiple enumerated entities:

a. Entity names and entity values follow add logic: content that is not yet on the platform is added, and content that is already on the platform is left unchanged.

b. Synonyms of entity values follow override logic: the synonyms on the platform are overwritten with the synonyms in the file.

  2. Click the second button beside "CREATE" in the left column to export existing enumerated entities.
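The import rules in the caution above can be pictured as a merge of two nested dictionaries. The sketch below is one reading of those rules, not the platform's code: `merge_import` and the data layout (entity name, then entity value, then list of synonyms) are hypothetical, and it assumes that values missing from the file keep their existing synonyms.

```python
def merge_import(platform: dict, imported: dict) -> dict:
    """Merge imported entities into a copy of the platform data.

    Both arguments map entity name -> {entity value -> [synonyms]}.
    """
    # Copy the platform data so the original is not mutated.
    merged = {
        name: {value: list(syns) for value, syns in values.items()}
        for name, values in platform.items()
    }
    for name, values in imported.items():
        # Add logic: an unknown entity name is created, a known one is kept.
        entity = merged.setdefault(name, {})
        for value, syns in values.items():
            # Add logic for the value itself, override logic for its synonyms.
            entity[value] = list(syns)
    return merged


platform = {"city": {"New York City": ["NYC"], "Beijing": ["Peking"]}}
imported = {
    "city": {"New York City": ["The Big Apple"], "Shanghai": []},
    "brand": {"Apple": []},
}
merged = merge_import(platform, imported)
print(merged["city"]["New York City"])  # prints: ['The Big Apple']
print(merged["city"]["Beijing"])        # prints: ['Peking']
```

Note that the synonym "NYC" is lost after the import, because the file's synonym list replaces the platform's list for that value.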

Regex Entity

Some entities are extracted by matching rules rather than values. For example, ID numbers, mobile phone numbers, order numbers and licence plates can all be set up as regex entities for better extraction.

  1. Click "Build - Resources - Entity" and choose the "Regex Entity" tab to enter the page.

  2. Click "+CREATE" and set the name, regex and regex tag to create a new regex entity.

  3. After configuring, you can add test texts to check whether the entity is extracted successfully.
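Before entering a pattern on the platform, it can help to try it locally. The sketch below uses Python's `re` module. The entity name `order_number` and the pattern `ORD-` followed by eight digits are made up for illustration, and the platform's regex dialect may differ from Python's.

```python
import re

# Hypothetical regex entity: both the name and the pattern are illustrative.
ORDER_NUMBER = {"name": "order_number", "regex": r"\bORD-\d{8}\b"}


def extract(entity: dict, text: str) -> list:
    """Return every substring of the text that matches the entity's regex."""
    return re.findall(entity["regex"], text)


print(extract(ORDER_NUMBER, "Where is my order ORD-20240115?"))
# prints: ['ORD-20240115']
print(extract(ORDER_NUMBER, "My order is ORD-123"))
# prints: []
```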

Preset Entity

Preset entities are entities built into the platform. They cannot be classified as enumerated entities or regex entities. Currently, there are four preset entities: date, time, city and any.

  1. Click "Build - Resources - Entity" and choose the "Preset Entity" tab to enter the page.

  2. These entities can be used in slots and intents directly without being configured.

Domain Vocabulary

Domain vocabulary refers to entities that affect word segmentation in conversations. The agent treats each domain vocabulary item as a whole and does not split it further.

  1. Select the enumerate entity value that you want to set as domain vocabulary, click "Edit" and switch on "Influence tokenizer as domain vocabulary" to finish configuring.

  2. Click "Build - Resources - Entity" and choose the "Domain Vocabulary" tab to enter the page. You can also manage the domain vocabulary here. Once you click "remove", the entity no longer works as domain vocabulary.
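To picture how domain vocabulary affects word segmentation, here is a toy tokenizer that splits on whitespace but keeps any domain vocabulary phrase as a single token. The source does not describe the platform's tokenizer, so this greedy longest-match approach and the `tokenize` function are assumptions made for illustration only.

```python
def tokenize(text: str, domain_vocab: set) -> list:
    """Split on whitespace, keeping domain vocabulary phrases whole."""
    words = text.split()
    # Longest phrase length in words; at least 1 so single words always match.
    max_len = max([1] + [len(phrase.split()) for phrase in domain_vocab])
    tokens, i = [], 0
    while i < len(words):
        # Try the longest candidate first, falling back to a single word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in domain_vocab:
                tokens.append(candidate)
                i += n
                break
    return tokens


print(tokenize("I bought an iPhone phone case today", {"iPhone phone case"}))
# prints: ['I', 'bought', 'an', 'iPhone phone case', 'today']
```

Without the domain vocabulary entry, the same sentence would be split into seven separate words.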

Entity disambiguation

When the agent extracts multiple entity values that share overlapping content, you can choose one of three policies to resolve the conflict.

For example, the extracted entity values are "iPhone", "iPhone 13" and "iPhone phone case".

  1. Return all (default policy): policy:0

    Return "iPhone", "iPhone 13" and "iPhone phone case".

  2. For values that belong to the same entity, return the longest one: policy:1

    Return "iPhone 13" and "iPhone phone case".

  3. Whether or not the values belong to the same entity, return the longest one: policy:2

    Return "iPhone phone case".

For more details on how to configure the policy, see NLU Pipeline.
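The three policies can be compared with a small sketch that reproduces the example outcomes above. It is not the platform's algorithm: `disambiguate` is a hypothetical function, all matches passed in are assumed to conflict with one another, and the entity names `phone` and `accessory` are assumed assignments that the example implies but does not state.

```python
def disambiguate(matches: list, policy: int = 0) -> list:
    """Resolve a group of conflicting (entity_name, value) matches."""
    if policy == 0:
        # Return all extracted values.
        return [value for _, value in matches]
    if policy == 1:
        # Keep the longest value within each entity.
        longest = {}
        for entity, value in matches:
            if entity not in longest or len(value) > len(longest[entity]):
                longest[entity] = value
        return list(longest.values())
    if policy == 2:
        # Keep the single longest value across all entities.
        return [max((value for _, value in matches), key=len)] if matches else []
    raise ValueError(f"unknown policy: {policy}")


# Assumed entity assignments for the example values.
matches = [
    ("phone", "iPhone"),
    ("phone", "iPhone 13"),
    ("accessory", "iPhone phone case"),
]
print(disambiguate(matches, 0))  # prints: ['iPhone', 'iPhone 13', 'iPhone phone case']
print(disambiguate(matches, 1))  # prints: ['iPhone 13', 'iPhone phone case']
print(disambiguate(matches, 2))  # prints: ['iPhone phone case']
```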