Usage of Regex Entity
A regex entity lets the agent match message text flexibly and reliably against known entity values and their synonyms, and extract key information from the message.
Scenario 1: extract key information from user message
User Story
A user is talking to a virtual assistant that works for a library, hoping to find a specific kind of book.
User message: "find a book about xxx"
Ideally, the agent extracts "xxx" from this message and uses it as the search term.
Build the dialogue flow
- Create a regex entity called "search for books", set the regex as `find a book about (\w+)` and the regex tag as "search criteria".
- Create a slot called "search criteria", set the rule as "Fill with Role", enter the role name `1` and choose the entity "search for books".
- Create an intent called "search for books" and add the utterance `[find a book about history]{"entity":"search for books"}`.
- Create a dialogue flow skill, set the intent as "search for books" and set the status as "ON".
- Enter the skill canvas, add a request block and set the slot to fill as "search criteria".
- Then add an inform block and set the response as "Search criteria: {slot=search criteria}".
 
- Make sure the skill status is "ON", then train the agent.
- Test the skill. You can see that the agent has extracted the criteria and filled it into the "search criteria" slot.
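
The extraction that the agent performs in this scenario can be reproduced outside the platform. The following is a minimal sketch in Python, assuming the platform searches for the pattern anywhere in the message in the way `re.search` does; the function name `extract_search_criteria` is hypothetical and only serves the illustration.

```python
import re

# Regex from the steps above; the first () corresponds to role name 1.
SEARCH_PATTERN = re.compile(r"find a book about (\w+)")

def extract_search_criteria(message):
    """Return the captured search term, or None if the message does not match."""
    match = SEARCH_PATTERN.search(message)
    return match.group(1) if match else None

print(extract_search_criteria("find a book about history"))  # history
print(extract_search_criteria("hello there"))                # None
```

Because `\w+` stops at the first character that is not a word character, a message such as "find a book about ancient history" yields only "ancient".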
 
Commonly used regex syntax
We set the regex as `find a book about (\w+)` in the last section, where
- `\w` indicates any one word character (a letter, a digit, or an underscore).
- `+` indicates one or more occurrences (1+) of the previous sub-expression.
Thus:
- `\w+` indicates one or more word characters.
Here is some other commonly used regex syntax for your reference:
- `\d+` indicates one or more digits (`\d` matches any one digit).
- `?` indicates zero or one occurrence (optional).
- `*` indicates zero or more occurrences (0+).
- `(ab)+` indicates one or more occurrences of `ab`, e.g., `ab`, `abab`, `ababab`...
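
The quantifiers listed in this section can be checked quickly with Python's `re` module. This is only a sketch: the helper `matches` is a hypothetical name, and it uses `re.fullmatch` so that the whole string has to fit the pattern.

```python
import re

def matches(pattern, text):
    """Return True only if the whole text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"\w+", "history"))   # True: one or more word characters
print(matches(r"\w+", ""))          # False: + needs at least one character
print(matches(r"\d+", "245"))       # True: one or more digits
print(matches(r"\d*", ""))          # True: * also accepts zero occurrences
print(matches(r"books?", "book"))   # True: ? makes the "s" optional
print(matches(r"books?", "books"))  # True
print(matches(r"(ab)+", "ababab"))  # True: "ab" repeated three times
print(matches(r"(ab)+", "aba"))     # False: the trailing "a" is left over
```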
Fill with Role & Regex entity
- Content inside the `()`, in this case `\w+`, is extracted as the entity value and given a role name.
  - Role names default to 1, 2, 3 and so on, in the order of the `()` from left to right.
- You can customize the role name with `(?P<name>xxx)`. For example, `find a book about (?P<book>\w+)`. Correspondingly, the role name needs to be set as `book` when setting the filling rule of the slot.

- If you do not want the content to be extracted, change `(xxx)` into `(?:xxx)`. In this case, change `(\w+)` to `(?:\w+)`; then neither the entity value nor the role is extracted from the content inside the `()`.
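
The three kinds of group described in this section behave as follows in Python's `re` module, which uses the same `(?P<name>...)` and `(?:...)` syntax. This is a sketch for illustration; how the platform maps groups to roles internally is not shown here.

```python
import re

message = "find a book about history"

# Numbered group: the first () is group 1 (default role name 1).
numbered = re.search(r"find a book about (\w+)", message)
print(numbered.group(1))        # history

# Named group: (?P<book>...) can be read back by its name.
named = re.search(r"find a book about (?P<book>\w+)", message)
print(named.group("book"))      # history
print(named.groupdict())        # {'book': 'history'}

# Non-capturing group: (?:...) still matches, but captures nothing.
non_capturing = re.search(r"find a book about (?:\w+)", message)
print(non_capturing.groups())   # ()
```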
Optimize the regex
- Besides `find a book about history`, users may also send messages like:
  - I want books about history
  - books about history
  - show me a book about history
  - find some books about history
 
- Users may not mention the word "find", and they may use "books" instead of "book" in some cases. Thus, the regex can be set as `(?:find a )?book(?:s)? about (\w+)`, and the following utterances will be recognized as well:
  - find a book about history
  - I want books about history
  - books about history
 
- Instead of using the word "find", users may also use "show" to express a similar meaning. Thus, the regex can be set as `(?:find a |show a )?book(?:s)? about (\w+)`, and the following utterances will be recognized as well:
  - find a book about history
  - show me a book about history
  - find some books about history
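
The optimized regex can be tried against all of the utterances from this section. The sketch below assumes that the platform looks for the pattern anywhere in the message, as `re.search` does. The pattern keeps the space inside the optional group so that a message beginning with "books" also matches; `extract_topic` is a hypothetical helper name.

```python
import re

# Optional "find a " / "show a " prefix, optional plural "s", one captured word.
PATTERN = re.compile(r"(?:find a |show a )?book(?:s)? about (\w+)")

def extract_topic(message):
    """Return the captured topic, or None if the pattern is not found."""
    match = PATTERN.search(message)
    return match.group(1) if match else None

utterances = [
    "find a book about history",
    "I want books about history",
    "books about history",
    "show me a book about history",
    "find some books about history",
]
for utterance in utterances:
    print(utterance, "->", extract_topic(utterance))  # each line ends with: -> history

print(extract_topic("what time is it"))  # None
```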
 
Scenario 2: fill multiple slots
User Story
In an ASR (automatic speech recognition) project, when the user says the IP address "10.12.2.245", the ASR engine transcribes it as "10.12 dot 2.245".
Thus, the recognition result needs to be corrected.
Build the dialogue flow
- Create a regex entity called "IP address", set the regex as `(\d+)\.(\d+) dot (\d+)\.(\d+)`.
- Create 4 slots, set the role name and entity as follows:
| Slot Name | Role Name | Entity | 
|---|---|---|
| IP1 | 1 | IP address | 
| IP2 | 2 | IP address | 
| IP3 | 3 | IP address | 
| IP4 | 4 | IP address | 

- Create an intent called "extract IP address" and add the utterance `[10.12 dot 2.245]{"entity":"IP address"}`.
- Create a dialogue flow skill and set the intent as "extract IP address".
- Enter the skill canvas, add four request blocks, set the slot to fill as "IP1", "IP2", "IP3", "IP4" in order.
- Add an inform block, set the response as `{slot=IP1}.{slot=IP2}.{slot=IP3}.{slot=IP4}`.
 
- Set the skill status as "ON", then train the agent.
- Click "Test". The agent extracts the IP address and sends it back in the correct format.
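
The effect of the four slots and the inform block can be sketched in a few lines of Python: each `()` in the regex fills one part, and the parts are then joined with dots. The function name `rebuild_ip` is hypothetical, and the sketch assumes the pattern is searched for anywhere in the transcript.

```python
import re

# Regex from the steps above; groups 1-4 correspond to slots IP1-IP4.
IP_PATTERN = re.compile(r"(\d+)\.(\d+) dot (\d+)\.(\d+)")

def rebuild_ip(transcript):
    """Join the four captured parts with dots, or return None if nothing matches."""
    match = IP_PATTERN.search(transcript)
    if match is None:
        return None
    ip1, ip2, ip3, ip4 = match.groups()
    return f"{ip1}.{ip2}.{ip3}.{ip4}"

print(rebuild_ip("10.12 dot 2.245"))  # 10.12.2.245
print(rebuild_ip("no address here"))  # None
```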
 
Optimize the regex
Taking as many cases as possible into account when writing the regex improves the extraction results.
Shown below are some bad cases for this scenario, in which one part is not a valid byte value:
- 1111.1.2.3
- 333.1.2.3
- 256.1.2.3
- 1.2.3.2555
- 1.2.3333.4
- For an IP address, each byte contains 1-3 digits, so you can use `{1,3}` in the regex to set the limit: `(\d{1,3})\.(\d{1,3}) dot (\d{1,3})\.(\d{1,3})`.
- The possible values of one byte are 0–255, so when there are 3 digits, the first one cannot be greater than 2. Therefore, the regex can be set as: `([012]?[0-9]{1,2})\.([012]?[0-9]{1,2}) dot ([012]?[0-9]{1,2})\.([012]?[0-9]{1,2})`. Note that this pattern still accepts values from 256 to 299, so a part such as 256 is not rejected.
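
To see what each refinement rejects, the per-byte patterns can be compared side by side. The sketch below tests a single part with `re.fullmatch`. The `STRICT` pattern is an addition for illustration and not part of the steps above: it is one common way to limit a part to 0–255. The helper names `accepts` and `is_valid_transcript` are hypothetical.

```python
import re

LOOSE = r"\d+"                    # any number of digits
UP_TO_THREE = r"\d{1,3}"          # 1-3 digits
LEADING_012 = r"[012]?[0-9]{1,2}" # first of three digits limited to 0, 1 or 2
STRICT = r"25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]"  # 0-255 only (assumed alternative)

def accepts(part_pattern, value):
    """Return True if the whole value fits the per-byte pattern."""
    return re.fullmatch(part_pattern, value) is not None

for value in ["10", "245", "256", "333", "1111"]:
    print(value, [accepts(p, value) for p in (LOOSE, UP_TO_THREE, LEADING_012, STRICT)])
# 10 [True, True, True, True]
# 245 [True, True, True, True]
# 256 [True, True, True, False]
# 333 [True, True, False, False]
# 1111 [True, False, False, False]

# The strict part pattern dropped into the "dot" regex from this scenario.
PART = rf"({STRICT})"
STRICT_IP = re.compile(rf"{PART}\.{PART} dot {PART}\.{PART}")

def is_valid_transcript(text):
    """Return True if the whole transcript is a valid address in the 'dot' form."""
    return STRICT_IP.fullmatch(text) is not None

print(is_valid_transcript("10.12 dot 2.245"))  # True
print(is_valid_transcript("256.1 dot 2.3"))    # False
```

The sketch uses `fullmatch` on purpose: with a search anywhere in the text, an input such as "1111.1 dot 2.3" would still produce a match starting at its second digit.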