AI function
Traditional RPA automates processes according to fixed rules. In real enterprise scenarios, however, many business processes are not rule-based and depend on human cognition and judgment. Contracts are a common example: a business process often has to read an electronic contract, recognize a large amount of text, and extract key information such as the names of Party A and Party B and the contract date. For such scenarios we can use artificial intelligence (hereinafter AI) to let the robot "understand" contracts and similar information, which we call "cognitive automation". Combining RPA with AI thus joins process automation with cognitive automation, so that more complex and higher-value business scenarios can be automated.
For a software robot, if AI is the brain, then cognitive abilities are its eyes, mouth and ears, and RPA is its hands. Combined with AI capabilities, RPA is no longer limited to automating rule-based, mechanical and repetitive tasks. It extends to richer business scenarios, connects the physical world with the digital world, and meets the more flexible and diverse automation needs of real business. By adopting an RPA platform with rich AI capabilities, enterprises can also apply AI to their business quickly, economically and flexibly.
Laiye Automation Platform integrates a large number of AI capabilities and will continue to enhance them. This chapter introduces how to use the AI capabilities in Laiye Automation Platform.
UiBot Mage
Laiye Automation Platform Mage (hereinafter Mage) is an AI capability platform built specifically for Laiye Automation Platform. It provides the various AI capabilities needed to implement process automation.
The English word "mage" refers to a person who has magic powers or who has gained a great deal of knowledge through long study. The product takes this name, and its Chinese name means "magician". The name reflects the hope that the knowledge Mage has learned will empower RPA robots.
Starting from version 5.3.0 of Laiye Automation Platform Creator, the AI capabilities of Laiye Automation Platform Mage are available when writing RPA processes. In the community version of Creator, users who are connected to the Internet and logged in can use these capabilities to turn unstructured information in images and documents into structured data. Enterprise customers using the enterprise version of Creator can use the same capabilities even without Internet access, because the enterprise version supports private deployment of Mage and provides standardized call interfaces that adapt flexibly to business needs.
The product features of UiBot Mage include:
- Built-in OCR, NLP and other AI capabilities suited to RPA robots.
- Pre-trained models that require no AI experience and work out of the box.
- Customizable models in addition to the pre-trained ones. With only a small amount of configuration or training, a model can generalize well.
- Seamless integration with Laiye Automation Platform Creator, so AI models can be used in a process in a low-code way.
- Recognition of many types of documents, covering business scenarios such as financial reimbursement, contract processing and bank account opening.
At present, UiBot Mage includes the following AI functions:
As you can see, the AI functions in UiBot Mage are rich and still being extended. They fall roughly into two categories. The first is "general AI capabilities": functions that need almost no settings in UiBot Mage and can be used out of the box. The most commonly used are image recognition functions for standardized bills (invoices, taxi tickets, etc.) and standardized cards (ID cards, business licenses, etc.). The second is "customized AI capabilities": functions that need some configuration or training in UiBot Mage before use. They take more effort to set up but can handle a wider range of data.
Given the focus of this chapter, only the image recognition functions among the "general AI capabilities" are introduced for now. Other functions are described in later chapters; you can also consult the UiBot Mage online help.
Mage AI commands in Laiye Automation Platform Creator
In Laiye Automation Platform Creator, many AI functions of Laiye Automation Platform Mage are packaged as commands and placed in a category called "Mage AI". This category contains secondary categories such as "Information Extraction" and "general Card & Certificate Recognition", and each secondary category expands to show a number of commands.
For example, expanding the secondary category "general Card & Certificate Recognition" reveals five commands, including "screen Card & Certificate Recognition", "image Card & Certificate Recognition" and "get card type". How do these five commands differ, and where is each used?
In fact, whether the category is "general Card & Certificate Recognition", "Text Recognition" or "general bill recognition", using the "recognition" commands of Mage AI takes two steps:
- Recognize all results from the specified image.
- Extract the required information from the recognition results.
The commands in the red box in the figure above perform step 1; their names all end with "recognition". The commands in the blue box perform step 2; their names all begin with "get". So to recognize an ID card with Mage AI, first choose a command from the red box according to the data source, then choose a command from the blue box according to the information you need.
The three commands in the red box differ mainly in where the image comes from:
- Screen recognition: the image comes from a window, or an area of a window, on the screen.
- Image recognition: the image comes from an image file on the local hard disk.
- PDF recognition: the image comes from a PDF file on the local hard disk. You can recognize all pages or specify a page range.
Apart from the data source, the three commands are identical, and for the same image they return the same recognition result. That result is usually hard to use directly, so the two commands in the blue box are needed to extract the key information:
- Get card type: "general Card & Certificate Recognition" can recognize many kinds of cards, including ID cards, driver's licenses, real estate licenses and business licenses. This command determines which kind of card the recognized image shows.
- Get card content: for a specific card, this command extracts individual fields. From an ID card, for example, it can extract the name, gender, nationality and ID card number.
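The two-step pattern described above (recognize once, then extract as many times as needed) can be sketched in Python. This is only an illustration: the function names, the dictionary layout and the sample values are assumptions made for this sketch, not the actual commands or result format of UiBot Mage.

```python
from typing import Any, Dict


def recognize_card(image_path: str) -> Dict[str, Any]:
    """Step 1 (hypothetical stand-in): return everything recognized in the image.

    A real run would send the image to a Mage server; here we return
    made-up sample data so the sketch is self-contained.
    """
    return {
        "source": image_path,
        "card_type": "id_card",
        "fields": {
            "name": "Zhang San",            # fictitious sample value
            "gender": "male",               # fictitious sample value
            "id_number": "000000000000000000",  # fictitious sample value
        },
    }


def get_card_type(result: Dict[str, Any]) -> str:
    """Step 2a: read the card type out of the recognition result."""
    return result["card_type"]


def get_card_content(result: Dict[str, Any], field: str) -> str:
    """Step 2b: read one field; return an empty string if it is absent."""
    return result["fields"].get(field, "")


# Recognize once, then extract several pieces of information from the same result.
result = recognize_card(r"C:\temp\id_card.jpg")
card_type = get_card_type(result)
name = get_card_content(result, "name")
print(card_type, name)
```

The point of the design is that the expensive step (recognition) runs once, while the cheap step (extraction) can be repeated for every field you need.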
Let's try it out. First, find an image file of an ID card. To avoid disclosing personal information, this article uses the following fictitious ID card image found on the Internet, for reference only:
This ID card image was obviously made up by an Internet user. Its layout and format are not standard, and the photo is slightly tilted. Even so, UiBot Mage can still extract the information accurately.
Suppose the ID card image is saved in the file C:\temp\id_card.jpg. Create a new process and drag in an "image Card & Certificate Recognition" command. This command has two attributes that must be filled in, as shown in the following figure:
For the "identify picture" attribute, simply select the file C:\temp\id_card.jpg. Do not fill in the "Mage configuration" attribute by hand. Instead, click the button to the right of the attribute (in the red box above), and the dialog box shown below pops up.
In the community version of Laiye Automation Platform, the only thing to set in this dialog box is the "recognizer name" drop-down box. If the drop-down box is empty, click "go to configuration" on the right (in the red box above); the browser opens the UiBot Mage home page, where you can create a new general Card & Certificate Recognition model and choose its back-end AI engine. Engines marked "native" are AI capabilities developed by Laiye Technology; engines marked "third party" are AI capabilities provided specifically to Laiye Technology by other AI vendors. Different engines perform better in different scenarios, so choose the one that works best for your case.
After the recognition model is created in Laiye Automation Platform Mage, you can select it in Laiye Automation Platform Creator. The previously empty "Mage configuration" attribute is then filled in automatically, and you do not need to care about its exact content.
Next, drag in the "get card type" and "get card content" commands in turn to obtain the card type, name, date of birth, address and other information from the recognition result. Note that these commands share an attribute called "Card & Certificate Recognition result"; fill in the output of the "image Card & Certificate Recognition" command there. After obtaining each piece of information, you can drag in an "output debugging information" command to display it.
Following the steps above, the final process looks roughly like this:
After running, the following results can be obtained:
Note that Mage AI commands must connect to a UiBot Mage server when they run. For the community version of Laiye Automation Platform, we host a server on the Internet that you can use directly. For the enterprise version, you can use that Internet server or deploy your own Laiye Automation Platform Mage server. The Internet server offers a limited number of free uses per month: each run of a "recognition" command of Mage AI deducts one use, and the quota is replenished automatically at the beginning of each month. If the free quota is not enough, you can pay for more.
When more fields need to be extracted, adding "get card content" commands and editing their attributes one by one becomes tedious. The Mage AI recognition wizard makes this faster and more convenient.
Mage AI Recognition Wizard
The previous section showed how to use Mage AI in Laiye Automation Platform Creator to recognize the information on an ID card. When many fields are needed, the operation is somewhat tedious. Mage AI can also recognize invoices and other bills, and doing that by hand is more tedious still. Card & Certificate Recognition handles only one card (such as an ID card) at a time, but many finance departments require several different bills, such as VAT invoices, train tickets and taxi tickets, to be pasted onto one sheet of paper for reimbursement, and UiBot Mage recognizes all of them in one pass. The process therefore has to determine how many bills were recognized and what type each one is. Different bill types also have different fields: a train ticket has a departure and a destination, while a VAT invoice has neither. Writing such a process is not hard for Laiye Automation Platform veterans, but it can be difficult for novices.
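The logic such a process needs (count the recognized bills, check the type of each, and read only the fields that type has) can be sketched in Python. The type names, field names, result layout and sample values below are all assumptions chosen for illustration; they are not the actual UiBot Mage result format.

```python
from typing import Any, Dict, List

# Hypothetical mapping from bill type to the fields we want from that type.
FIELDS_BY_TYPE: Dict[str, List[str]] = {
    "vat_invoice": ["invoice_number", "total_amount"],
    "train_ticket": ["departure", "destination"],
    "taxi_ticket": ["fare"],
}


def extract_bills(bills: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Return one row per recognized bill, holding only the fields of its type."""
    rows: List[Dict[str, str]] = []
    for bill in bills:
        wanted = FIELDS_BY_TYPE.get(bill["type"])
        if wanted is None:
            continue  # unknown bill type: skip it rather than guess its fields
        row = {"type": bill["type"]}
        for field in wanted:
            row[field] = bill["fields"].get(field, "")
        rows.append(row)
    return rows


# Made-up recognition result for one sheet with several bills pasted on it.
sample = [
    {"type": "vat_invoice", "fields": {"invoice_number": "12345678", "total_amount": "100.00"}},
    {"type": "train_ticket", "fields": {"departure": "Beijing", "destination": "Shanghai"}},
    {"type": "taxi_ticket", "fields": {"fare": "23.50"}},
    {"type": "unknown_receipt", "fields": {}},
]

rows = extract_bills(sample)
print(len(rows), "bills extracted")
```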
For this reason, Laiye Automation Platform Creator also provides a Mage AI wizard. Through a graphical interface, it guides you to configure the recognizer, set the extraction types and fields, and automatically generate the corresponding command framework.
Open Laiye Automation Platform Creator. While editing any process block, you can find the "Mage AI" icon on the toolbar, as shown in the red box below.
Click this button to open the Mage AI recognition wizard window. The wizard has three steps: configure the recognizer, select the image source, and select the extraction types and fields.
The wizard generates a series of commands automatically, which greatly simplifies the work. For example, the following image contains both invoices and taxi tickets, and we want to extract the key fields of all of them at once. It does not matter if an invoice is upside down; UiBot Mage handles that automatically.
Follow the three steps of the Mage AI recognition wizard and fill in the required information, as follows:
Step 1: open the Mage AI recognition wizard and configure the recognizer, that is, select the AI module, the AI capability and its recognizer. This is much like configuring the Mage recognizer earlier in this chapter: select the required function and recognizer in turn.
Step 2: select the image source. Use "select target" to pick or capture a recognition area on the screen with the mouse; use "select image" to choose or drag in a local image file; or use "select PDF" to choose a PDF file and specify the page range.
Here we take a local image as an example and select the path of the image file on the local hard disk. Once selected, the image is previewed in the dialog box.
Step 3: select the extraction types and extraction fields. Several bill types and their fields can be configured at the same time. After configuring them, confirm your choices under "selected information" on the right.
After completing the three steps, click "finish". Laiye Automation Platform Creator automatically generates a series of commands, as shown in the following figure.
For this scenario, Laiye Automation Platform Creator generates 17 lines of commands, which saves considerable work. These commands are only a framework, however, and you need to add further commands to meet your business requirements. For example, you may want to write each recognized VAT invoice to one Excel file and each recognized taxi ticket to another. The result after recognition is shown in the following figure:
To do so, insert commands that open and close the Excel files, and commands that write to them, at the appropriate positions in the process. With the generated framework as a base, this should not be difficult. This chapter does not list the complete example process; please practice on your own. With practice, you will find that a process that recognizes bills and enters each type into its own Excel file can be developed in less than 10 minutes. Most of the work is done with mouse clicks, and the keyboard is rarely needed, which shows how convenient the Mage AI recognition wizard in Laiye Automation Platform Creator is.
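As a rough illustration of "one file per bill type", the following Python sketch groups extracted rows by type and renders each group as CSV text. It uses Python's standard csv module as a stand-in for the Excel commands of Laiye Automation Platform; the row layout, column names and sample values are assumptions made for this sketch.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def group_by_type(rows: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Split the rows into one list per bill type, keeping the original order."""
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in rows:
        groups[row["type"]].append(row)
    return dict(groups)


def to_csv_text(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Render rows as CSV text with a header line; keys not in columns are ignored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up extraction results for two invoices and one taxi ticket.
rows = [
    {"type": "vat_invoice", "invoice_number": "12345678", "total_amount": "100.00"},
    {"type": "taxi_ticket", "fare": "23.50"},
    {"type": "vat_invoice", "invoice_number": "87654321", "total_amount": "58.00"},
]

groups = group_by_type(rows)
invoice_csv = to_csv_text(groups["vat_invoice"], ["invoice_number", "total_amount"])
taxi_csv = to_csv_text(groups["taxi_ticket"], ["fare"])
print(invoice_csv)
print(taxi_csv)
```

Each CSV text could then be saved to its own file; in the actual process, the equivalent step is a write command placed between the commands that open and close each Excel file.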
Local OCR
OCR stands for "optical character recognition", a technology with a long history. As early as the last century, OCR could scan paper books and extract their text. OCR has continued to evolve and now incorporates techniques such as deep learning, so recognition rates keep improving. Here we use OCR to recognize text on the screen. Because on-screen text does not suffer from the blurred printing and poor lighting that affect paper books, the recognition rate is very high.
UiBot Mage, described above, already includes OCR. However, UiBot Mage is not the best fit for every situation, such as the following scenario:
As mentioned earlier, in some cases UI elements cannot be obtained. The "image" commands can then locate the exact position to operate on, but unlike the target-based commands they cannot read the content of a UI element.
For example, the well-known game platform Steam draws its interface with DirectUI technology, so we cannot obtain any of the text in it (although it is easy to read with the naked eye), as shown in the figure.
UiBot Mage can of course read this text, but that is like using a cannon to shoot a mosquito. In addition, the community version of UiBot Mage requires an Internet connection and has a limited free quota. This is where the "local OCR" commands of Laiye Automation Platform come in.
"Local OCR" specifically includes the following OCR commands:
As the name suggests, these commands need no Internet connection and run directly on the computer where Laiye Automation Platform is running.
Let's try the "screen OCR" command first. Double-click or drag to insert a "screen OCR" command, then click the "Selector" button on the command (the Laiye Automation Platform Creator window is temporarily hidden at this point). Move the mouse over the Steam login window, which is covered by a mask with a red border and a blue background. Then drag the mouse to draw the area for text recognition, which is shown as a purple box, as in the figure.
You can also simply click the left mouse button on the window without drawing an area, which means the whole window is recognized. When it runs, the command finds the Steam login window, takes a screenshot of the specified area (positioned relative to the window), recognizes the text in the screenshot, and writes the recognized text to the specified variable.
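Because the recognition area is stored relative to the window, it follows the window when the window moves. The arithmetic can be sketched in Python as follows; the Rect type and the function name are hypothetical helpers written for this illustration, not part of Laiye Automation Platform.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """A rectangle given by its top-left corner and its size, in pixels."""
    x: int
    y: int
    width: int
    height: int


def to_screen_rect(window: Rect, region: Rect) -> Rect:
    """Convert a region given relative to the window's top-left corner
    into absolute screen coordinates."""
    return Rect(window.x + region.x, window.y + region.y, region.width, region.height)


# The region was drawn 20 px right of and 50 px below the window's corner.
region = Rect(20, 50, 120, 30)

# The same relative region lands in different screen positions
# depending on where the window currently is.
first = to_screen_rect(Rect(100, 200, 400, 300), region)
second = to_screen_rect(Rect(500, 80, 400, 300), region)
print(first)
print(second)
```

This is also why a change in window size can break recognition: the relative offset stays the same, but the content under it may have moved.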
As in earlier chapters, you can click the triangle to the right of the command to run it on its own, and the result is output when it finishes. As long as the Steam login window exists and its size has not changed, the text "account name" in the area we drew is recognized.
That covers "screen OCR recognition". The "image OCR recognition" command is similar, except that it requires an image file. When the process reaches this command, Laiye Automation Platform recognizes the specified image file directly and ignores what is on the screen.
There are also commands such as "mouse click OCR text", "mouse move to OCR text" and "find OCR text position". They resemble the "click image", "mouse move to image" and "find image" commands among the "image" commands, but instead of passing in an image you specify the text to find in an attribute. At run time, Laiye Automation Platform finds that text on the screen and clicks or moves the mouse according to its position.
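Conceptually, "find the text, then click where it is" amounts to searching the OCR output for the target string and computing a point inside its bounding box. The Python sketch below illustrates that idea only; the TextBox type, the function name and the sample boxes are assumptions, and the real commands may locate and click text differently.

```python
from typing import List, NamedTuple, Optional, Tuple


class TextBox(NamedTuple):
    """One piece of recognized text and its bounding box on the screen."""
    text: str
    x: int
    y: int
    width: int
    height: int


def find_text_center(boxes: List[TextBox], target: str) -> Optional[Tuple[int, int]]:
    """Return the center of the first box whose text contains target,
    or None if the text was not found."""
    for box in boxes:
        if target in box.text:
            return (box.x + box.width // 2, box.y + box.height // 2)
    return None


# Made-up OCR output for a login window.
boxes = [
    TextBox("Account name", 40, 60, 100, 20),
    TextBox("Password", 40, 100, 80, 20),
]

click_point = find_text_center(boxes, "Password")
missing = find_text_center(boxes, "Login")
print(click_point, missing)
```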
Baidu OCR
Many cloud vendors offer online OCR services. Among them, Baidu OCR has a relatively good reputation, and many users have already purchased it themselves. Laiye Automation Platform therefore also provides Baidu OCR commands, which make the functions of Baidu OCR quick and easy to use. Baidu OCR is also optimized for images of bills and cards such as invoices, ID cards and train tickets, so it can recognize their key contents more accurately. Its recognition results differ from those of UiBot Mage, and users can choose between them based on measured results and their actual situation.
To access the OCR service of Baidu Cloud, the following three requirements must be met first:
- You must be able to access the Internet. Baidu Cloud is an Internet-based cloud service, not locally running software. For personal use, Internet access is required. For enterprise use without Internet access, you may need to negotiate with Baidu Cloud to purchase its offline service.
- You may need to pay Baidu. The Baidu OCR service is charged, but it provides a daily free quota (5000 calls a day for Text Recognition, 500 calls a day for license recognition, etc.). For personal use, the free quota is basically enough. Baidu may change its free quota and pricing at any time, so we cannot predict how much you will need to pay.
- You need your own account. Because Baidu Cloud charges for usage, Laiye Automation Platform users cannot share one account. Each user should apply for a Baidu Cloud account and credentials for the Baidu OCR service (generally called the access key and secret key). The application is simple; see our online tutorial.
Laiye Automation Platform contains the following Baidu OCR commands:
Like the "local OCR" commands described above, the Baidu OCR commands include "mouse click OCR text", "mouse move to OCR text", "find OCR text location", "image OCR recognition" and "screen OCR". These five commands are used in much the same way as the "local OCR" commands of Laiye Automation Platform. The only difference is that you must fill in the access key and secret key obtained from Baidu Cloud in the command's attributes.
Now look at the "image special OCR recognition" command. "Special" means that the image to recognize is of a specific kind, such as an ID card or a train ticket. Suppose the following image is saved in the file D:\1.png:
Insert an "image special OCR recognition" command and modify its attributes as shown in the figure. In addition to the access key and secret key mentioned above, specify the file name of the image to recognize and set the OCR engine to "train ticket recognition". Leave the other attributes at their default values. After running, the recognition result appears in the output panel. The result is a JSON document. Processing it further requires the JSON commands provided by Laiye Automation Platform, which are outside the scope of this chapter and are not covered here.
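To give a feel for what "further processing" of a JSON result involves, here is a minimal Python sketch using the standard json module. The JSON text, its key names and its values are made up for illustration and are only an assumption about what a train ticket result might look like; check the actual output of the command before relying on any key.

```python
import json
from typing import Dict, List

# Made-up example of a recognition result; the real keys may differ.
raw_result = (
    '{"words_result": {"starting_station": "Beijing", '
    '"destination_station": "Shanghai", "train_num": "G1"}}'
)


def extract_ticket_fields(raw: str, wanted: List[str]) -> Dict[str, str]:
    """Parse the JSON text and return the wanted fields.

    Fields missing from the result are returned as empty strings,
    so the caller never has to handle a missing key.
    """
    data = json.loads(raw)
    fields = data.get("words_result", {})
    return {key: fields.get(key, "") for key in wanted}


ticket = extract_ticket_fields(
    raw_result,
    ["train_num", "starting_station", "destination_station", "seat_category"],
)
print(ticket)
```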