Command without Target
In Chapter 3, we discussed "interface elements" and how to select them as the target of a "Command with target". However, we are often unable to accurately identify the desired interface element as the target. Therefore, we need to learn about "Commands without target" to handle these situations.
Why is there no target?
When we search for and operate on interface elements, we are actually calling the application programming interfaces (APIs) provided by the elements' underlying development frameworks to locate them. Laiye RPA provides a common abstraction over all these APIs so that users can work with all interface elements in the same way, without having to worry about the underlying details. However, some software programs do not have an API to identify their interface elements, and some that do choose to hide the API from external use. Some examples are:
- Virtual machine and remote desktop
This includes Citrix, VMware, Hyper-V, VirtualBox, Remote Desktop (RDP), various Android emulators, etc. These programs run on a separate operating system, which is completely isolated from the operating system that Laiye RPA runs on. Naturally, Laiye RPA cannot operate interface elements in another operating system.
It is possible, however, to install Laiye RPA and the software used in the process inside the virtual machine or on the remote computer. The interfaces of that software can then be used directly by Laiye RPA, since both run in the same operating system. In this setup, the local computer only serves as a display.
- DirectUI-based software
Originally, the development frameworks for Windows software interfaces were all provided by Microsoft. Examples include MFC, WTL, WinForm, and WPF. Microsoft provides interfaces that let external programs interact with the interface elements built on these frameworks. However, in recent years, many development teams have launched their own Windows development frameworks, collectively known as DirectUI, to make it easier to create slicker UIs. The interface elements in these frameworks are all "painted" on the screen. Although the interface elements are visible to users, the operating system and other programs do not know where they are. While some DirectUI frameworks provide external interfaces for finding interface elements, many provide no such interface, which makes it impossible for Laiye RPA to find their interface elements.
In fact, the interfaces of Laiye RPA Creator and Laiye RPA Worker are developed using a DirectUI framework called Electron. While Electron does provide an API for finding its interface elements, this API is disabled by default in public releases. Therefore, as you may have noticed, Laiye RPA's interface elements cannot be identified by any RPA platform, including Laiye RPA itself.
- Video games
To have more control over the appearance of every visual element in a game, game developers use frameworks that "paint" the interface elements in a way similar to DirectUI frameworks. This kind of interface usually does not provide an API that tells us the location of the interface elements. Unlike software programs based on DirectUI, interfaces in video games change rapidly and therefore require fast processing. In general, RPA platforms are not optimized for video games and are therefore not suitable for automating video game processes.
If you want to automate operations in video games, we recommend Quick Macro, which is specifically designed for video games. It has many built-in interface searching methods for video games, such as pixel-level color comparison and image searching, and it is optimized to run efficiently.
Command without Target
While we spent Chapter 3 introducing Commands "with target", Laiye RPA also provides Commands "without target". In Figure 63, Commands with targets are enclosed by red boxes, and Commands without targets are enclosed by blue boxes.
If you encounter Windows programs whose targets you cannot select, then you must resort to Commands without target. Among the ones listed in Figure 63, the most important Command without target is "Simulate Movement". By supplying coordinates as parameters, we can use "Simulate Movement" to move the cursor to the corresponding point anywhere on the screen. Then, we can use "Simulate Click" to simulate pressing the left mouse button, for example to press a button. Or we can use it to set focus on an input box and then use "Input Text" to enter any text we want.
For example, suppose there is an input box whose center point has coordinates x:200, y:300. We can first use "Simulate Movement" with coordinates x:200, y:300 to move the cursor to that location. Then, we can use "Simulate Click" to simulate a left-click to select the input box. Finally, we can invoke "Input Text" to input the desired text. If we use "Input Text" without the first two steps, the text is likely to be entered somewhere else.
To understand the technical details, we first need to explain the screen coordinate system of the Windows operating system. If you are familiar with this, feel free to skip the following.
In Windows, each pixel on the screen has a unique 2D coordinate point (x and y). The x coordinate is the number of pixels you need to count from the left edge of the screen to reach the point, starting with 0, and the y coordinate is the number of pixels you need to count from the top edge of the screen to reach the point, starting with 0. For example, the coordinate point x:200, y:300 indicates a point that is 200 pixels to the right of the leftmost pixel and 300 pixels below the topmost pixel on the screen. Therefore, the coordinate point x:200, y:300 roughly corresponds to the red circle in Figure 64.
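To make the coordinate convention concrete, here is a minimal Python sketch (Python is used only for illustration; it is not Laiye RPA code). The helper name center_of and the sample rectangle are hypothetical. The sketch only shows that x grows to the right and y grows downward from the top-left pixel (0, 0).

```python
def center_of(left, top, width, height):
    """Return the center pixel (x, y) of a rectangle in screen coordinates.

    (left, top) is the rectangle's top-left pixel. x grows to the right and
    y grows downward, with (0, 0) at the top-left corner of the screen.
    """
    return (left + width // 2, top + height // 2)


# A hypothetical input box, 100 pixels wide and 40 pixels tall, whose
# top-left corner sits at x:150, y:280.
print(center_of(150, 280, 100, 40))  # (200, 300)
```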
As long as we have the two integer values x and y, we know the position of a point on the screen. In Laiye RPA, some Commands can get the position of a point on the screen and output it to a variable. We will learn later how to use a "dictionary" to store the x and y values describing a point, but for now, just know that whenever Laiye RPA outputs the coordinates of a point, it returns a special object known as a dictionary. Assign this object to a variable pnt, and you can use pnt["x"] and pnt["y"] to retrieve the x and y values of the point, respectively.
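As a rough analogy, a Python dictionary can hold a point in the same shape. This is only an illustration in Python, not Laiye RPA's own language, and the values are the sample coordinates used above.

```python
# A point represented as a dictionary with the keys "x" and "y".
pnt = {"x": 200, "y": 300}

x = pnt["x"]  # horizontal position, counted from the left edge of the screen
y = pnt["y"]  # vertical position, counted from the top edge of the screen
print(x, y)   # prints: 200 300
```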
If the interface element we are looking for is in a fixed position on the screen, then we can supply fixed coordinates to Commands without target to simulate normal operations. However, this is rarely the case. Windows get moved around and resized all the time, and the coordinates of the interface elements within the windows change as a result.
Therefore, in Laiye RPA, it is generally not recommended to write fixed coordinates directly, because it is difficult to account for all the possible locations an element can be in. When using a Command without target, you should use it along with other Commands that find and store in variables the coordinates of your intended interface elements through techniques other than feature matching.
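The following Python sketch illustrates why fixed coordinates are fragile. It assumes, for illustration only, that an element keeps a constant offset inside its window; the function name to_screen and all numbers are hypothetical.

```python
def to_screen(window_origin, offset):
    """Convert a position relative to a window's top-left corner
    into screen coordinates."""
    return {
        "x": window_origin["x"] + offset["x"],
        "y": window_origin["y"] + offset["y"],
    }


# Hypothetical: the button sits 120 pixels right of and 80 pixels below
# the window's top-left corner.
button_offset = {"x": 120, "y": 80}

before = to_screen({"x": 80, "y": 220}, button_offset)
after = to_screen({"x": 400, "y": 50}, button_offset)  # window was dragged

print(before)  # {'x': 200, 'y': 300}
print(after)   # {'x': 520, 'y': 130}
```

A process that hard-codes x:200, y:300 would click the wrong place as soon as the window is moved, which is why the coordinates should be looked up at run time.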
In Laiye RPA, the "best partner" of a Command without target is an Image Command.
Image Command
While mouse and keyboard Commands are certainly very useful, Laiye RPA also provides a powerful category of Commands: Image Commands. In the command area of Laiye RPA Creator, find the Image category, and its member Commands are shown in Figure 65.
Let's first look at the Command "Find Image", which searches the screen for areas that match an image. You need to specify an image file (.bmp, .png, .jpg, and many other formats are supported, though .png is recommended for its lossless encoding). Then, the Command scans the specified area on the screen from left to right and top to bottom to check if the image appears anywhere in it. If it does, the Command returns the coordinates of the found image and stores them in a variable. Otherwise, the Command throws an exception.
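The scanning idea can be sketched in a few lines of Python. This is a simplified illustration, not Laiye RPA's implementation: it works on small 2D lists of numbers instead of real screenshots, requires an exact match, and uses Python's LookupError as a stand-in for the exception mentioned above. The function name find_image is hypothetical.

```python
def find_image(screen, template):
    """Scan `screen` row by row (top to bottom, left to right within a row)
    for an exact copy of `template`.

    Both arguments are 2D lists of pixel values with rows of equal length.
    Returns the center of the first match as {"x": ..., "y": ...} and
    raises LookupError if the template does not appear anywhere.
    """
    th, tw = len(template), len(template[0])
    sh, sw = len(screen), len(screen[0])
    for top in range(sh - th + 1):        # candidate top edges
        for left in range(sw - tw + 1):   # candidate left edges
            if all(screen[top + r][left:left + tw] == template[r]
                   for r in range(th)):
                return {"x": left + tw // 2, "y": top + th // 2}
    raise LookupError("image not found on screen")


screen = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 7, 8, 9, 0],
    [0, 0, 4, 5, 6, 0],
    [0, 0, 1, 2, 3, 0],
    [0, 0, 0, 0, 0, 0],
]
template = [
    [7, 8, 9],
    [4, 5, 6],
    [1, 2, 3],
]
print(find_image(screen, template))  # {'x': 3, 'y': 2}
```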
This Command can seem very complicated, since it requires both an image file and a scan area. In fact, if we use Laiye RPA Creator, using it is a piece of cake.
For example, consider the login interface for Steam, the famous video game platform (Figure 66). The interface is written using a DirectUI framework. The account name and password fields, as well as the "LOGIN" button, cannot be directly selected by any RPA platform. This is where Image Commands come into play.
Assume that we have launched Steam and opened the login interface, and Steam has remembered our account name and password. All we need to do is to click the "LOGIN" button. Create a new process in Laiye RPA Creator and enter a process Block to edit it. Insert a "Find Image" Command and click the "Selector" button on it (Figure 67).
As with a Command with target, Laiye RPA Creator is temporarily hidden. Now, press the left mouse button and drag toward the bottom right to draw a blue box that encloses the image we want to find. Release the mouse button to confirm your selection.
It might seem like what you did was just drawing a blue box, but in fact, Laiye RPA Creator has already done two things:
- It has determined which window the blue box fell on and recorded the features of this window. When looking for this image in the future, Laiye RPA will find the window first via feature matching and then scan for the image within the scope of this window.
- The part of the screen selected by your blue box is saved as a .png screenshot, and the file is saved in the "res" directory within the process directory. This will be the image used to scan the screen in the future.
Left-click the "Find Image" Command to select it and inspect its Properties (Figure 69).
Among them, the two circled Properties are the most important ones, and they have also been automatically filled by Laiye RPA Creator when we boxed the image earlier. Other than these two, the Property "Similarity" is a decimal number between 0 and 1 that determines how strictly Laiye RPA requires each pixel of the screen to match with each pixel on the search image in order to confirm a match. The higher the number, the stricter. The default value is 0.9, which allows for a small amount of mismatch. The Property "Activate window" determines whether to bring the selected window to the foreground before searching for the image. If the selected window is covered by other windows, even if the image is in the window, the Command could fail to find it. Therefore, the Property "Activate window" defaults to "True".
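To give a feel for what a "Similarity" threshold does, here is a small Python sketch. The text does not say how Laiye RPA computes similarity, so the metric below (the fraction of exactly equal pixels) is purely an assumption for illustration, and the function names are hypothetical.

```python
def similarity(patch, template):
    """Fraction of pixel positions at which `patch` and `template` are equal.
    (Assumed metric for illustration; not Laiye RPA's actual formula.)"""
    total = 0
    matching = 0
    for patch_row, template_row in zip(patch, template):
        for a, b in zip(patch_row, template_row):
            total += 1
            matching += (a == b)
    return matching / total


def is_match(patch, template, threshold=0.9):
    """Accept the patch if its similarity reaches the threshold."""
    return similarity(patch, template) >= threshold


template = [[1] * 5 for _ in range(4)]  # 20 pixels
patch = [row[:] for row in template]
patch[0][0] = 0                         # one pixel differs: 19 of 20 match

print(similarity(patch, template))      # 0.95
print(is_match(patch, template))        # True with the default 0.9
print(is_match(patch, template, 0.99))  # False with a stricter threshold
```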
Other Properties usually do not need to be changed, and you can just use the default values. For the Property "Output to", a default variable name objPoint is chosen. If the Command successfully finds the image, it will save the result in that variable. Let's see what the stored value actually is. Insert an "Output Debug Info" Command (found under the category "Basic Commands") after the "Find Image" Command and set the Property "Content of output" as objPoint. Take care to not enclose objPoint in double quotes: this would cause Laiye RPA to simply output the string "objPoint". See Figure 70.
Assuming that the image we are searching for is found, we obtain the following result after running the process:
{ "x" : 116, "y" : 235 }
The specific values of x and y you see may differ from computer to computer, but the overall format remains the same. This value is of the "dictionary" data type. When this value is stored in the variable objPoint, we can write objPoint["x"] and objPoint["y"] to get the values of x and y.
Now that we have obtained the coordinate of the center of the image, we can simulate a mouse click on the location. Simply use the Move Cursor and Simulate Click Commands to complete this task.
As shown in Figure 71, the most important Properties of the "Move Cursor" Command are "X-coordinate" and "Y-axis". We can simply enter objPoint["x"] and objPoint["y"] for the two fields. Then, we can add a "Simulate Click" Command to left-click on the center of the "LOGIN" button. Thus, we have simulated the action of clicking on the "LOGIN" button.
The three Commands in Figure 71 are easy to follow. Even someone who has never used Laiye RPA before can understand what they are doing. However, taking three Commands to click a "LOGIN" button is too cumbersome. Luckily, if you inspect the Image Commands again, you can find a "Click Image" Command, which combines "Find Image", "Move Cursor", and "Simulate Click". Instead of what we just did, you can simply insert the "Click Image" Command and use its Selector to box the "LOGIN" button, and Laiye RPA will find and click the button. Even though this is a Command without target, it is just as easy to use as a Command with target.
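The way "Click Image" bundles the three steps can be sketched in Python as follows. All functions here are hypothetical stand-ins that only record what would happen; they are not Laiye RPA APIs, and the file name login.png is made up.

```python
actions = []  # records what the stand-in Commands would do


def find_image(image_name):
    # Stand-in: pretend the image was found, using the sample point above.
    return {"x": 116, "y": 235}


def move_cursor(x, y):
    actions.append(("move", x, y))


def simulate_click():
    actions.append(("click", "left"))


def click_image(image_name):
    """Find the image, move the cursor to its center, then left-click."""
    point = find_image(image_name)
    move_cursor(point["x"], point["y"])
    simulate_click()


click_image("login.png")  # hypothetical file name
print(actions)  # [('move', 116, 235), ('click', 'left')]
```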
Having learned from this example, you should be able to quickly figure out what the other image Commands (like Hover Mouse on Image, Determine If Image Exists, etc.) do and how to use them. Let's leave it to you to try them out.
Practical Skills
In Chapter 3, we learned to use Commands with target, and in this chapter we learned to use Commands without target. In most cases, Commands without target do not operate on a fixed location on the screen. Instead, they are combined with Image Commands to dynamically identify where on the screen to apply the operation.
So, when creating a process, should we prefer Commands with target or Commands without target? Our answer is that you should always use Commands with target as long as you are able to select the intended interface element as the target. This is because Commands without target rely heavily on Image Commands, which have the following disadvantages:
- Image Commands are much slower than Commands with target.
- When the intended interface is covered by other windows—even if only a small part of it is covered—the accuracy of Image Commands is reduced greatly.
- Image Commands often rely on image files and cannot run if the files are lost.
- Some special Image Commands require internet connection to run.
Of course, there are some tips that can help with these shortcomings.
First of all, keep your images "small". When taking a screenshot during target selection, select the smallest area that still shows the distinguishing characteristics of the interface element. Not only does this help the Image Command run faster, but it also makes occlusion less likely to affect the Command's accuracy. For example, in Figure 72, instead of selecting the whole button, like the image on the left, you can select the most important distinguishing aspect, which is simply the text "Sign In".
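A back-of-the-envelope Python sketch shows why a smaller image tends to be cheaper to search for. It assumes a naive exhaustive search that compares every pixel of the image at every possible position, which is almost certainly simpler than what Laiye RPA really does. The function name and the screen and image sizes are hypothetical.

```python
def naive_comparisons(screen_w, screen_h, image_w, image_h):
    """Worst-case number of pixel comparisons for an exhaustive
    sliding-window search (an assumed, simplified cost model)."""
    positions = (screen_w - image_w + 1) * (screen_h - image_h + 1)
    return positions * image_w * image_h


whole_button = naive_comparisons(1920, 1080, 200, 60)  # the entire button
text_only = naive_comparisons(1920, 1080, 60, 20)      # only the button's text

# In this model, searching for the smaller image needs several times
# fewer comparisons.
print(whole_button / text_only)
```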
In addition, most Image Commands support the Property "Similarity". The initial value of this Property is 0.9. If it is set too low, the Command could match "incorrect elements"; if it is set too high, the Command could "miss" the correct element. This is analogous to the "wrong selection" and "missing selection" problems in Commands with target (refer to Chapter 3). Just as before, we can adjust the value of Similarity according to the specific situation and test different values to find one that balances accuracy and specificity.
Moreover, the computer's screen resolution and scale can have a critical effect on an Image Command. The interface of a software program often looks completely different on screens with different resolutions, so an Image Command that works on one computer may fail on another. Therefore, please ensure that the computer used to create the process has the same resolution and scale as the computers that run the process. Figure 73 shows the Windows 10 interface for setting system resolution and scale.
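As a rough illustration of the scale problem, the Python sketch below estimates how large the same element is drawn at different scale settings. It assumes simple proportional scaling, which is a simplification of how Windows actually renders interfaces. The function name and the element size are hypothetical.

```python
def rendered_size(width, height, scale_percent):
    """Approximate on-screen size in pixels of an element that measures
    width x height at 100% scale (assumes simple proportional scaling)."""
    return (round(width * scale_percent / 100),
            round(height * scale_percent / 100))


# A hypothetical 80 x 24 pixel button captured at 100% scale.
print(rendered_size(80, 24, 100))  # (80, 24)
print(rendered_size(80, 24, 125))  # (100, 30)
print(rendered_size(80, 24, 150))  # (120, 36)
```

A screenshot captured at 100% scale has a different size from the same button drawn at 125%, so a pixel-by-pixel comparison is unlikely to succeed.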
Finally, Image Commands often need to reference image files. We can certainly provide absolute paths to these image files, such as D:\1.png, but this requires the computer that runs the process to have the same file in the same location. Otherwise, an error occurs. To make image dependencies easier to manage, every process's directory has a folder named "res", in which you can put images or any other files the process might depend on. Then, you can reference these files using strings like @res"1.png". When you send this process to a Laiye RPA Worker to run it, the files in the "res" directory are packaged with it. No matter where Laiye RPA Worker puts the process directory, it automatically adjusts what the @res prefix refers to, so the file references always remain valid.
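Conceptually, the @res prefix behaves like a path that is resolved against the process directory at run time. The Python sketch below mimics that idea; it is not how Laiye RPA implements it, and the function name resolve_res and both directories are hypothetical.

```python
from pathlib import PureWindowsPath


def resolve_res(process_dir, name):
    """Return the path of `name` inside the process's "res" folder."""
    return PureWindowsPath(process_dir) / "res" / name


# The same reference resolves correctly wherever the process directory lives.
print(resolve_res(r"D:\flows\login", "1.png"))      # D:\flows\login\res\1.png
print(resolve_res(r"C:\worker\tasks\42", "1.png"))  # C:\worker\tasks\42\res\1.png
```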
All the tips for using Image Commands also apply to OCR Commands. We will introduce what OCR Commands are and how to use them in a later chapter.
Computer Vision
As discussed earlier, the Selector of Commands with target cannot directly identify interface elements in virtual machines, remote desktops, DirectUI-based software, and games. In these cases, Commands without target come into play. For instance, Image Commands can help simulate operations on interface elements indirectly. In addition, Laiye RPA Creator 5.0 and later versions provide Computer Vision Commands, another image-based method for identifying interface elements. Let us use an example to illustrate their usage.
Open the system-provided program "Paint", and draw a rectangle here, as shown in Figure 74. Assume that we are required to find and click this rectangle with Laiye RPA.
As mentioned, this cannot be realized using Commands with target. Now let's try Computer Vision Commands to fulfill this task. Find "Computer Vision" under "Interface Operations" and click to unfold. Then the list of Computer Vision Commands will be presented, as shown in Figure 75.
First, insert the Command "CV Screen" and click the button "Selector". Similar to the case of Commands with target, the interface of Laiye RPA Creator will be temporarily hidden, and a blue translucent mask with red border appears. The target selector follows the mouse.
After capturing this screen-wide image, insert a "Click After Recognition" Command and click its "Selector" button, which functions the same way as those of Commands with target. To our surprise, Laiye RPA succeeded in recognizing the rectangle!
That is to say, the Command "CV Screen" extracts interface elements that could not be recognized before, so that users can operate on these elements with subsequent Commands: "Click After Recognition", "CV Get Text", "CV Input Text", "CV Hover", "CV Detect Element", and so on. These Commands work only after the Command "CV Screen" and must be placed within its recognition scope (that is, indented under the Command "CV Screen").
Run this process, and the rectangle is clicked successfully.
But what if there are two or more interface elements with identical appearances in the user interface (UI)? How do we locate the one we want with Laiye RPA? In "Paint", copy the rectangle so that there are two identical rectangles. Assume that we are required to find and click the rectangle on the right with Laiye RPA.
First, click the "Selector" button of the Command "CV Screen" again to repeat the UI detection, since the elements on the screen have changed. In a sense, the Command "CV Screen" is static rather than dynamic: its detection result is captured before the process runs. Whenever the interface elements change, the UI detection must be repeated to ensure that the process runs successfully.
Next, click the "Selector" button of the Command "Click After Recognition" again. Both rectangles are now selectable. Select the one on the right, and it will be covered by the translucent mask. Meanwhile, a dashed line appears, connecting the rectangle on the right with the word "Shapes", as shown in Figure 80.
In fact, Laiye RPA uses an "Anchor" in this process to locate one element among two or more elements that are identical in appearance. An "Anchor" is an element that is unique on the screen, such as the word "Shapes" in this example. It serves as a reference object: the target element is located through its unique position offset and azimuth angle relative to the "Anchor".
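The anchor idea can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not Laiye RPA's algorithm: the offset is stored as a distance plus an angle measured from the positive x axis, and at run time the candidate closest to the position predicted from the anchor is chosen. All function names and coordinates are hypothetical.

```python
import math


def offset_and_angle(anchor, target):
    """Distance in pixels and angle in degrees from anchor to target.
    The angle is measured from the positive x axis; y grows downward."""
    dx = target["x"] - anchor["x"]
    dy = target["y"] - anchor["y"]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))


def pick_by_anchor(anchor, candidates, distance, angle_deg):
    """Choose the candidate closest to the position predicted from the
    anchor plus the recorded distance and angle."""
    rad = math.radians(angle_deg)
    expected_x = anchor["x"] + distance * math.cos(rad)
    expected_y = anchor["y"] + distance * math.sin(rad)
    return min(candidates,
               key=lambda c: math.hypot(c["x"] - expected_x,
                                        c["y"] - expected_y))


# Design time: record where the rectangle on the right sits relative to
# the anchor (the word "Shapes").
anchor = {"x": 500, "y": 90}
right_rectangle = {"x": 700, "y": 400}
distance, angle = offset_and_angle(anchor, right_rectangle)

# Run time: the window has moved by 20 pixels in both directions, so every
# element has shifted, but the offset from the anchor is unchanged.
moved_anchor = {"x": 520, "y": 110}
candidates = [{"x": 320, "y": 420}, {"x": 720, "y": 420}]
print(pick_by_anchor(moved_anchor, candidates, distance, angle))
# {'x': 720, 'y': 420}
```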
Run this process, and the rectangle on the right is clicked successfully.