World of tanks
Go flexible
I just thought something more suitable had to exist, and I came across Sikuli. It looked promising at the beginning, and after a few months I still have not found any significant disadvantage that would force me to switch to another tool. It works and it produces results. The conclusion: it is at least worth a try!
Actually there are two versions of Sikuli, developed independently of each other: http://www.sikulix.com/ (http://www.sikuli.org/) and http://slides.sikuli.org/ (present in the Maven repository). When I refer to Sikuli I always mean the first one.
The architecture of the solution looks like this. As you can see, you can use both Jython and Java to write scripts. However, the project page focuses mainly on the Jython API. I discourage you from staying with this API. There is also a bundled editor which lets you quickly see how Sikuli works. Again, do not use it for an extended period of time unless you are just checking image recognition (it is really good for that when you need to adjust the pattern similarity). Try the editor, write a few simple Jython scripts and then leave it. It is a really flexible solution, but in my opinion you have to switch to Java.
Jython vs Java with Sikuli
Why do I discourage you from using Jython? It is a nice object oriented scripting language, interpreted to byte code which runs on the Java virtual machine. Well, here we go: “interpreted”. Every single run of the test requires interpretation, and that takes time which we would like to use for something more useful than waiting. The regression suite I was using at work needed nearly 20 seconds just to start! One can also say Jython is more readable than Java. I agree it is cleaner, but the price you pay is in my opinion just too high: you lose the possibility to integrate seamlessly with your Java project (if this is your technology), you make developers and managers more reluctant to adopt the solution, and you will have problems when trying to use a BDD framework on top of it. When your tests become more complicated you will have to reach out to Java libraries anyway, like I had to when I was writing a test case which included activities over ssh. There are minor issues with Jython as well, for example running Sikuli on a secondary display, while in Java this works perfectly. Anyway, if you know only Jython and Java gives you nightmares, stick to Jython; otherwise go for Java. The flexibility which comes along is awesome in my opinion.
Simple example
Let’s start with a simple example. This is a fully functional example demonstrating the usage of Sikuli, and it will run if you have Sikuli and Notepad++ available on your computer. In this example Notepad++ is started, the cursor is moved to the center of the Notepad++ window and, after a 1 second pause, text is typed into the tab. I marked the lines which are related to the Sikuli libraries in yellow.
    package com.passfailerror.examples.simplest;

    import org.sikuli.script.App;
    import org.sikuli.script.FindFailed;
    import org.sikuli.script.Region;

    import java.util.logging.Logger;

    public class Start {

        public static void main(String[] args) {
            new Start();
        }

        public Start() {
            Logger my_logger = Logger.getLogger(this.getClass().getName());
            my_logger.info("start");

            new App("C:\\Program Files (x86)\\Notepad++\\notepad++.exe").open();
            String title = ("Notepad++");
            new App(title).focus();

            Region windowArea = App.focusedWindow();
            windowArea.highlight(3f);
            try {
                windowArea.hover(windowArea.getCenter());
                Thread.sleep(1000);
                windowArea.type("Sikuli is working");
            } catch (FindFailed findFailed) {
                findFailed.printStackTrace();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
Diving into details
The main idea that drives Sikuli is that it is image based. That is, one needs many small screenshots of the application for the testing framework to run. Thus it operates at the user's level of abstraction and is closer to human perception.
The other important point is that it is not a “record-replay” tool. You do not record anything; you just write tests like you would write any regular application. That means you are test driven development (TDD) and behaviour driven development (BDD) ready!
Let us look closer at how we can apply Sikuli to automate Notepad++, a popular GUI application on Windows.
I am going to use version 6.8.3 on Windows 7 Home Premium.
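To make the image based idea a bit more tangible, here is a toy sketch in plain Java. It is not Sikuli code and not how Sikuli matches images internally; the class `ToyImageFinder`, its methods and the small integer "screens" are hypothetical and exist only for illustration. The sketch slides a small pattern over a larger screen and returns the first position where the share of identical pixels reaches a similarity threshold, which is the kind of knob you play with when adjusting pattern similarity.

```java
// Toy illustration only: a hypothetical class, not part of Sikuli.
class ToyImageFinder {

    // Returns {x, y} of the first top-left corner where at least
    // minSimilarity (0.0 to 1.0) of the pattern pixels equal the screen
    // pixels, or null when there is no such place.
    static int[] find(int[][] screen, int[][] pattern, double minSimilarity) {
        int ph = pattern.length;
        int pw = pattern[0].length;
        for (int y = 0; y + ph <= screen.length; y++) {
            for (int x = 0; x + pw <= screen[0].length; x++) {
                if (similarity(screen, pattern, x, y) >= minSimilarity) {
                    return new int[] {x, y};
                }
            }
        }
        return null;
    }

    // Fraction of pattern pixels identical to the screen pixels
    // when the pattern is placed at (x, y).
    static double similarity(int[][] screen, int[][] pattern, int x, int y) {
        int matches = 0;
        int total = 0;
        for (int py = 0; py < pattern.length; py++) {
            for (int px = 0; px < pattern[0].length; px++) {
                if (screen[y + py][x + px] == pattern[py][px]) {
                    matches++;
                }
                total++;
            }
        }
        return (double) matches / total;
    }

    public static void main(String[] args) {
        int[][] screen = {
            {0, 0, 0, 0},
            {0, 9, 8, 0},
            {0, 7, 5, 0},
        };
        // The "screenshot" differs from the screen in one pixel (6 instead of 5).
        int[][] button = {
            {9, 8},
            {7, 6},
        };
        int[] exact = find(screen, button, 1.0);
        int[] fuzzy = find(screen, button, 0.7);
        System.out.println("exact match found: " + (exact != null));
        System.out.println("fuzzy match at: " + fuzzy[0] + "," + fuzzy[1]);
    }
}
```

With a threshold of 1.0 the slightly different screenshot is not found at all, while a lower threshold accepts it. This is why small, well chosen screenshots and a sensible similarity value matter so much in practice.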
The main concept
The main concept can be illustrated by this drawing:

The idea behind the framework is to be able to use meaningful “pieces” which use Sikuli in the background to simulate any action a user can perform in the application. They should have names close to the English language, follow a consistent grammar and, where possible, take no parameters, so that no free text is entered by the user.
    package com.passfailerror.notepadplusplus.runner;

    import java.util.logging.LogManager;
    import java.util.logging.Logger;

    import com.passfailerror.notepadplusplus.controllers.MainWindowControl;
    import com.passfailerror.notepadplusplus.TestDriver;

    public class Start {

        MainWindowControl obj;

        public static void main(String[] args) throws Exception {
            new Start();
        }

        public Start() throws Exception {
            Object a = getClass();
            LogManager.getLogManager().readConfiguration(
                    Start.class.getClassLoader().getResourceAsStream("config/logging_configuration.properties"));
            Logger my_logger = Logger.getLogger(this.getClass().getName());
            my_logger.info("start");

            TestDriver testDriver = new TestDriver("C:\\Program Files (x86)\\Notepad++\\notepad++.exe", "Notepad++");
            Thread.sleep(1000);

            testDriver.mainWindowControl().menuControl().fileDropdownControl().newTab();
            testDriver.typeKeys("Sikuli can automate every kind of GUI.");
            testDriver.shortcutControl().closeTab();

            testDriver.shortcutControl().newTab();
            testDriver.typeKeys("Sikuli automates GUI using Java.");
            testDriver.shortcutControl().closeTab();

            testDriver.mainWindowControl().menuControl().fileDropdownControl().exit();
        }
    }
As you can see it is very clear which user actions are simulated.
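If you wonder how such chained, parameterless calls can be built, the following is a minimal sketch. Please treat it as an assumption about the general shape, not as the actual source of the framework: all class names inside `FluentSketch` are hypothetical, and instead of calling Sikuli each method only records a description of the user action (the description strings are placeholders).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: mimics the shape of the controllers used above,
// but records actions in a list instead of driving the GUI with Sikuli.
class FluentSketch {

    static class Driver {
        final List<String> actions = new ArrayList<>();

        MainWindow mainWindowControl() { return new MainWindow(this); }
        Shortcuts shortcutControl() { return new Shortcuts(this); }
    }

    static class MainWindow {
        private final Driver driver;
        MainWindow(Driver driver) { this.driver = driver; }

        Menu menuControl() { return new Menu(driver); }
    }

    static class Menu {
        private final Driver driver;
        Menu(Driver driver) { this.driver = driver; }

        // Opening the dropdown is itself a user action, so it is recorded.
        FileDropdown fileDropdownControl() {
            driver.actions.add("open File menu");
            return new FileDropdown(driver);
        }
    }

    static class FileDropdown {
        private final Driver driver;
        FileDropdown(Driver driver) { this.driver = driver; }

        void newTab() { driver.actions.add("click New"); }
        void exit() { driver.actions.add("click Exit"); }
    }

    static class Shortcuts {
        private final Driver driver;
        Shortcuts(Driver driver) { this.driver = driver; }

        void newTab() { driver.actions.add("press new tab shortcut"); }
        void closeTab() { driver.actions.add("press close tab shortcut"); }
    }

    public static void main(String[] args) {
        Driver driver = new Driver();
        driver.mainWindowControl().menuControl().fileDropdownControl().newTab();
        driver.shortcutControl().closeTab();
        driver.mainWindowControl().menuControl().fileDropdownControl().exit();
        driver.actions.forEach(System.out::println);
    }
}
```

Each controller returns the next, narrower controller, so the chain reads almost like a sentence and the IDE autocompletion guides you through the available actions. No free text is needed anywhere in the chain.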
The main idea when using Sikuli is that, when the framework starts, it should recognize the main window region and the smaller regions inside it. This way images can be found quickly, and the code keeps a clear scope and context for each action, which helps maintainability. Each region is handled by a controller.
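The region idea can be sketched without Sikuli as well. The snippet below is only an illustration under my own assumptions: `Area` is a hypothetical stand-in for a screen region, the two controllers are simplified, and the 20 pixel menu bar height is an invented number. It shows how each controller can own a sub-region cut out of the main window, so that a search has to scan only a fraction of the pixels.

```java
// Hypothetical sketch of "one controller per region"; no Sikuli involved.
class RegionSketch {

    // Minimal rectangle in screen coordinates.
    static final class Area {
        final int x, y, w, h;

        Area(int x, int y, int w, int h) {
            this.x = x;
            this.y = y;
            this.w = w;
            this.h = h;
        }

        // Sub-area given relative to this area's top-left corner.
        Area sub(int dx, int dy, int subW, int subH) {
            if (dx < 0 || dy < 0 || dx + subW > w || dy + subH > h) {
                throw new IllegalArgumentException("sub-area must lie inside its parent");
            }
            return new Area(x + dx, y + dy, subW, subH);
        }

        int pixels() { return w * h; }
    }

    // Assumption for illustration: the menu bar is the top 20 pixels.
    static final int MENU_HEIGHT = 20;

    static class MenuController {
        final Area area;
        MenuController(Area window) {
            area = window.sub(0, 0, window.w, MENU_HEIGHT);
        }
    }

    static class EditorController {
        final Area area;
        EditorController(Area window) {
            area = window.sub(0, MENU_HEIGHT, window.w, window.h - MENU_HEIGHT);
        }
    }

    public static void main(String[] args) {
        Area window = new Area(100, 50, 800, 600);
        MenuController menu = new MenuController(window);
        EditorController editor = new EditorController(window);
        System.out.println("window pixels: " + window.pixels());
        System.out.println("menu pixels:   " + menu.area.pixels());
        System.out.println("editor starts at y = " + editor.area.y);
    }
}
```

Searching for a menu image only inside the menu area means far fewer pixels to compare than searching the whole window, and it also rules out accidental matches elsewhere on the screen.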
Class overview
From the code point of view, we should keep a reasonable design so that the framework is easy to develop and to maintain.
If you are interested in the details, please take a look at the source code yourself. I am sharing it here under the GUI_automation_part1 branch.
Look and feel
Please watch the movie which shows the actual run. One can see red rectangles there which show the regions Sikuli is using.
Short summary
In this article I showed you how to simulate user actions on a GUI which is not web based. It was a lot about Java; however, this is the price we have to pay to get exactly the testing framework we need. In a real project, help from a developer could be required to make the framework correct from the coding point of view.
Please note I was only showing the possibility of simulating user actions. I called the framework a “testing framework” a few times, but I did not say anything about how to write automated test cases, nor about how to run them. This would be too much for a single article. Still, I plan to continue the subject and come back to it in the near future. Stay tuned…