Continuous integration: parallel tests the way they should be
We at Caelum have been incorporating a lot of XP practices into our everyday work for a while now. Still, we have faced a lot of issues with making our software one-click-deployable.
I will talk about the issues that arose in one of our continuous projects – the kind of project that has no end, because it is the heart of its company and has to keep being developed as the company evolves.
The first big – and long-lasting – problems came from the build taking much longer than the expected 10 minutes. Selenium end-to-end tests tend to be slow when run sequentially, so we adopted selenium-grid, which offered a tool to run our tests in parallel. After a few months struggling with it, Lucas Cavalcanti and I submitted a patch to the tool that let the user control the grid client machines, in order to cope with machines shutting down, and configure new agents through a web front end. Unfortunately that patch was not accepted, because the Selenium Grid project was going in a different direction from what we needed.
From that point on we noticed that our main problem was parallelizing the tasks, and that this could be achieved with the continuous integration server rather than with Selenium Grid. The most famous open source servers did not offer exactly what we expected; some of them offered just a way to distribute builds, not to parallelize them. This wouldn’t make our build any faster, but would only allow more (distinct) builds to run concurrently.
There is, of course, Cruise, which does exactly what we needed – it parallelizes the tests for us – but the client company did not want to spend money on that at that point, so we needed an open source approach. Our decision was to create our own CI server, and to face all the problems other teams have faced while doing that.
Right now, Integra – this new continuous integration server – has Git and SVN support, the usual email and task features and, most important of all for us, the feature described next.
Automatically Parallelizing Tests
Cruise goes a long way by allowing us to declaratively annotate our tests in groups and run each group of tests on a different agent at the same time. Integra implements the same idea but goes a little further: instead of making you responsible for sorting your tests into N different categories and running them on N machines, Integra simply asks for the test set and the maximum time that we want the test run to take.
Let’s say I want my integration tests to run in x = 5 minutes. I configure Integra’s test plugin with this number, and on its first run it simply spreads the build randomly across a number of machines. After this first run, Integra knows the average running time of each test and can therefore create T(x) partitions in order to achieve a maximum running time of x. Clearly the number of partitions T(x) depends on the running time of each test, and an eager (greedy) algorithm was implemented that minimizes the difference between x and max(running_time_of_a_single_test).
If max(running_time_of_a_single_test) > x, the goal can’t be achieved, because a single test already takes longer than the limit. Another restriction might be the number of available agents being smaller than T(x).
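To make the idea concrete, here is a minimal sketch of one way such a partitioning could work. It is not Integra’s actual code: the function name partition_tests, the test names and the timings are all hypothetical, and the strategy shown (sort the tests from longest to shortest and put each one in the first partition that still has room) is just one common greedy approach, assumed here for illustration. Times are in seconds to keep the arithmetic exact.

```python
def partition_tests(durations, max_seconds, agents=None):
    """Group tests so that no group's total average time exceeds max_seconds.

    durations: dict mapping a test name to its average running time (seconds).
    agents: optional number of available agents (hypothetical parameter).
    Illustrative strategy only: longest test first, first partition with room.
    """
    if durations and max(durations.values()) > max_seconds:
        # The first restriction from the text: one test alone is already too slow.
        raise ValueError("a single test runs longer than the target time")

    partitions = []  # each entry is [total_seconds, [test names]]
    longest_first = sorted(durations.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, seconds in longest_first:
        for part in partitions:
            if part[0] + seconds <= max_seconds:
                part[0] += seconds
                part[1].append(name)
                break
        else:
            # No existing partition has room, so open a new one.
            partitions.append([seconds, [name]])

    if agents is not None and len(partitions) > agents:
        # The second restriction: fewer agents than partitions, T(x).
        raise ValueError("not enough agents for %d partitions" % len(partitions))
    return [names for _, names in partitions]


# Hypothetical averages collected after a first run, with x = 5 minutes.
averages = {"login": 120, "checkout": 180, "search": 60, "signup": 150, "report": 90}
print(partition_tests(averages, 300))
# [['checkout', 'login'], ['signup', 'report', 'search']]
```

In this made-up example the tests add up to 600 seconds, so with a 300 second limit two partitions is the fewest possible, and each one runs for exactly 300 seconds.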
Eager algorithm to partition the tests
The eager algorithm that we have implemented seems to give the best solution for the minimization described above. Although I have convinced myself of it with a “during-the-shower proof” in my own mind, it’s probably time to put it down on paper to be sure about it.
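One cheap way to test a belief like this before writing a proof is to compare the greedy result against an exhaustive search on small inputs. The sketch below does that for a "longest first, first partition with room" strategy, which is an assumption on my part and may well differ from what Integra does; both function names and the sample timings are hypothetical. For that particular strategy a small counterexample exists, which is a good reason to write the proof down.

```python
def greedy_partition_count(minutes, limit):
    """Partitions used by 'longest first, first partition with room'."""
    totals = []
    for m in sorted(minutes, reverse=True):
        for i, total in enumerate(totals):
            if total + m <= limit:
                totals[i] += m
                break
        else:
            totals.append(m)
    return len(totals)


def optimal_partition_count(minutes, limit):
    """Smallest possible number of partitions, by exhaustive search.

    Exponential in the number of tests, so only usable on tiny inputs.
    """
    items = sorted(minutes, reverse=True)
    if items and items[0] > limit:
        raise ValueError("a single test runs longer than the limit")

    def fits(index, totals):
        if index == len(items):
            return True
        tried = set()  # partitions with equal totals are interchangeable
        for i, total in enumerate(totals):
            if total in tried or total + items[index] > limit:
                continue
            tried.add(total)
            totals[i] += items[index]
            if fits(index + 1, totals):
                return True
            totals[i] -= items[index]
        return False

    for k in range(1, len(items) + 1):
        if fits(0, [0] * k):
            return k
    return 0


# Hypothetical test times in minutes, with a 10 minute limit.
times = [4, 4, 3, 3, 3, 3]
print(greedy_partition_count(times, 10))   # 3
print(optimal_partition_count(times, 10))  # 2
```

Here the greedy strategy pairs the two 4-minute tests first and ends up needing three partitions, while grouping 4 + 3 + 3 twice fits in two. Whether Integra’s algorithm has the same weakness depends on exactly what it minimizes.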
Anyone willing to improve the algorithm?
Summing up, human tasks in testing should be automated as much as possible. That’s what we have been saying for a while now, and this test partitioning job, until now a human task, could be automated, and now it is.
We intend to release Integra as soon as I get the HTML page design later next week, because right now it’s as ugly as… anything else I can design.