Guilhermesilveira's Blog

as random as it gets

Posts Tagged ‘xp

When I am unable to TDD…

with 12 comments

It has been a while since we started using unit tests (and other types of tests) in our projects. But test driven design has always been something that, once in a while, I feel too unskilled to apply right from the start.

Many people (including myself) believe, for many reasons, that TDD is the way to go… but what happens when I have no clue what I am building?

Of the last 3 open source projects that I have worked on, only one started with TDD from its conception. Two of those projects are tools (TestSlicer and VRaptor), while the other one is a continuous integration server.

The first project is about running integration builds faster by running only the required tests. In other words, it should only run the tests affected by the change log.
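The core idea can be sketched in a few lines. This is a hypothetical illustration, not TestSlicer's actual code: all names here (`TestSelector`, `affectedTests`, the coverage map) are invented, and it assumes we already know, e.g. from coverage data, which classes each test exercises.

```java
import java.util.*;

// Hypothetical sketch: given the classes touched by a change log and a map
// of which classes each test exercises, select only the affected tests.
public class TestSelector {

    // test name -> classes that the test exercises (e.g. from coverage data)
    private final Map<String, Set<String>> coverage;

    public TestSelector(Map<String, Set<String>> coverage) {
        this.coverage = coverage;
    }

    // returns the tests that touch at least one changed class
    public Set<String> affectedTests(Set<String> changedClasses) {
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : coverage.entrySet()) {
            for (String cls : e.getValue()) {
                if (changedClasses.contains(cls)) {
                    selected.add(e.getKey());
                    break;
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> coverage = new HashMap<>();
        coverage.put("CartTest", new HashSet<>(Arrays.asList("Cart", "Item")));
        coverage.put("UserTest", new HashSet<>(Arrays.asList("User")));
        // only tests covering the changed class should run
        System.out.println(new TestSelector(coverage)
                .affectedTests(new HashSet<>(Arrays.asList("Cart"))));
    }
}
```

The hard part, of course, is not this selection step but computing the coverage map reliably, which is exactly what was so unclear at the start.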

The problem with creating this tool was that, while coding it for the first time, it was so unclear how it would work, or what exactly it would do, that it was impossible to test it prior to creation. The first attempt was to use TDD, and some code was created. After a few days, it was clear that the way the tool would achieve its purpose was too unclear to allow writing integration tests for it. A few days later, two things had become even clearer:

  • it was impossible to keep coding it due to the lack of more advanced tests
  • it was, however, possible to create such a tool

After the first version was used in production, the conclusion was that adopting, dropping and later re-adopting TDD was a great approach for this project: the idea was so unclear that it would have required recoding the project from scratch anyway. Because the project was at a very early stage, with its purpose and ideas evolving too fast in a short period, it was counter-productive to keep TDD'ing.

VRaptor started from scratch with TDD and went just fine. We all knew its purpose and had a somewhat clear vision of what we desired (a refactor-friendly framework), without knowing exactly how to implement it – but in the end, we achieved it. TDD win.

The third project suffered from the same problem as the first one. We had only a short (unclear) glimpse of what we wanted: “run all our tests in parallel” instead of “running our builds in parallel”. But how?

Should it be the job of our agent machines to consume what our servers make available? Or should the servers manage the agents (as Cruise does) to do their job? Should it be implemented through low-level sockets or HTTP-based resources? Everything was so unclear and changed so fast in the first couple of days that, at that time, it was impossible to test first and code afterwards.

After the first trial on a private project, it was clear how to do it – and even clearer what we wanted to achieve – so it was time to refactor and start TDD'ing.

This is the common feeling that I have found bugging people about TDD: whenever your project is a prototype to check that something is possible, or you are creating something completely new that you have no idea about, it seems you should first create the prototype, throw it away and restart it with TDD.

Maybe typical web-based apps won't suffer from this problem, because sprint plannings will help getting things clear in the developer's mind. But developing a library or a tool for other developers is not the same type of task. At least during the first few moments…

Written by guilhermesilveira

August 24, 2009 at 10:00 am

Posted in agile, Uncategorized


To break or not to break? Java 7?

with one comment

There is a short slide show to illustrate some thoughts. There will be better ones in the near future.

When is the right time to break a public API's compatibility with its previous versions?

Well, in the open source communities there is a common understanding that a library is allowed to require some migration effort on a minor change (e.g. from 1.1.5 to 1.2.0).

Whenever there is a major change (e.g. from 1.2.0 to 2.0.0), the library might be completely rewritten, in such a way that its users can even adopt both versions at the same time.

Some projects use the version number as a marketing technique in order to keep themselves up-to-date with their competitors.

Some products are famous for breaking compatibility with previously written code whenever a new release appears, requiring all programmers to rewrite part of their code. If you check Visual Basic's history, every one to two years there was a major release, usually with incompatibilities.

VB programmers were used to that issue and kept coding old projects against the previous release until the project was finished. Companies simply got used to it and learned how to live with it.

If your code is well covered by automated tests, updating a library or a compiler/language version is an easier task, because all incompatibility issues will be found before you ship a new release of your product to your clients. testing++!

If you do not write tests for your code, then as soon as you update a library/compiler/language… well, have fun, as it will probably be a unique adventure.

The java developers

Unfortunately, there is still a big part of the Java community that does not write automated tests. Considering how much legacy Java code exists in the world without a single line of automated tests, sticking to compatibility between Java releases can be seen as a good thing for the world in general.

But for those who already write their tests, all that care with compatibility might be seen as an overestimated issue: thanks to the tests, we are ready to embrace those changes.

The Java 7 crew is aware that there is a lot more that could be added to the language, but they are afraid that doing so would not preserve high levels of compatibility and usability.

What happens if the language has to worry so much about compatibility? It will evolve so slowly that other languages have the chance to overtake it. This is the danger that the language itself faces. Java might not lose its position, but one can find a lot more people arguing about changes that could be made to the language – but are not… because preserving compatibility has been a main issue.

At the same time, some of the proposed changes might create huge incompatibility issues for those users who still do not write tests, or for software that was not written using TDD practices. There is also another document on the same issue on the internet.

This proposes that methods declaring a void return type should be treated as returning this. This way one can easily use them with builder-patterned APIs:


// valid only under the proposal: setBounds and setVisible return void today
final JFrame jFrame = new JFrame()
.setBounds(100, 100, 100, 100)
.setVisible(true);

There are a few issues with that proposal that do not match the “let's keep it backwards compatible” motto.

The first thing is that, nowadays, builder (or construction-pattern) APIs are already created returning something specific instead of void, as we can see in Hibernate's API and the new date and time API.

The second point is that builders are nowadays used to create Domain Specific Languages, and their implementations in Java do not use a single object, because that would create a huge (and nasty) class. DSLs are usually built in Java by using different types, e.g. the criteria API from Hibernate.
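A minimal sketch of that multi-type fluent style (all types here are invented for illustration; this is not Hibernate's real criteria API): each step of the chain returns a specific type, never void, so chaining already works without any language change.

```java
// Invented example types, only to illustrate the multi-type style of
// fluent DSLs such as Hibernate's criteria API -- not its real classes.
public class QueryDsl {

    static class Criteria {
        private final StringBuilder sql = new StringBuilder();
        Criteria(String entity) { sql.append("from ").append(entity); }
        // returns a *different* type, steering what may come next in the chain
        Restriction where(String field) { return new Restriction(this, field); }
        Criteria append(String s) { sql.append(s); return this; }
        String toSql() { return sql.toString(); }
    }

    // a second, small type instead of one giant class
    static class Restriction {
        private final Criteria owner;
        private final String field;
        Restriction(Criteria owner, String field) {
            this.owner = owner;
            this.field = field;
        }
        Criteria eq(String value) {
            return owner.append(" where " + field + " = '" + value + "'");
        }
    }

    public static void main(String[] args) {
        String sql = new Criteria("User").where("name").eq("guilherme").toSql();
        System.out.println(sql);
    }
}
```

Notice how the return types themselves guide the user: after where(...) the only sensible next call is a restriction method, something the “void returns this” rule cannot express.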

Even the given example is actually not a builder API… JFrame's configuration methods and runtime-usage methods all live within… itself! There is no JFrameBuilder which would build a JFrame, e.g.:


// hypothetical JFrameBuilder-style API (does not exist in Swing)
Builder b = new Builder();
b.setTitle("Title").addButtonPanel().add(cancelButton()).add(okButton());
JFrame frame = b.build();

Notice that in the simple example above it would be a good idea to have two different types (Builder and PanelBuilder), so the language modification does not achieve what we want our code to look like (or be used like). Instead, it only allows us to remove the variable name from appearing ten times in our code, making it easier for programmers to write lines like this:


// ugly line which has too much information at once
JFrame frame = new JFrame().setA("a").setB(2,3).setC("c").setD("d").andOn().andOn().andOn();

But why does it go against Java's “we should not break compatibility” motto? Because it creates an even higher degree of coupling between my code and the API I am using.

Well, imagine that I used the Swing API as mentioned above. In a future release of Swing, some of those methods might have their signatures changed and therefore break my existing code. Why would an API change a method's return type? Well, because if the return type was defined as void so far, no one was using it… so it is safe to change.
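A small sketch of that guarantee, with an invented Config class standing in for any API: today, changing a void return type to something useful cannot break callers, precisely because no call site can use a void result.

```java
// Invented API class: in v1, timeout() was declared as
//     public void timeout(int ms)
// v2 below changes the return type, yet every v1-style call site still compiles.
class Config {
    private int ms;
    public Config timeout(int ms) { this.ms = ms; return this; }
}

public class CouplingDemo {
    public static void main(String[] args) {
        Config c = new Config();
        c.timeout(500);   // a v1-style call site: ignores the result, so the
                          // change from void to Config was invisible to it

        // Under the proposed "void returns this" rule, chained code like
        //     new Config().timeout(500).timeout(300);
        // would have been legal against v1 as well -- and a later change of the
        // return type to anything other than Config would break those callers.
    }
}
```

In other words, the proposal silently turns every void method into part of the API's chaining contract.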

It creates the same type of coupling found when using class inheritance with APIs in Java: parent methods being invoked might change their signatures.

Well, that was true until today. If this functionality is approved for the Java language, it will turn the simple task of changing a void return type into something useful into a hard one, where I have to think about those who have tightly coupled their code to mine.

The questions and answers which come to my mind are…
a) is the existing Java codebase around the world usually covered by automated tests? unfortunately, no
b) does Java want to be backwards compatible? this change will not help it
c) does it want to help the creation of DSLs? this change is not the solution
d) does Java want us to avoid writing the variable name multiple times? my IDE already helps me with that

Written by guilhermesilveira

August 17, 2009 at 1:24 pm

Continuous integrating: parallel tests the way they should be

leave a comment »

We at Caelum have incorporated a lot of XP practices into our everyday life for a while now. Still, we have faced a lot of issues making our software one-click deployable.

I will talk about the issues that arose in one of our continuous projects – the kind of project which has no end, because it is the heart of its company and, as the company evolves, it must keep being developed.

The first big – and long-lasting – problem was the build time taking much longer than the expected 10 minutes. Selenium end-to-end tests tend to be slow when run sequentially, so we adopted Selenium Grid, a tool that let us run our tests in parallel. After a few months struggling with it, Lucas Cavalcanti and I submitted a patch that allowed the user to control the grid client machines, in order to cope with machine shut-downs and to configure new agents through a web front end. Unfortunately that patch was not accepted, because the Selenium Grid project was going in a different direction from our needs.

From that point on we noticed that our main problem was parallelizing the tasks, and that could be achieved in the Continuous Integration server rather than in Selenium Grid. The most famous open source servers did not offer exactly what we expected; some of them offered only a way to distribute builds, not to parallelize them. This wouldn't make our build any faster, but only allow more (distinct) builds to run concurrently.

There is, of course, Cruise, which does exactly what we needed – parallelize the tests for us – but the client company did not want to spend money on that at that point, so an open source approach was required. Our decision was to create our own CI server, and face all the problems other teams have faced while doing that.

Right now, Integra – this new continuous integration server – has Git and SVN support, all the usual email and task features, and the most important thing of all for us:

Automatically Parallelizing Tests

Cruise goes a long way by allowing us to declaratively annotate our tests into groups and run each group on a different agent at the same time. Integra implements the same idea but goes a little further: instead of making you responsible for categorizing your tests into N different categories and running them on N machines, Integra lets you simply tell it the test set and the maximum time you want your tests to take.

Let's say I want my integration tests to run in x = 5 minutes. You can configure Integra's test plugin with this number and it will – on its first run – just randomly distribute the build across a number of machines. After this first run, Integra knows the average running time of each test and can therefore create T(x) partitions in order to achieve a maximum running time of x. Clearly, the number of partitions T(x) depends on the running time of each test, and a greedy algorithm was implemented that minimizes the difference between x and the running time of the longest partition.

Restrictions

If max(running_time_of_a_single_test) > x, the goal cannot be achieved. Another restriction might be the number of available agents being smaller than T(x).

Greedy algorithm to partition the tests

The greedy algorithm that we have implemented seems to give the best solution, minimizing the difference mentioned above. Although I have convinced myself of it with a “during-the-shower proof” in my own mind, it's probably time to put it down on paper to be sure.

Anyone willing to improve the algorithm?
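To make the idea concrete, here is a sketch of one plausible greedy strategy: first-fit decreasing. This is an illustration under stated assumptions (we know each test's average running time; Integra's actual algorithm may differ): sort tests longest-first and place each one in the first partition that still fits under the target time x, opening a new partition when none fits.

```java
import java.util.*;

// Sketch of a first-fit-decreasing greedy partitioning of test running
// times (in seconds) into groups whose total stays under a target time x.
// Illustrative only -- not necessarily Integra's real implementation.
public class TestPartitioner {

    public static List<List<Long>> partition(List<Long> testTimes, long x) {
        List<Long> sorted = new ArrayList<>(testTimes);
        sorted.sort(Collections.reverseOrder());   // longest tests first

        List<List<Long>> partitions = new ArrayList<>();
        List<Long> totals = new ArrayList<>();     // running total per partition
        for (long t : sorted) {
            if (t > x) throw new IllegalArgumentException(
                    "a single test exceeds the target time: " + t + " > " + x);
            // find the first partition where this test still fits
            int i = 0;
            while (i < totals.size() && totals.get(i) + t > x) i++;
            if (i == totals.size()) {              // none fits: open a new one
                partitions.add(new ArrayList<>());
                totals.add(0L);
            }
            partitions.get(i).add(t);
            totals.set(i, totals.get(i) + t);
        }
        return partitions;                          // T(x) = partitions.size()
    }

    public static void main(String[] args) {
        // times in seconds; target x = 300s (5 minutes)
        List<List<Long>> p =
                partition(Arrays.asList(120L, 200L, 90L, 150L, 40L), 300L);
        System.out.println(p.size() + " partitions: " + p);
    }
}
```

Each partition is then handed to one agent, so T(x) is also the number of agents the build needs.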

Summing up: manual tasks should be automated as much as possible. That's what we have been saying for a while now, and this test-partitioning job – until now a human task – could have been automated, and now it is.

We intend to release Integra as soon as I get the HTML page design later next week, because right now it's as ugly as… anything else I can design.

Written by guilhermesilveira

July 1, 2009 at 5:53 pm

Posted in Uncategorized
