Guilhermesilveira's Blog

as random as it gets

Posts Tagged ‘appengine

Fractals in cloud computing with Google App Engine

While studying math methods as a minor in my applied math course, I had a mentor, Eduardo Colli, who helped me teaching the math requirements to implement a desktop based discrete dynamical systems software, Pulga. In the dynamical systems area – usually non-linears – people started talking about caothic behaviour of particles and one of the most famous systems is the Lorenz attractor.

atrator de Lorenz

As most of those algorithms can be run concurrently, we have decided to send it to Google App Engine in order to test it in a highly demanded way, enjoying the benefits of running in a cloud enviroment to generate results quicklier, as in a cluster, but using the http and internet infrastructure as a cluster: Google App Engine would help not only with scalability and availability but also improving performance. The basic algorithm iterates a billion+ times through some mathematical functions. In the traditional, desktop, approach, this is run in a single thread, in a code which resembles:


for(int time = 0; time < 60*60; time++) {
  for(int initial=0;initial < 1000; initial++) {
    x0, x1 = intiialCondition[initial];
    for(int iteration = 0; iteration <= 100000; iteration++) {
      x0, x1 = calculate(x0, x1);
    }
  }
}

The code above is much simples than the original source, but illustrates how slow the software can be. A simple execution would be finding an attractor’s basin of attraction, a task which would take between 2 to 5 minutes to draw the Henon map.

In our machines, Henon’s bifurcations diagram, as the above image from wikipedia, would take 5 minutes:

diagrama de bifurcação do mapa de henon

At first sight, those results seem to be achieved in a short period of time, but once Henon’s map relies on two parameters, if we wish to create a movie by scanning one of those parameter’s domain (considering it the time), a one minute video would take about 3 hours of computer time. In real situations, with different strange attractors, some movies would take more than 7 hours to be created.

In order to run it in a parallel way, the new application has 3 parts: a client which sends a single request, a server written in ruby which receives this request and dispatches hundreds of requests to the cloud, using the well known http protocol, returning parts of the graphic.

The client will request the server for colors to paint the image through ajax/jsonp. The server app sends all requisitions using non blocking io and could be somehow enhanced through the use of ruby 1.9 fibers. This NIO api from ruby is similar to Java’s NIO.
Once the requests get to the Google App Engine, it will basically run a code quite similar to the original desktop client core. After each execution, those numbers are returned to the ruby server, which concatenates data and generates the final result.

Initial tests have shown response times 51 faster than running on a single client by using 100 parallel http requests to Google App Engine:

100 random points iterations per request

Requests Total time (seconds) Iteration time (miliseconds)
local 6 16.6
1 3 33
10 6.4 156.25
50 14 357
100 17 588
200 23.55 849

Using memcached, Google App Engine’s cache api, we achieved even faster results:


CacheFactory cacheFactory = 
       CacheManager.getInstance().getCacheFactory();
cache = cacheFactory.createCache(Collections.emptyMap());
String result = (String) cache.get(script);
if (result != null) { ... }

This simple modification helped us achieving 93 times faster response running in ~100 machines!
Of course, some machines did not answer properly and the server could deal with that by just using standar http protocol and further requests.

We could improve it even more by using the HTTP/internet provided infrastructure: proxies. By correctly configuring our proxies, many of those requisitions achieved the same task, resulting in absolutely amazing responses. That is why RESTFul webservice’s gurus consider Squid a great choice over a giant ESB system.

hiperboloide

For those who are interested, here are some examples of 2d dynamical system maps which can be easily implemented.

Batch processes and reports are tasks which usually take some long time to run and many of those could be breaken in many parallel processes.We, developers, will soon be running all our code in those cloud systems: not only in order to achieve higher and higher scalability and availability goals, but because this elastic infrastructure allows hosting companies to have less costs. Knowing how to profit from those systems is the role of a developer. Non-relational databases and languages with different approaches to concurrent programming than the traditional java/pthread one, are showing to be excelent ideas to chase.

Written by guilhermesilveira

September 4, 2009 at 5:27 am

Posted in java, mathematics, ruby

Tagged with , , ,