Guilhermesilveira's Blog

as random as it gets

Scaling through REST: why REST clients require cache support

It’s common to find developers struggling with their clients’ browser caches and proxies in order to get their application running as expected; some of them actually view caching as a bad thing.

In fact, HTTP caches offer a few advantages. The two most important are the ability to serve more clients at the same time without buying more expensive hardware (or scaling your system out horizontally), and the ability to avoid excessive bandwidth consumption where bandwidth can be saved or is expensive.

A well-known tutorial on how web caches work was written by Mark Nottingham. Mark has also been involved with the Link header specification and developed Redbot, a clever tool that inspects your pages to help you avoid cache-related issues and improve your application’s scalability. All of this is connected to REST architectures.

Linked data is the basis for HATEOAS systems, while HTTP caching is what lets such an architecture scale further.

Imagine a theoretical scenario where a huge content provider application contains hundreds of thousands of articles that are frequently accessed in your country. Such an application might have a few pages that change often, while the others do not.

By adding a simple “Cache-Control” header to your page, every cache layer between your server and your client’s screen is allowed to keep the resource representation for two hours (7200 seconds):


Cache-Control: max-age=7200

In Restfulie (Rails), the same header can be set by providing some cache information to your resource:


class OrdersController < ApplicationController
  cache.allow 2.hours
end
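
To make the effect of max-age concrete, here is a minimal sketch of the freshness check a cache might perform before reusing a stored representation. It is not Restfulie code: the helper names `max_age_seconds` and `fresh?` are made up for this example, and it looks only at max-age, ignoring directives such as no-cache, no-store and s-maxage.

```ruby
# Hypothetical helper (not part of Restfulie): extracts max-age, in seconds,
# from a Cache-Control header value. Returns nil when the directive is absent.
def max_age_seconds(cache_control)
  match = /(?:\A|,)\s*max-age\s*=\s*(\d+)/i.match(cache_control.to_s)
  match && match[1].to_i
end

# Hypothetical helper: a stored response counts as fresh while its age is
# still below max-age. Without max-age we conservatively say "not fresh".
def fresh?(cache_control, stored_at, now = Time.now)
  max_age = max_age_seconds(cache_control)
  return false if max_age.nil?
  (now - stored_at) < max_age
end

stored_at = Time.at(0)
fresh?("max-age=7200", stored_at, stored_at + 3600) # => true  (one hour old)
fresh?("max-age=7200", stored_at, stored_at + 7201) # => false (just over two hours old)
```

While the check returns true, the cache can answer without contacting the server at all, which is where the savings described here come from.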

Three kinds of cache can take advantage of a header like this one.

The browser’s cache will use the previously retrieved representation while it has not expired, and might use it even after it has expired if you did not provide the must-revalidate option. This saves you bandwidth and server CPU consumption.

A caching proxy situated within the user’s network, or anywhere between the server and the client machine, will serve the previously retrieved representation, saving you bandwidth outside your network as well as server CPU consumption.

A reverse proxy can cache the representation within the server’s network and save CPU consumption. This approach has been widely adopted in order to share cached representations amongst different consuming applications and users.

Together, these three savings translate into an application that is easier to scale. You do not need any paid middleware, fancy stack or load balancer, although they might help: caching saves you complexity, time and money.

There is much more you can do with cache headers (Last-Modified, ETag and so on), and REST libraries should make them easy to use, apart from supporting local caches.
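
As an illustration of what ETag buys you, the sketch below shows the server side of a conditional GET: when the client sends back the ETag it already holds, the server can answer 304 with an empty body instead of resending the representation. The helpers `etag_for` and `conditional_get` are hypothetical names for this example, not Restfulie or Rails API, and the comparison is simplified: a real If-None-Match header may carry a list of tags, weak tags or `*`.

```ruby
require "digest/md5"

# Hypothetical helper: derives a quoted ETag from the representation body.
def etag_for(body)
  %("#{Digest::MD5.hexdigest(body)}")
end

# Hypothetical helper: returns [status, headers, body] for a GET request,
# given the current representation and the client's If-None-Match value.
def conditional_get(body, if_none_match = nil)
  etag = etag_for(body)
  headers = { "ETag" => etag, "Cache-Control" => "max-age=7200" }
  if if_none_match == etag
    [304, headers, ""]   # client copy is still valid: send no body
  else
    [200, headers, body] # first request, or the representation changed
  end
end

first  = conditional_get("an article")
second = conditional_get("an article", first[1]["ETag"])
third  = conditional_get("an updated article", first[1]["ETag"])

first[0]  # => 200
second[0] # => 304
third[0]  # => 200
```

The 304 response still costs a round trip, but it saves the bandwidth of the body, which is why it complements max-age rather than replacing it.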

Finally, in syndication-based systems, or in any other system based on heavy machine-to-machine communication, a local cache might not be able to handle the large volume of cache hits. In such systems, it is common to use a distributed cache, and Restfulie allows you to plug in your own cache provider.

For example, a distributed cache like Memcached could be used by simply implementing three methods:


class MemcachedCache

  def put(url, request, response)
    # save it into memcached
  end

  def get(url, request)
    # retrieve it from the cache, if available
  end

  # optional implementation
  def clear
    # clear the cache
  end

end

Restfulie.cache_provider = MemcachedCache.new
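
To show the shape of such a provider without needing a Memcached server, here is a minimal in-memory sketch with the same three methods. The class name `InMemoryCache` is hypothetical, and so are its choices: it keys entries by URL only (ignoring the request, so it does not honour Vary), never expires anything, returns nil on a miss, and is not thread-safe. What Restfulie expects these methods to return is an assumption here.

```ruby
# Hypothetical stand-in for a real cache provider: keeps responses in a Hash.
class InMemoryCache
  def initialize
    @entries = {}
  end

  # Stores the response under its URL and returns it.
  # The request is accepted to match the interface but is not used.
  def put(url, request, response)
    @entries[url] = response
  end

  # Returns the stored response for the URL, or nil on a cache miss.
  def get(url, request)
    @entries[url]
  end

  # Drops every stored entry.
  def clear
    @entries.clear
  end
end

cache = InMemoryCache.new
cache.put("http://example.com/orders/1", nil, "order representation")
cache.get("http://example.com/orders/1", nil) # => "order representation"
cache.clear
cache.get("http://example.com/orders/1", nil) # => nil
```

A Memcached-backed version would keep the same three methods and replace the Hash with calls to a Memcached client.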

The most widely used HTTP clients offer only low-level features: you handle requests and responses on your own, and they process only the basic request headers.

The difference between HTTP client libraries and REST client libraries is that the latter should implement further processing of the HTTP API, while the former only give access to the low-level API mentioned above.

And because caching is part of the HTTP API and one of the key features that made the web scale as we know it, Restfulie needed to support it out of the box (along with ETag, Last-Modified and 304).

Not only do you write less code to process the responses, but both your client and server applications benefit from it.

Note: I am moving my posts to our company’s blog, the next post will be just an announcement. Comments can be made either here or there.

Written by guilhermesilveira

January 26, 2010 at 8:00 am

Posted in restful
