Posts Tagged ‘http’
REST is not only about HTTP and the web: it is an architectural style derived from the web and other systems.
In this 20-minute video, we will see how the web has been able to scale, provide different services and be more effective than comparable human distributed systems (e.g. water and electricity distribution).
We then move on to the basic characteristics of a REST architecture and how your system can benefit from them.
This is the first video on the Rest from Scratch – Theory. You can watch the other Rest From Scratch – Practice videos online here.
Not yet REST
How do we achieve REST? Leonard Richardson’s model has been widely commented on, and Martin Fowler posted about “Rest in Practice” (a book I recommend reading). But what is left out of REST in Richardson’s model, and why?
According to his model, level 3 adds hypermedia support, leveraging a system through the use of linked data, which is a requirement for a REST architecture. But HATEOAS alone does not imply REST, as Roy Fielding stated back in 2008.
Remember how method invocation on distributed objects allowed you to navigate through objects and their states? The following sample illustrates such a situation:
orders = RemoteSystem.locate().orders();
order = orders.get(0); // pick one order to pay for
receipt = order.payment(payment_information);
But what if the above code was an EJB invocation? If navigating through relations were REST, implementing EJB’s protocol over HTTP would also be REST, because linked data is also present in EJB code, although it lacks a uniform interface.
While Richardson’s model gets close to REST on the server side, Rest in Practice goes all the way to a REST example, describing the importance of semantics and media types. The rest of this post explains what was left out of this “REST services” model and why, and proposes a model that encompasses REST, not just REST over HTTP; the next post, with a video, describes how to create a REST system.
What is missing?
Did the previous code inspect the relations and state transitions and adapt accordingly?
It did not choose a state transition: it contains a fixed set of instructions to be followed, no matter which responses your server gives. If the API in use is HTTP and the server responds with “Server too busy” (503 Service Unavailable), a REST client would try again 10 minutes later, but what does the above code do? It fails.
We are missing the step where REST clients adapt themselves to the resource state. Interaction results cannot be taken for granted as they were in other architectures. REST client behavior was not modelled in Richardson’s model because the model only considers server-side behavior.
This is the reason why there should be no such thing as “REST web services” or “REST services”. In order to benefit from a REST architecture, both client and server should stick to REST constraints.
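As a rough illustration of a client that adapts to the response instead of following a fixed script, the sketch below decides the next step from the status code and headers alone. The helper name `next_step`, the returned hash shape and the ten-minute default are assumptions made for this example; they are not part of any library mentioned in this post. It requires Ruby 2.6 or later.

```ruby
DEFAULT_RETRY_SECONDS = 600 # ten minutes, the delay mentioned in the text

# Hypothetical helper: picks the next step from the response alone.
def next_step(status, headers = {})
  case status
  when 200..299
    { action: :continue }
  when 503
    # Prefer the server's hint when Retry-After is given in seconds;
    # an HTTP-date or a missing header falls back to the default.
    hinted = Integer(headers["Retry-After"].to_s, exception: false)
    { action: :retry, wait: hinted || DEFAULT_RETRY_SECONDS }
  else
    { action: :give_up }
  end
end
```

A caller would inspect the returned `:action` and, for `:retry`, wait the given number of seconds before issuing the same request again.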
Richardson’s server + http model
Semantically meaningful relations must be understood by the client, and because of that we need a model that describes how to create a REST system, not just a REST server.
An important point to note is that Richardson’s model is quite good at showing a REST server’s maturity over HTTP, but it limits the REST analysis to the server and to HTTP.
A REST architecture maturity model
For all those reasons, I propose a REST maturity model which is protocol independent and covers both consumer and provider aspects of a REST system:
When trying to achieve REST, the first step is to determine and use a uniform interface: a default set of actions that can be taken on each well-defined resource. For instance, Richardson’s model assumes HTTP and its verbs as the uniform interface for a REST over HTTP architecture.
The second step is the use of linked data to allow a client to navigate through a resource’s state and relations in a uniform way. In Richardson’s model, this is the use of hypermedia for connectedness.
The third step is to add semantic value to those links. A relation defined as “related” might have significant value for some protocols but less for others; “payment” might make sense for some resources but not for others. The creation and adoption of meaningful media types allows, but does not imply, that client code is written in an adaptable way.
The fourth step is to create clients whose decisions are based only on the relations found in a resource representation, plus their understanding of its media type.
All of the above steps allow servers to evolve independently of a client’s behavior.
The last step is implied client evolution. Code on demand teaches clients how to behave in specific situations that were not foreseen, e.g. a new media type definition.
Note that no level mentions a specific protocol such as HTTP, because REST is protocol independent.
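To make the fourth step more concrete, here is a minimal sketch of a client that decides what to do by looking only at the relations present in a representation. The representation format (a hash with a "links" list of "rel"/"href" pairs), the example.com URIs and the name `choose_transition` are all assumptions for illustration, not a real media type.

```ruby
# Hypothetical representation: a state plus the links the server offered.
SAMPLE_ORDER = {
  "state" => "unpaid",
  "links" => [
    { "rel" => "payment", "href" => "http://example.com/orders/1/payment" },
    { "rel" => "cancel",  "href" => "http://example.com/orders/1" }
  ]
}.freeze

# Returns the first offered link whose relation the client understands,
# trying the relations in order of preference, or nil when none is offered.
def choose_transition(representation, preferred_rels)
  links = representation.fetch("links", [])
  preferred_rels.each do |rel|
    link = links.find { |candidate| candidate["rel"] == rel }
    return link if link
  end
  nil
end
```

If the server stops offering "payment" for a given state, the same client code falls back to the next relation it understands, or does nothing, without being rewritten.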
The following post will describe one example of how to create a REST system using the above maturity model as a guide.
The most frequently asked question about REST in any presentation: why is hypermedia so important to our machine-to-machine software?
Is early binding through fixed URIs, using HTTP verbs, headers and response codes, not already better than what we have been doing so far?
An approach that makes real use of all HTTP verbs, headers and response codes already brings a set of benefits. But there is more than the Accept header, and more than the 404, 400, 200 and 201 response codes: real use means not forgetting important verbs such as PATCH and OPTIONS, and supporting conditional requests. Not implementing features such as automatic handling of 304 (the answer to a conditional request) means not using HTTP headers and response codes as they can be used, but merely passing this information on to your system.
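As an illustration of what automatic 304 handling could look like on the client side, the sketch below remembers ETags, sends them back as If-None-Match, and reuses the stored body when the server answers 304 Not Modified. The class name `ConditionalCache` and its two methods are hypothetical; a real client library would wire this into its request pipeline.

```ruby
# Hypothetical in-memory store of representations keyed by URI.
class ConditionalCache
  Entry = Struct.new(:etag, :body)

  def initialize
    @entries = {}
  end

  # Headers to send with a GET: adds If-None-Match when an ETag is known.
  def request_headers(uri)
    entry = @entries[uri]
    entry ? { "If-None-Match" => entry.etag } : {}
  end

  # Returns the body the application should see for this response.
  def handle(uri, status, headers, body)
    if status == 304 && @entries.key?(uri)
      @entries[uri].body # not modified: reuse what we already have
    else
      etag = headers["ETag"]
      @entries[uri] = Entry.new(etag, body) if status == 200 && etag
      body
    end
  end
end
```

The application code only ever sees a body; whether it came from the network or from the stored copy is handled below it.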
But if such an approach already provides so many benefits, why would machine-to-machine software need hypermedia? Is it not good enough to write code without it?
The power of hypermedia is related to software evolution. If you think only about how your system works right now (its expected set of resources and allowed verbs), hypermedia content might not help. But as soon as the system evolves and creates a new set of resources, building unforeseen relations between them and their states (and thus their allowed verbs), that early binding becomes a burden, felt when all your clients are required to update their code.
Google and other web search engines are powerful systems that make use of the web. They deal with URIs, HTTP headers and response codes.
If Google’s bot were a statically coded bot, incapable of handling hypermedia content, it would require an initial set of URIs (hard-coded or uploaded by hand) telling it where the pages on the web are, so that it could retrieve and parse them. If any of those resources created a new relationship to other ones (and so on), Google’s early-binding, static-URI bot would never find out.
Such a bot only works with one system, one specific domain application protocol, one static site. Google would not be able to spider any website but that original one, making the bot practically useless. Hypermedia is vital to any crawling or discovery-related system.
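The difference between a fixed list of URIs and late binding can be sketched with a tiny crawler that starts from a single entry point and discovers everything else through links. The `crawl` function and the three-page `SITE` below are made up for this example; `fetch_links` stands in for whatever retrieves a page and extracts its links.

```ruby
require "set"

# Visits every page reachable from the entry point. `fetch_links` is any
# callable that returns the URIs linked from a given URI.
def crawl(entry_point, fetch_links)
  visited = Set.new
  queue = [entry_point]
  until queue.empty?
    uri = queue.shift
    next unless visited.add?(uri) # add? returns nil when already visited
    queue.concat(fetch_links.call(uri))
  end
  visited.to_a
end

# A made-up three-page site standing in for the real web.
SITE = {
  "/" => ["/courses"],
  "/courses" => ["/courses/rr-71", "/"],
  "/courses/rr-71" => []
}.freeze
```

Calling `crawl("/", ->(uri) { SITE.fetch(uri, []) })` finds all three pages from the entry point alone, and a page linked later is found on the next run without any change to the crawler.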
Creating consumer clients (such as Google’s bot) with early binding to relations and transitions does not allow system evolution to occur in the same way that late binding does. Some of the most amazing machine-to-machine systems on the web to date are based on its dynamic nature, parsing content through hyperlinks and their semantic meaning.
Although we have chosen to show Google and web search engines as examples, any other web systems that communicate with a set of unknown systems (“servers”) can benefit from hypermedia in the same way.
Your servers can only evolve their resources, relations and states without requiring client-rewrite if your code allows service-crawling.
REST systems are based on this premise: clients crawl your resources and access their well-understood transitions through links.
While important systems have noticed the semantic value and power of links for their businesses, most frameworks have not yet helped users accomplish late binding following the above-mentioned principles.
It’s common to find developers struggling with their clients’ browser caches and proxies in order to get their application running as expected: some of them actually view caching as a bad thing.
In fact, HTTP caches offer several advantages. The two most important are the ability to serve more clients at the same time without buying more expensive hardware (or scaling your system horizontally), and avoiding excessive bandwidth consumption where it can be saved or where it is expensive.
A well-known tutorial on how web caches work was written by Mark Nottingham. Mark has also been involved with the Link header specification and developed Redbot, a clever tool that inspects your pages to help you avoid cache-related issues you might be facing or to improve your application’s scalability: and it is all connected to REST architectures.
Linked data is the basis for HATEOAS systems while http cache supports higher scalability using such architecture.
Imagine a theoretical scenario where a huge content provider application contains hundreds of thousands of articles that are frequently accessed in your country. Such an application might have a few pages that change often, while others do not.
By adding a simple “Cache-Control” header to your page (for example, max-age=3600), you allow every cache layer between you and your client to hold the resource representation for one hour.
In Restfulie (Rails) it can be achieved by providing some cache information to your resource:
class OrdersController < ApplicationController
Three kinds of cache can now benefit from such a header.
The browser’s cache will use the previously retrieved representation until it expires, and might use it even after it has expired if you did not provide the must-revalidate option. This will save you bandwidth and server CPU consumption.
A caching proxy situated within the user’s network, or anywhere between the server and the client machine, will serve the previously retrieved representation, saving you bandwidth outside your network and server CPU consumption.
A reverse proxy can cache the representation within the server’s network and save cpu consumption. This approach has been widely adopted in order to share cached representations amongst different consuming applications/users.
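Whichever of the three caches is involved, the decision it makes is roughly the same: is the stored representation still fresh? The sketch below shows a simplified version of that check based only on max-age. The function names `max_age` and `fresh?` are assumptions for this example, and real caches also take into account other directives and headers (Expires, Age, no-cache and so on).

```ruby
# Extracts max-age (in seconds) from a Cache-Control header value, or nil.
def max_age(cache_control)
  match = cache_control.to_s.match(/\bmax-age=(\d+)/i)
  match && match[1].to_i
end

# True while a stored response is still fresh and may be served
# without contacting the origin server.
def fresh?(cache_control, stored_at, now = Time.now)
  age_limit = max_age(cache_control)
  return false if age_limit.nil?

  now - stored_at < age_limit
end
```

With a one-hour max-age, a representation stored thirty minutes ago is served from the cache, while one stored an hour ago triggers a new request (or a revalidation).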
All three savings translate into an application that is easier to scale. You did not need any paid middleware, fancy stack or load balancers, although they might help: caching saves you complexity, time and money.
There is much more you can do with cache headers (Last-Modified, ETag and so on), and REST libraries should make them easy to use, apart from supporting local caches.
Finally, in syndication-based systems, or in any other system based on heavy machine-to-machine communication, a local cache might not be able to handle the large volume of cache hits. In such systems it is common to use distributed cache systems, and Restfulie allows you to use your own cache provider.
For example, a distributed cache like Memcached could be used by simply implementing three methods:
class MemcachedCache
  def put(url, request, response)
    # save it into memcached
  end
  def get(url, request)
    # retrieves from cache, if available
  end
  # optional implementation
  def clear
    # clears the cache
  end
end
Restfulie.cache_provider = MemcachedCache.new
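To show the shape of such a provider without depending on Memcached, here is a minimal in-memory sketch. The class name `HashCache` is hypothetical, the name `clear` for the third method is an assumption, and the request and response arguments are treated as opaque values since their exact types are not described in this post.

```ruby
# Hypothetical in-memory provider: keeps the last response seen per URL.
class HashCache
  def initialize
    @store = {}
  end

  # Saves the response for this URL and returns it.
  def put(url, request, response)
    @store[url] = response
    response
  end

  # Returns the stored response for this URL, or nil when there is none.
  def get(url, request)
    @store[url]
  end

  # Drops every stored response.
  def clear
    @store.clear
  end
end
```

A Memcached-backed provider would keep the same three methods and replace the hash with calls to its client library.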
The most widely used HTTP clients implement only low-level features, leaving request and response handling to you and processing only the basic request headers.
The difference between HTTP client libraries and REST client libraries is that the latter should implement further processing of the HTTP API, while the former only give access to the low-level API mentioned above.
Because caching is both part of the HTTP API and one of the key features that made the web scale as we know it, Restfulie needed such support out of the box (along with ETag, Last-Modified and 304).
Not only does one write less code to process responses, but both client and server applications benefit.
Note: I am moving my posts to our company’s blog, the next post will be just an announcement. Comments can be made either here or there.
In the human web, the entry point is a strongly coupled URI.
Changing your company’s entry point from http://www.caelum.com.br to http://www.newcaelum.com.br without notifying your
clients will cause serious damage to those using your website.
The same holds for REST applications. Entry points to your system are strongly coupled, and
changing them without notifying your clients will break their use of the system.
Entry points should not change without notifying your clients.
How can my clients become aware of the URI change?
Once you have published your new URI, make your old URI respond with “301 Moved Permanently” and set the “Location” header to
the new location. Once a 301 is received, your clients can cache this response forever and change their entry point configuration, because the move is permanent.
If you add a proxy server to your system, it can cache the Moved Permanently response as well.
Restfulie will automatically redirect the user to the new URI upon receiving 301 on GET and HEAD:
product = Product.from_web 'http://www.caelum.com.br/product/2' # received a 301, went to the new Location
If the request is anything but GET or HEAD, the user has to explicitly state the intention to follow a 301:
# explicitly following 301 on other verbs
class Product
  uses_restfulie
  entry_point_for.create.at 'http://www.caelum.com.br/product'
  follows.moved_permanently
end
product = Product.remote_create Product.new
# received a 301, re-posted it to the new uri, received a 200
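Independently of any library, the rule described above can be sketched in a few lines. This is not Restfulie's implementation: the names `follow_moved_permanently?`, `follow_unsafe` and `EntryPoint` are hypothetical and only illustrate the policy (follow automatically on GET and HEAD, require an explicit opt-in otherwise, and remember the new location because the move is permanent).

```ruby
SAFE_VERBS = %i[get head].freeze

# Decides whether a 301 should be followed for the given verb.
# `follow_unsafe` models the explicit opt-in for verbs such as POST.
def follow_moved_permanently?(verb, follow_unsafe: false)
  SAFE_VERBS.include?(verb) || follow_unsafe
end

# Hypothetical client-side configuration holding the entry point.
class EntryPoint
  attr_reader :uri

  def initialize(uri)
    @uri = uri
  end

  # A 301 is permanent, so the stored entry point is replaced.
  def handle(status, headers)
    @uri = headers["Location"] if status == 301 && headers["Location"]
    @uri
  end
end
```

After one 301, later requests go straight to the new URI instead of paying for the redirect every time.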
In the ideal human web, all gettable resources are bookmarkable, therefore any change to those resources also
implies possible damage to your clients. After accessing http://www.caelum.com.br for the first time, if I look for a Ruby training course,
I get to http://www.caelum.com.br/cursos/rr-71. I can use this address in several ways, e.g. bookmarking or sharing it.
If the course URIs change, bookmarked URIs will no longer work: the user will only find out after trying them.
Once they get a 404, and if the server does not answer with a Moved Permanently response, users will go back to the entry point and repeat the steps that
led to the previously bookmarked (now invalid) resource.
That’s what we do when our bookmarks break.
Again, the same holds for REST applications. In order to keep the benefits of addressability, once a URI is published, one has
to keep it valid forever: either responding with its content, a See Other (303) or a Moved Permanently (301) response.
REST is based on addressability, and therefore it is fine for a client system to bookmark any resource URI.
If you believe some of your resources will not be bookmarked, you may move them without prior notice, but remember: by
assuming that your resources were not bookmarked, you have accepted losing one of REST’s benefits.
Clients can always go back to the entry point and navigate through the previously executed procedure to retrieve the new URI,
but if they have to, it means that you broke one of your clients’ expectations.
302 and 303
Two other possible responses are 302 Found and 303 See Other.
A 302 means a temporary move, so the client is not supposed to cache the Location header as the new location
for the resource unless explicitly told to. Restfulie deals with it the same way as it does with 301:
you should tell Restfulie to follow 302 responses for verbs other than GET and HEAD.
When receiving a 303, the user should retrieve the resource at the new location with a GET verb.
This response allows the server to respond to a POST request by notifying the client to redirect to
the new resource URI – or any other desired one – with a GET.
The 303 response itself must not be cached.
Restfulie follows all 303 responses by default.
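The difference between the two responses comes down to which verb, if any, is used for the follow-up request. The sketch below captures that choice; the function name `redirect_verb` and the `follow_found` flag are assumptions for this example and do not reflect Restfulie's internals.

```ruby
# Returns the verb to use for the follow-up request, or nil when the
# redirect should not be followed automatically.
def redirect_verb(status, verb, follow_found: false)
  case status
  when 303
    :get # always retrieve the new location with GET
  when 302
    # Safe verbs are repeated as they are; others need an explicit opt-in.
    return verb if %i[get head].include?(verb) || follow_found

    nil
  end
end
```

A POST answered with 303 therefore turns into a GET on the new location, while a POST answered with 302 is only repeated when the client asked for it.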
Whether or not you believe that your clients will cache your internal URIs, the HTTP application protocol allows them to do so.
So, in order to allow your clients to evolve apart from your own evolution, do not change the addresses
without keeping a proper Moved Permanently (301) response at those URIs.
Internal elements’ patterns might not be shared, but remember: every published resource should answer properly if it moves.