I am a programmer and architect (the kind that writes code) with a focus on testing and open source; I maintain the PHPUnit_Selenium project. I believe programming is one of the hardest and most beautiful jobs in the world. Giorgio is a DZone MVB, is not an employee of DZone, and has posted 635 posts at DZone.

The absolute minimum you'll ever have to know about session persistence on the web

07.15.2010

What is the definition of session persistence? In practice, it means recognizing a user as the same one who filled in a login form earlier. Technically speaking, it means identifying a client across different HTTP requests.

Session persistence is different from data persistence (for which we use databases, files, and ORMs), since the thing to maintain is not the state of the application but the state of the interaction with a particular user. It is usually enough for a session to last for the duration of a single visit (from minutes to hours), so local storage on the web server (usually RAM) can be employed.

Parameters like the IP address of the client are not reliable for this recognition, and as we'll see, every web application has to craft its own approach to session persistence, since the HTTP protocol does not offer such a facility. Even HTTP-based authentication repeats itself at every interaction (the credentials are resent with each request), and it is not as widely used as the cookie-based approach.
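To make the "repeated at every interaction" point concrete, here is a minimal Java sketch of how a client could build the Authorization header for HTTP Basic authentication. The class name BasicAuthHeader and the sample credentials are made up for illustration; only the header format (the word Basic followed by the base64 encoding of user:password) comes from the HTTP authentication scheme itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only.
final class BasicAuthHeader {

    // Builds the value of the Authorization header for HTTP Basic authentication:
    // the word "Basic" followed by base64("user:password").
    static String valueFor(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // The same header has to accompany every single request,
        // because the server keeps no memory of the previous one.
        System.out.println("Authorization: " + valueFor("Aladdin", "open sesame"));
    }
}
```

Base64 is an encoding, not encryption, so the password travels in a recoverable form with each request unless the connection itself is protected.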

HTTP

HTTP is a stateless protocol, but it gives us workarounds to maintain the identity of clients across requests.

Stateless means that each HTTP request, taken as-is, is independent of the previous and next ones. There is no state of the connection embedded in the protocol like with TCP sequence numbers: in fact, typically a connection is opened when you request a page with your browser and closed when the operation is completed. The stateless nature of the protocol is what makes proxies work, since they can easily cache idempotent GET requests.

Actually it's not so simple: persistent connections, requested with the Connection: keep-alive header in HTTP 1.0 and the default in HTTP 1.1, tell the server not to close the connection after a response has been sent, so it can deliver multiple resources in sequence (like the images of a web page) with the overhead of a single connection. Still, nothing guarantees that the next click will reuse the same connection, and the requests remain independent of each other.

There are various ways to instill a bit of state into HTTP interactions by providing the client with a parameter that it passes back to the server, so that the server can recognize the user. This parameter can, for example, be embedded in the URL, or it can be set via a specialized mechanism introduced by Netscape in the mid-1990s, a few years after the original HTTP specification: cookies.

Cookies

Cookies are supported by every modern browser through two headers: Cookie (in requests) and Set-Cookie (in responses).

A cookie is a small string value with an identifying name, which the browser passes back as a request header with every new request until its expiration time has been reached. Cookies are host-specific (every website has its own set maintained on the client) and path-specific (they can be restricted to pages in a particular subdirectory of the server).
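Before looking at the framework APIs, it may help to see what actually travels over the wire. The following is a minimal Java sketch of the two headers as plain strings; the class CookieHeaders and its method names are hypothetical, and it deliberately ignores value escaping and the remaining cookie attributes (such as the domain).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper showing the text format of the two cookie headers.
final class CookieHeaders {

    // Builds the value of a Set-Cookie response header with a lifetime and a path.
    static String setCookie(String name, String value, long maxAgeSeconds, String path) {
        return name + "=" + value + "; Max-Age=" + maxAgeSeconds + "; Path=" + path;
    }

    // Parses the value of a Cookie request header ("a=1; b=2") into name/value pairs.
    static Map<String, String> parseCookie(String headerValue) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String pair : headerValue.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) { // skip fragments without a name=value shape
                cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return cookies;
    }

    public static void main(String[] args) {
        // What the server sends once...
        System.out.println("Set-Cookie: " + setCookie("fancyName", "value", 3600, "/shop/"));
        // ...and what it reads back from the Cookie header of the following requests.
        System.out.println(parseCookie("fancyName=value; theme=dark"));
    }
}
```

Note that the request header carries only names and values: the expiration time and the path are kept by the browser and never sent back.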

In Java servlets, you can easily handle cookies via an object model:

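import javax.servlet.http.Cookie;

// response instanceof HttpServletResponse
Cookie myData = new Cookie("fancyName", "value");
response.addCookie(myData);
// the cookie will be available from the next request
// request instanceof HttpServletRequest; getCookies() returns null if none were sent
Cookie[] cookies = request.getCookies();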

In a PHP script, the setcookie() function describes every characteristic of cookies in its signature:

<?php
setcookie("TestCookie", $value); // expires when browser is closed
setcookie("TestCookie", $value, time()+3600); // expires in 1 hour
setcookie("TestCookie", $value, time()+3600, "/folderWhereCookieIsValid/", ".domainwherecookieisvalid.com");

The $_COOKIE superglobal array will contain the values of all cookies, starting from the subsequent request.

There are two options when using cookies to maintain the state of an interaction: storing raw data, or storing the keys of a server-side data structure (like a database table). Cookies can be forged easily, so don't store in them information that must not be tampered with (user permissions or a nickname) without a further mechanism of verification (a one-time password, for example).
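One possible "further mechanism", different from the one-time password mentioned above, is to sign the cookie value on the server so that any modification can be detected. The sketch below uses an HMAC from the standard Java crypto API; the class SignedCookie, the value.signature layout, and the sample secret are assumptions made for this example, not an established cookie format.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: appends an HMAC to a cookie value so tampering can be detected.
final class SignedCookie {
    private final SecretKeySpec key;

    SignedCookie(byte[] serverSecret) {
        this.key = new SecretKeySpec(serverSecret, "HmacSHA256");
    }

    // Returns "value.signature"; only the server knows the secret needed to compute it.
    String sign(String value) {
        return value + "." + hmac(value);
    }

    // Returns the original value, or null if the signature is missing or does not match.
    String verify(String cookieValue) {
        int dot = cookieValue.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        String value = cookieValue.substring(0, dot);
        byte[] expected = hmac(value).getBytes(StandardCharsets.US_ASCII);
        byte[] actual = cookieValue.substring(dot + 1).getBytes(StandardCharsets.US_ASCII);
        return MessageDigest.isEqual(expected, actual) ? value : null;
    }

    // URL-safe base64 never contains '.', so the last dot is always the separator.
    private String hmac(String value) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            byte[] raw = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 not available", e);
        }
    }

    public static void main(String[] args) {
        SignedCookie signer = new SignedCookie("change-me".getBytes(StandardCharsets.UTF_8));
        String cookie = signer.sign("nickname=giorgio");
        System.out.println(cookie);
        System.out.println(signer.verify(cookie));                  // untouched value
        System.out.println(signer.verify("nickname=admin.forged")); // tampered value
    }
}
```

Signing only protects against modification: the data is still readable by the client, so it is no substitute for keeping sensitive state on the server.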

If you're struggling with the transmission of raw data via cookies, maybe you should take your infrastructure to the next level, and start using sessions.

Sessions

Session management stores a unique id in a conventionally named cookie, and then uses the value transmitted by the client to access a big hash table where every session id has its own array of data.
In the past, session ids were passed in the URL because not every client had cookie support, but now the risk that a user tweets a URL containing a session id is greater than the chance of finding someone with a browser that does not support cookies.
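The "big hash table" can be sketched in a few lines of Java. This is only an illustration of the idea, not how any particular container implements it: the class SessionRegistry and its methods are hypothetical, the data lives in memory, and expiration is left out.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory session registry: session id -> that session's data.
final class SessionRegistry {
    private final SecureRandom random = new SecureRandom();
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    // Creates an empty session and returns the id to be sent to the client in a cookie.
    String create() {
        byte[] bytes = new byte[16]; // 128 random bits, so ids are hard to guess
        random.nextBytes(bytes);
        String id = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        sessions.put(id, new ConcurrentHashMap<>());
        return id;
    }

    // Looks up the data for the id received from the client; null if the id is unknown.
    Map<String, Object> find(String sessionId) {
        return sessionId == null ? null : sessions.get(sessionId);
    }

    // Forgets the session, for example at logout.
    void destroy(String sessionId) {
        if (sessionId != null) {
            sessions.remove(sessionId);
        }
    }

    public static void main(String[] args) {
        SessionRegistry registry = new SessionRegistry();
        String id = registry.create();              // first request: no cookie yet
        registry.find(id).put("name", "value");
        System.out.println(registry.find(id));      // later request: cookie carries the id
        System.out.println(registry.find("made-up-id"));
    }
}
```

Only the id ever reaches the client; everything else stays on the server, which is what makes this approach safer than raw data in cookies.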

If you have multiple servers, session state management becomes complex: either the session storage is shared between HTTP servers (as an external node), or the load balancer must send every user to the same server, based on the IP address or some other request parameter.
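The second option, often called sticky sessions, boils down to a deterministic mapping from a request parameter to a server. Here is a minimal Java sketch that hashes the client IP address; the class StickyRouter and the backend names are hypothetical, and real load balancers offer more refined strategies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical load-balancer rule: the same client IP always maps to the same backend.
final class StickyRouter {
    private final List<String> backends;

    StickyRouter(List<String> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        this.backends = new ArrayList<>(backends);
    }

    String backendFor(String clientIp) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        int index = Math.floorMod(clientIp.hashCode(), backends.size());
        return backends.get(index);
    }

    public static void main(String[] args) {
        StickyRouter router = new StickyRouter(Arrays.asList("web1", "web2", "web3"));
        // Repeated requests from one address land on one server, where the session lives.
        System.out.println(router.backendFor("203.0.113.7"));
        System.out.println(router.backendFor("203.0.113.7"));
    }
}
```

The simplicity has a price: adding or removing a backend changes the mapping for many clients, whose sessions are then lost unless the storage is shared anyway.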

Session hijacking is the practice of obtaining the session id of another user, typically by having his browser send the cookie to an external server via an attack executed on the client side.

In a Java servlet, the request object will give you access to the HttpSession object:

// request instanceof HttpServletRequest
HttpSession session = request.getSession();
session.setAttribute("name", "value");
// remember to cast when you get it back
String myAttribute = (String) session.getAttribute("name");

In PHP, another superglobal array manages session variables:

<?php
session_start();
$_SESSION['key'] = $value; // yes, it's that simple
// you can also use a higher-level abstraction like Zend_Session

The higher-level abstraction over PHP's session management is useful when the time comes to do some acceptance testing, as it provides something to mock or to put in "testing" mode (a hack, but Zend_Session works like this). Without a level of indirection, it would be very difficult to test more than one request in the same script.
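The same level of indirection can be sketched in Java. This is not how Zend_Session is implemented: the names UserSession, InMemoryUserSession, and LoginService are hypothetical, and the point is only that application code depending on a small interface can be exercised across several simulated requests in one test.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction: application code depends on this interface,
// never on the platform's session mechanism directly.
interface UserSession {
    Object get(String key);
    void set(String key, Object value);
}

// Test double backed by a plain map; a production implementation would
// delegate to the real session mechanism instead.
final class InMemoryUserSession implements UserSession {
    private final Map<String, Object> data = new HashMap<>();

    public Object get(String key) {
        return data.get(key);
    }

    public void set(String key, Object value) {
        data.put(key, value);
    }
}

// Example of application code written against the abstraction.
final class LoginService {
    private final UserSession session;

    LoginService(UserSession session) {
        this.session = session;
    }

    void login(String username) {
        session.set("user", username);
    }

    boolean isLoggedIn() {
        return session.get("user") != null;
    }
}

final class SessionAbstractionDemo {
    public static void main(String[] args) {
        UserSession session = new InMemoryUserSession();
        // Two "requests" sharing one session, simulated in a single process.
        new LoginService(session).login("giorgio");
        System.out.println(new LoginService(session).isLoggedIn());
    }
}
```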
Published at DZone with permission of Giorgio Sironi, author and DZone MVB.

Comments

Manuel Jordan replied on Thu, 2010/07/15 - 8:33am

Sessions in PHP has some problematic control

Eyal Golan replied on Thu, 2010/07/15 - 9:36am

Or you can always use Apache Wicket :)

Giorgio Sironi replied on Thu, 2010/07/15 - 10:05am in response to: Manuel Jordan

A layer of insulation over $_SESSION is always advised.

Manuel Jordan replied on Thu, 2010/07/15 - 7:38pm in response to: Giorgio Sironi

Can you expand on the idea, please? Or provide a reference link?

Thanks in advance

Alessandro Santini replied on Fri, 2010/07/16 - 6:24am

Let me add a couple of points about session management in Java (PHP should not be here, this is JAVAlobby):

  • Session objects should always be Serializable, because application servers replicate session data across clusters;
  • Cookies are not the only way of identifying a session - URL rewriting is another.

Giorgio Sironi replied on Fri, 2010/07/16 - 10:28am in response to: Alessandro Santini

Your suggestions are welcome, although URL rewriting is already cited in the article (not as my favorite).

Articles on DZone are published on multiple zones; this one was not only targeted at the Java one.

Andy Leung replied on Fri, 2010/07/16 - 2:25pm in response to: Giorgio Sironi

Unfortunately, this is really just for Java, if you look at the sub-domain.
