Monday, July 9, 2007

Files vs databases for sessions

bool session_set_save_handler ( string open, string close, string read, string write, string destroy, string garbage_collect );

The session-handling system in PHP is actually quite basic at its core, simply storing and retrieving values from flat files based upon unique session IDs dished out when a session is started. While this system works very well for small-scale solutions, it does not work too well when multiple servers come into play. The problem is down to location: where should session data be stored?

If session data is stored in files, the files would need to be in a shared location somewhere – not ideal for performance or locking reasons. However, if the data is stored in a database, that database could then be accessed from all machines in the web server cluster, thereby eliminating the problem. Luckily for us, PHP’s session storage system was designed to be flexible enough to cope with this situation.

Note: Also keep in mind that PHP saves its session data to your /tmp directory by default, which is usually readable by everyone who has access to your server. As a result, be careful of what you store in your sessions, or, better yet, either change the save location or use a database with finer-grained security controls!
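
If you stay with file-based sessions, the save location can be moved somewhere private before the session starts. Here is a minimal sketch. The directory name php-sessions-private under sys_get_temp_dir() is a hypothetical choice for this illustration only. On a real server you would pick a directory outside the web root that only the web server user can access.

```php
<?php
// Hypothetical private directory for session files (an assumption for this
// sketch); in production choose a path outside the web root.
$sessionDir = sys_get_temp_dir() . '/php-sessions-private';

if (!is_dir($sessionDir)) {
    mkdir($sessionDir, 0700, true);
}
// Owner-only permissions regardless of the current umask, so other users on
// the machine cannot list or read the session files.
chmod($sessionDir, 0700);

// Must be called before session_start() for the new path to take effect.
session_save_path($sessionDir);
```

The same effect can be had by setting session.save_path in php.ini.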

To use your own solution in place of the standard session handlers, you need to call session_set_save_handler(), which takes quite a lot of parameters. To handle sessions, you need your own callback functions that handle a set of events, which are:


Session open (called by session_start())
Session close (called at the end of the script)
Session read (called by session_start(), right after open)
Session write (called when session data is to be written, normally at the end of the script)
Session destroy (called by session_destroy())
Session garbage collect (called randomly by session_start())

To handle these six events, you need to create six functions with very specific numbers of parameters and return types. Then you pass these six functions into session_set_save_handler() in that order, and you are all set to go. Give this next script a try. It sets up all the basic functions, then prints out what gets passed to each function so you can see how the session operations work:

// php code starts here


<?php
function sess_open($sess_path, $sess_name) {
    print "Session opened.\n";
    print "Sess_path: $sess_path\n";
    print "Sess_name: $sess_name\n\n";
    return true;
}
function sess_close() {
    print "Session closed.\n";
    return true;
}
function sess_read($sess_id) {
    print "Session read.\n";
    print "Sess_ID: $sess_id\n";
    // Must always return a string: the session data, or '' if there is none.
    return '';
}
function sess_write($sess_id, $data) {
    print "Session value written.\n";
    print "Sess_ID: $sess_id\n";
    print "Data: $data\n\n";
    return true;
}
function sess_destroy($sess_id) {
    print "Session destroy called.\n";
    return true;
}
function sess_gc($sess_maxlifetime) {
    print "Session garbage collection called.\n";
    print "Sess_maxlifetime: $sess_maxlifetime\n";
    return true;
}
// Register the six callbacks in the order PHP expects.
session_set_save_handler("sess_open", "sess_close", "sess_read", "sess_write", "sess_destroy", "sess_gc");
session_start();
$_SESSION['foo'] = "bar";
print "Some text\n";
$_SESSION['baz'] = "wombat";

// php code ends here


Running that code through the CLI SAPI on my system, I get the following output:

Session opened.
Sess_path: /tmp
Sess_name: PHPSESSID
Session read.
Sess_ID: m4v94bsp45snd6llbvi1rvv2n5
PHP Warning: session_start(): Cannot send session cookie - headers already sent by (output started at session.php:3) in session.php on line 39
PHP Warning: session_start(): Cannot send session cache limiter - headers already sent (output started at session.php:3) in session.php on line 39
Some text
Session value written.
Sess_ID: m4v94bsp45snd6llbvi1rvv2n5
Data: foo|s:3:"bar";baz|s:6:"wombat";
Session closed.

Ignore the two warnings about being unable to send the session cookie and cache limiter. They appear only because we're printing text to the screen to see how it works. There are four important things to note in that example:

  1. You can, if you want, ignore the parameters passed into sess_open(). We're going to be using a database to store our session data, so we do not need the values at all.
  2. PHP writes the session data just once, even though our two writes to the session are not consecutive: there is a print statement between them.
  3. PHP reads the session data just once, passing in the session ID.
  4. All the functions return true except sess_read().


Item 1 does not hold if you actually care where the user has asked for session files to be saved. If you are writing your own file-based session storage, you might want to use $sess_path (which comes from session.save_path) when it gets passed in. This is your call.
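
If you do want to honour $sess_path, here is a minimal sketch of file-based open, read and write callbacks. It assumes one file per session named sess_ followed by the ID, which is the naming PHP's built-in files handler also uses. The global $sess_save_path and the helper file_sess_filename() are hypothetical names invented for this example. You would pair these with close, destroy and gc callbacks along the lines of the first example's.

```php
<?php
// Hypothetical global remembering the directory handed to the open callback.
$sess_save_path = '';

function file_sess_open($sess_path, $sess_name) {
    global $sess_save_path;
    // session.save_path may be empty; fall back to the system temp directory.
    // The optional "N;/path" subdirectory syntax is not handled here.
    $sess_save_path = ($sess_path !== '') ? $sess_path : sys_get_temp_dir();
    return is_dir($sess_save_path) && is_writable($sess_save_path);
}

// Builds the file name for a session, or returns null for an ID containing
// characters (such as "/" or ".") that could point outside the directory.
function file_sess_filename($sess_id) {
    global $sess_save_path;
    if (!preg_match('/\A[A-Za-z0-9,-]+\z/', $sess_id)) {
        return null;
    }
    return $sess_save_path . '/sess_' . $sess_id;
}

function file_sess_read($sess_id) {
    $file = file_sess_filename($sess_id);
    if ($file === null || !is_file($file)) {
        return '';  // a missing session is an empty string, never true/false
    }
    return (string) file_get_contents($file);
}

function file_sess_write($sess_id, $data) {
    $file = file_sess_filename($sess_id);
    // LOCK_EX stops two requests for the same session interleaving their writes.
    return $file !== null && file_put_contents($file, $data, LOCK_EX) !== false;
}
```

The ID check is deliberately strict. Anything other than letters, digits, commas and hyphens is treated as "no session" rather than being turned into a path.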

Items 2 and 3 are very important, however, as they show that PHP reads and writes session data only once per request. When it writes, it gives you the session ID and the whole contents of that session; when it reads, it gives you just the session ID and expects you to return the whole session data string.
The last item shows that sess_read() is the one function that needs to return a meaningful value to PHP. All the others just need to return true, but reading data from a session must return either the data or an empty string.
Author’s Note: If you return true or false from your session read function, it is likely that PHP will crash – always return either the session string or an empty string.
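
To make that "whole session data string" concrete: with the default serialize_handler ("php"), each session variable is stored as its name, a pipe character, and its serialize()d value, one after another. The helper below is a hypothetical function written only to illustrate the format. A real handler just stores and returns the string PHP gives it, untouched.

```php
<?php
// Hypothetical helper that mimics the default "php" session serialize handler:
// name|serialized-value, repeated for each variable, with no separators.
function illustrate_session_encoding(array $vars) {
    $out = '';
    foreach ($vars as $name => $value) {
        $out .= $name . '|' . serialize($value);
    }
    return $out;
}

// The two variables set in the first example.
$encoded = illustrate_session_encoding(['foo' => 'bar', 'baz' => 'wombat']);
// $encoded is 'foo|s:3:"bar";baz|s:6:"wombat";', the value sess_write() got as $data.
```

Inside a live session, session_encode() and session_decode() convert between $_SESSION and this string.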
Once you have tried running the easy script and you have grasped how it works, it is time to move on to a real working example. What we’re going to do is use MySQL as our database system for session data using the same functions as those above – in essence we’re going to modify the script so that it actually works.
First up, we need to create a table to handle the session data, and here’s how it will look:

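CREATE TABLE sessions (
    ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    -- Session ID length depends on PHP's configuration (26 characters on my
    -- system, 32 by default since PHP 7.1), so leave room for longer IDs.
    SessionID VARCHAR(128) NOT NULL UNIQUE,
    -- MySQL does not accept a literal DEFAULT on TEXT columns.
    Data TEXT,
    DateTouched INT
);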


The ID field is not really required as it is not likely we will ever need to manipulate the database by hand. Having said that, it is better to have it and not need it than need it when we do not have it!

Now, before you try this next code, you need to tweak two values in your php.ini file: session.gc_probability and session.gc_maxlifetime. The first one, in tandem with session.gc_divisor, sets how likely PHP is to trigger session cleanup on each request: the chance is gc_probability divided by gc_divisor. In a typical php.ini, session.gc_probability is 1 and session.gc_divisor is 1000, so on average one request in every 1000 triggers cleanup. As we're going to be testing our script, change session.gc_probability to 1000, giving a 1000/1000 chance of running the garbage collection routine. In other words, it will always run.
The second change is to make session.gc_maxlifetime much lower. By default it is 1440 seconds (24 minutes), which is far too long to wait to see whether our garbage collection routine works. Set it to 20, so that garbage collection treats any session untouched for more than 20 seconds as unused and deletable. Of course, in production this value needs to go back to 1440 so that people do not have their sessions time out before they can even read a simple web page!
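
If you would rather not edit php.ini, these three settings can also be changed at the top of a test script with ini_set(), as long as that happens before session_start(). Here is a sketch of that approach, with the garbage-collection chance worked out as gc_probability / gc_divisor.

```php
<?php
// Test-only values: run garbage collection on every session_start() and
// treat sessions idle for more than 20 seconds as expired.
ini_set('session.gc_probability', '1000');
ini_set('session.gc_divisor', '1000');
ini_set('session.gc_maxlifetime', '20');

// Chance that a given request triggers the gc callback: 1000 / 1000 = 1.
// With the usual production values of 1 and 1000 it would be 0.001.
$gcChance = (int) ini_get('session.gc_probability') / (int) ini_get('session.gc_divisor');
```

Because ini_set() only affects the current script, nothing needs to be changed back for production. Just remove these lines.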
With that in mind, here’s the new script:

// php code starts here

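<?php
// Connect once; the handler functions below share this connection.
// mysqli replaces the old mysql_* functions, which were removed in PHP 7.
$db = mysqli_connect("localhost", "phpuser", "alm65z", "phpdb");

function sess_open($sess_path, $sess_name) {
    return true;
}
function sess_close() {
    return true;
}
function sess_read($sess_id) {
    global $db;
    // The ID comes from the user's cookie, so escape it before using it in SQL.
    $id = mysqli_real_escape_string($db, $sess_id);
    $CurrentTime = time();
    $result = mysqli_query($db, "SELECT Data FROM sessions WHERE SessionID = '$id'");
    if (!mysqli_num_rows($result)) {
        // No row yet: create an empty one so sess_write() can always UPDATE.
        mysqli_query($db, "INSERT INTO sessions (SessionID, Data, DateTouched) VALUES ('$id', '', $CurrentTime)");
        return '';
    }
    $row = mysqli_fetch_assoc($result);
    mysqli_query($db, "UPDATE sessions SET DateTouched = $CurrentTime WHERE SessionID = '$id'");
    // Always return a string, never true, false or NULL.
    return (string) $row['Data'];
}
function sess_write($sess_id, $data) {
    global $db;
    $id = mysqli_real_escape_string($db, $sess_id);
    // Serialized session data can contain quotes, so it must be escaped too.
    $data = mysqli_real_escape_string($db, $data);
    $CurrentTime = time();
    mysqli_query($db, "UPDATE sessions SET Data = '$data', DateTouched = $CurrentTime WHERE SessionID = '$id'");
    return true;
}
function sess_destroy($sess_id) {
    global $db;
    $id = mysqli_real_escape_string($db, $sess_id);
    mysqli_query($db, "DELETE FROM sessions WHERE SessionID = '$id'");
    return true;
}
function sess_gc($sess_maxlifetime) {
    global $db;
    $CurrentTime = time();
    $maxlifetime = (int) $sess_maxlifetime;
    mysqli_query($db, "DELETE FROM sessions WHERE DateTouched + $maxlifetime < $CurrentTime");
    return true;
}
session_set_save_handler("sess_open", "sess_close", "sess_read", "sess_write", "sess_destroy", "sess_gc");
// Write the session while $db is still open; otherwise PHP may destroy the
// connection object before it calls sess_write() at shutdown.
register_shutdown_function('session_write_close');
session_start();
$_SESSION['foo'] = "bar";
$_SESSION['baz'] = "wombat";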

// php code ends here

It should be immediately apparent that this script is the same as before with only the function contents changed, and the function contents aren't anything special! As the script starts up, it connects to the local MySQL server, and that connection is used throughout the script by the session-handling functions. When a session is read, sess_read() is called with the session ID to read. This is used to query our sessions table: if the ID exists, its data is returned. If not, an empty session row is created with that session ID and an empty string is returned. The empty row is put in there so that we can later just say "UPDATE" while writing, without worrying about whether the row already exists, because we know we created it when reading. The sess_write() function, then, is fairly straightforward: update the row with ID $sess_id so that it holds the data passed in with $data.
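
The insert-on-read pattern can be tried out without MySQL. The sketch below swaps in a PHP array for the sessions table. $sess_table, mem_sess_read() and mem_sess_write() are hypothetical names, and nothing persists between requests, so this is only a way to see the logic of sess_read() and sess_write() above in isolation.

```php
<?php
// Hypothetical in-memory stand-in for the sessions table (not persistent).
$sess_table = [];

function mem_sess_read($sess_id) {
    global $sess_table;
    $now = time();
    if (!isset($sess_table[$sess_id])) {
        // Like the INSERT in sess_read(): create an empty row up front.
        $sess_table[$sess_id] = ['Data' => '', 'DateTouched' => $now];
        return '';
    }
    // Like the UPDATE in sess_read(): reading counts as touching the row.
    $sess_table[$sess_id]['DateTouched'] = $now;
    return $sess_table[$sess_id]['Data'];
}

function mem_sess_write($sess_id, $data) {
    global $sess_table;
    // Mirrors the UPDATE in sess_write(): if no row exists, nothing is stored,
    // yet true is still returned, just as in the MySQL version.
    if (isset($sess_table[$sess_id])) {
        $sess_table[$sess_id]['Data'] = $data;
        $sess_table[$sess_id]['DateTouched'] = time();
    }
    return true;
}
```

Calling mem_sess_write() for an ID that was never read stores nothing. This is why the read callback has to create the row first.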


The last function of interest is sess_gc(), which PHP calls at random (with the probability set by session.gc_probability and session.gc_divisor) to delete old session information. We edited php.ini so that "randomly" currently means "every time". The function receives the lifespan of session data in seconds and deletes every row that has not been read or updated within that time. We can tell how long it has been since a row was last read or written because both sess_read() and sess_write() set the DateTouched field to the current time. So, to tell whether a record has gone untouched for longer than the limit, we take DateTouched and add the limit $sess_maxlifetime to it. If the result is earlier than the current time, the session data is no longer valid.
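
Here is the same comparison pulled out into a PHP helper so the boundary case is easy to see. The name session_row_expired() is invented for this sketch; in the handler itself, the comparison runs inside MySQL.

```php
<?php
// The expiry test used by sess_gc(): a row is stale once
// DateTouched + maxlifetime is earlier than the current time.
function session_row_expired($dateTouched, $maxLifetime, $now) {
    return $dateTouched + $maxLifetime < $now;
}

// With gc_maxlifetime = 20, a row touched at t = 1000 survives up to t = 1020
// and is deleted by any garbage collection run from t = 1021 onwards.
$atLimit = session_row_expired(1000, 20, 1020);
$pastLimit = session_row_expired(1000, 20, 1021);
```

If you add an index on DateTouched, writing the SQL condition as DateTouched < $CurrentTime - $sess_maxlifetime selects the same rows. This form also lets MySQL use that index, which it cannot do when the column is inside an expression.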
It is interesting to note that you need not use databases or files to store your sessions. As we’ve seen, you get to define the storage and retrieval method for your system – if you really wanted, you could write your own extension called PigeonStore that sends and retrieves session data through pigeons. It really doesn’t matter, because PHP just calls the functions you tell it to; what you do in there is down to you, so use it wisely.
Some people hold the opinion that it is a good idea to use the redirectable session backend to write session handlers that use SQLite; however, I disagree. PHP's session functions are file-based by default and are pretty fast too. As we've seen, PHP always reads the whole session in one go and writes it out whole, so the storage layer only ever has to read and write a complete blob. It never has to perform the kinds of searches and partial edits that a database excels at. If you want to try it out, go ahead: you might find it is a smidge faster in some circumstances, and doing so is a great way to learn.
