data URI theory and practice

Theory

Data URI'es is an RFC 2397 published in 1998. It's a URL scheme which is used to embed small resources right into the (X)HTML page.

Syntax is quite simple: data:[<mediatype>][;base64],<data>

To see how it works let's take the following code (testcase):

<link rel="stylesheet" type="text/css" href="data:text/css;base64,Ym9keXtiYWNrZ3JvdW5kOmdyZWVuO30=">

Browser supporting data URI will base64-decode the encoded string Ym9keXtiYWNrZ3JvdW5kOmdyZWVuO30= to body{background:green;} and then load this string as if it was a result of an http request to an external file containing this CSS code.

According to the RFC we can embed any small resource into our page, e.g:

So theoretically we could have the same functionality as we have in MHTML - some or all external resources embedded directly in the page.

All data URI advocates say that as most of the browsers have 2 concurrent connections per server (but 6 in total), dataURI mechanism potentially can speed up page load by decreasing the amount of HTTP requests (especially in case of HTTPS where encrypting payload produces quite big overhead). But:

So keeping all this in mind we can't just say that dataURI is the only usable way to improve page load times. But it definetely is the only option when you have a limited access to the server and/or the server is not configured properly, so you can't set Expires header for aggressive caching, you can't set DNS wildcards or CNAME records to get your resources served from different hosts (and therefore leverage the maximum available concurrent connection in browsers) or server doesn't support HTTP caching properly.

Practice

I can only see the following cases where dataURI can be effectively used:

Please don't forget that if a resource is embedded on multiple pages, it's obviously going to be redownloaded as many times as these containing pages are. And if a resource is not dataURI'ed but referenced normally as an external file, it can be cached quite aggressively and requested from the server only once (all popular web-servers already provide good caching support for static files).

However, this is all ideal world where specification don't have flaws and all browsers follow them.

In our world we have the following:

Also I would strongly discourage from dynamically base64-encoding and embedding images in CSS files by some scripting language unless you're well aware of HTTP caching principles.

Let's consider the following composed code from Wikipedia data:URI page:

<?php
function data_url($file, $mime)
{
    $contents = file_get_contents($file);
    $base64 = base64_encode($contents);
    return ('data:' . $mime . ';base64,' . $base64);
}

header('Content-type: text/css');

?>

div.menu {
    background-image:url(<?php echo data_url('menu_background.png','image/png')?>);
}

Unless accompanied with correct HTTP caching algorythm, this CSS file will be downloaded every time the page that references this CSS file is loaded! So every time user accesses the page referencing this CSS file, server will get a request, initiate script parsing, base64-encode the image and send it back to client. So you get rid of one simple request for an image (that in case of being a static file will be perfectly cached) but have one heavy request that will be run every time user requests a page! Not a fair change I think. So again, if you decide to use data URI scheme for your resources, encode and embed them beforehand or implement proper server-side HTTP caching and compressing support.

Note for russian-speaking users: - there's a way to embed images even for IE6/IE7. Though it's rather a proof-of-concept - it doesn't support HTTP caching/compressing, but it works!

comments powered by Disqus