Content Negotiation – Mirrored Post

As mentioned in the last post, there is an excellent article available at http://www.autisticcuckoo.net/archive.php?id=2004/11/03/content-negotiation, but sadly its author has said he is no longer interested in continuing his blog. He may well keep paying his hosting fees and renewing the domain name, but this is not certain, so to preserve at least this article we have copied it (verbatim) below.

Original Source – http://www.autisticcuckoo.net/archive.php?id=2004/11/03/content-negotiation

We have, for some time, tried to inform people about the fact that there is no point whatsoever in using XHTML as long as you serve the documents with a text/html media type. For those who still want to use XHTML and gain at least something for some users, we have recommended content negotiation. On several occasions people have asked us to publish a write-up on how to do that, but there hasn’t been time to sit down and write it. Now, finally, we have tried to whip something together that we hope can serve as a guide.

What Is Content Negotiation?

Content negotiation means that the server in one way or another negotiates with a user agent (browser, search engine, etc) that requests a document. The negotiation means that the user agent announces which media types (also called content type or MIME type) it can handle and, optionally, which one it prefers. The server then serves the document in the way that best suits the user agent.

The user agent announces which media types it can handle through a header in the HTTP request it sends to the server. The header is called Accept and can look something like this:

Accept: text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png, image/jpeg, image/gif;q=0.2, */*;q=0.1

The example is what our instance of Mozilla sends. (We have inserted blanks between the media types so that the text will wrap.) Our interest now lies with application/xhtml+xml and text/html;q=0.9. The part after the semi-colon, q=0.9, is called a quality value and is a value between 0 and 1, inclusive, with up to three decimal places. The higher the quality value, the more the user agent prefers that media type. If no quality value is specified for a particular media type, it means q=1.0. The example thus shows that Mozilla prefers application/xhtml+xml to text/html.
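To make the quality values concrete, here is a minimal sketch (not part of the original article) that splits an Accept header into media types and their quality values. The function name parse_accept is our own invention, and the parsing is deliberately loose: it does not enforce the [0,1] range or the three-decimal limit.

```php
<?php
// Hypothetical helper, not from the original article: turn an Accept
// header into an array of media type => quality value.
function parse_accept($header)
{
    $types = array();
    foreach (explode(',', $header) as $range) {
        // The first part is the media type, the rest are parameters.
        $params = explode(';', $range);
        $type = strtolower(trim(array_shift($params)));
        if ($type === '') {
            continue; // skip empty list items
        }
        $q = 1.0; // a missing quality value means q=1.0
        foreach ($params as $param) {
            if (preg_match('/^\s*q\s*=\s*(\d+(?:\.\d+)?)\s*$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        $types[$type] = $q;
    }
    return $types;
}

// The Mozilla header quoted above.
$accept = 'text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, '
        . 'text/plain;q=0.8, image/png, image/jpeg, image/gif;q=0.2, */*;q=0.1';

foreach (parse_accept($accept) as $type => $q) {
    printf("%-22s q=%.1f\n", $type, $q);
}
```

For this header, application/xhtml+xml ends up with the default value 1.0 and text/html with 0.9, which is why Mozilla is said to prefer the former.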

The usual meaning of content negotiation is that the HTTP server itself decides which media type the user agent prefers, and then automatically chooses between a number of different documents. Normally the file suffix is used to associate files with different media types, so the server might choose between index.xhtml and index.html.

This article describes another type of content negotiation; one that is performed through a server-side script. Most web hosts offer some kind of server-side scripting, usually PHP or ASP. Our example uses PHP, since it is available for more platforms and is open source, while ASP is Microsoft-specific. We don’t delve into the finer details here, but presume that you are sufficiently familiar with PHP.

To round off this explanation of what content negotiation means, we want to emphasise that it’s not merely an issue of deciding which media type to send. When you have chosen a media type, you should also serve the document with a content that corresponds to the chosen media type. You either serve XHTML as application/xhtml+xml, or you serve HTML as text/html.

About the Examples

The code samples in this article are written for PHP 4.1.0 or higher. For older versions you need to replace $_SERVER with $HTTP_SERVER_VARS. If the code is executed in a function, you then need to declare the array as a global (global $HTTP_SERVER_VARS;).

This article presumes that the document’s content is marked up as XHTML 1.1, and that it doesn’t contain anything that cannot be converted into HTML 4.01 Strict, for instance elements from other XML namespaces, or CDATA sections.

Parsing the Accept Header

First of all we need to find out whether or not the user agent supports the application/xhtml+xml media type and, if so, whether it prefers that to text/html.

   1. $xhtml = false;
   2. if (preg_match('/application\/xhtml\+xml(;q=(\d+\.\d+))?/i', $_SERVER['HTTP_ACCEPT'], $matches)) {
   3.     $xhtmlQ = isset($matches[2]) ? $matches[2] : 1;
   4.     if (preg_match('/text\/html(;q=(\d+\.\d+))?/i', $_SERVER['HTTP_ACCEPT'], $matches)) {
   5.         $htmlQ = isset($matches[2]) ? $matches[2] : 1;
   6.         $xhtml = ($xhtmlQ >= $htmlQ);
   7.     } else {
   8.         $xhtml = true;
   9.     }
  10. }

The $xhtml variable indicates whether or not we will serve the document as XHTML. The initial value is false, since many older browsers lack support for XHTML.

On line 2 we check whether the Accept header contains application/xhtml+xml plus an optional quality value. This regular expression isn’t 100% fool-proof, since it doesn’t limit the value range to [0,1], nor does it limit the number of decimal places to 3. For all intents and purposes, however, it doesn’t matter.

On line 3 we extract the quality value, if present. If not, we set the quality value for application/xhtml+xml to 1.

On lines 4 and 5 we perform the corresponding check for text/html. Line 6 compares the quality values and sets $xhtml=true if the user agent prefers application/xhtml+xml to text/html. Line 8 handles the case of a user agent that specifies application/xhtml+xml in the Accept header, but not text/html.

After these lines of code we thus have a Boolean variable, $xhtml, which indicates whether the document will be served as XHTML.
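A note from the mirror rather than the original author: the listing above treats a quality value written without a decimal point (q=0 or q=1) as if it were missing, does not allow a space after the semi-colon, and reads $_SERVER['HTTP_ACCEPT'] even when the request has no Accept header. The following is one possible hardened variant, offered as a sketch only. The function names media_type_q and prefers_xhtml are hypothetical, and a media type followed by parameters other than q is simply treated as not listed.

```php
<?php
// Hypothetical helper, not part of the original article.
// Returns the quality value for $type, or null if $type is not listed.
function media_type_q($type, $accept)
{
    // Match the media type as a whole list item with an optional q value.
    // Items carrying other parameters (e.g. ;level=1) are not matched.
    $re = '/(?:^|,)\s*' . preg_quote($type, '/')
        . '\s*(?:;\s*q\s*=\s*(\d+(?:\.\d*)?))?\s*(?=,|$)/i';
    if (!preg_match($re, $accept, $m)) {
        return null;
    }
    return isset($m[1]) ? (float) $m[1] : 1.0;
}

// Hypothetical helper: true if XHTML should be served for this header.
function prefers_xhtml($accept)
{
    $xhtmlQ = media_type_q('application/xhtml+xml', $accept);
    if ($xhtmlQ === null || $xhtmlQ <= 0) {
        return false; // not listed, or explicitly refused with q=0
    }
    $htmlQ = media_type_q('text/html', $accept);
    // XHTML wins if text/html is not listed or has a lower or equal q.
    return $htmlQ === null || $xhtmlQ >= $htmlQ;
}

// The header may be absent entirely, so fall back to an empty string.
$accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
$xhtml = prefers_xhtml($accept);
```

The result is the same Boolean $xhtml as before, so the rest of the article applies unchanged.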

Prepare HTML Conversion

If the user agent doesn’t support XHTML, or if it prefers HTML, we have to convert the document’s content from XHTML 1.1 to HTML 4.01. We do this with a simple function:

  1. function xml2html($buffer)
  2. {
  3.     $xml = array('/>', 'xml:lang=');
  4.     $html = array('>', 'lang=');
  5.     return str_replace($xml, $html, $buffer);
  6. }

Lines 3 and 4 declare two arrays, where the elements in the $xml array will be replaced by the corresponding element in the $html array.

On line 5 each occurrence of /> is replaced by > in the $buffer string. At the same time, each occurrence of xml:lang is replaced by lang.
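To show what the conversion does in practice, here is the same function applied to a small fragment. The function body is the one listed above; the sample markup is made up for illustration.

```php
<?php
// Same function as in the article's listing.
function xml2html($buffer)
{
    $xml = array('/>', 'xml:lang=');
    $html = array('>', 'lang=');
    return str_replace($xml, $html, $buffer);
}

// Made-up sample fragment with two empty elements and an xml:lang attribute.
$fragment = '<p xml:lang="en">Line one<br />Line two <img src="a.png" alt="" /></p>';

echo xml2html($fragment), "\n";
// prints: <p lang="en">Line one<br >Line two <img src="a.png" alt="" ></p>
```

Note that the replacement is purely textual. It leaves the space before the former slash in place, which is harmless in HTML, and it would also rewrite a literal /> that happened to appear in the text or in an attribute value.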

And Finally…

Only a few details now remain. If the $xhtml variable is true, we need to write the document type declaration for XHTML 1.1 and a <html> element with the proper XML namespace. Most likely we also want to start with an XML declaration, and link to our style sheets through processing instructions.

If the user agent doesn’t want XHTML, we need to write a document type declaration for HTML 4.01 Strict and a <html> element without an XML namespace. Style sheets should be linked through ordinary <link> elements (or be imported in a <style> element). Furthermore, we need to instruct the PHP interpreter to buffer all output to the response stream, and to call our conversion function on the result before sending it back to the user agent.

Before we write anything at all, however, we must send a couple of HTTP headers: one that says which media type we use, and one that informs proxy servers that content negotiation has taken place so that they can consider that in their caching algorithms.

   1. if ($xhtml) {
   2.     header('Content-Type: application/xhtml+xml; charset=utf-8');
   3.     header('Vary: Accept');
   4.     echo '<?xml version="1.0" encoding="utf-8"?>', "\n";
   5.     echo '<?xml-stylesheet type="text/css" href="/css/screen.css" media="screen"?>', "\n";
   6.     echo '<?xml-stylesheet type="text/css" href="/css/print.css" media="print"?>', "\n";
   7.     echo '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">', "\n";
   8.     echo '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">', "\n";
   9. } else {
  10.     header('Content-Type: text/html; charset=utf-8');
  11.     header('Vary: Accept');
  12.     ob_start('xml2html');
  13.     echo '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">', "\n";
  14.     echo '<html lang="en">', "\n";
  15. }

Don’t forget to link to the style sheets in the <head> if the document is served as HTML.

There is a blatant shortcoming in the example shown in this article: the W3C validator. It doesn’t send application/xhtml+xml in its Accept header, so it’s impossible to validate the document as XHTML. It is trivial to let a query parameter control the choice of media type, but that is left as an exercise for the reader.
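For readers who would rather not do the exercise, one possible approach is sketched below. It is our own addition, not the original author's solution. The parameter name format and the function name apply_format_override are assumptions, chosen only for illustration.

```php
<?php
// Hypothetical override, not from the original article: a query string of
// ?format=xhtml or ?format=html replaces the negotiated choice.
function apply_format_override($xhtml, $query)
{
    if (isset($query['format'])) {
        if ($query['format'] === 'xhtml') {
            return true;  // force application/xhtml+xml
        }
        if ($query['format'] === 'html') {
            return false; // force text/html
        }
    }
    return $xhtml; // no valid override: keep the negotiated value
}

// Stand-in for the Boolean produced by parsing the Accept header.
$xhtml = false;

// Apply the override before any headers or output are sent.
$xhtml = apply_format_override($xhtml, $_GET);
```

With something like this in place, the validator could be pointed at the page address with ?format=xhtml appended, while ordinary requests are still decided by the Accept header.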

(Note: We are aware of some possible copyright issues, and we have attempted to contact the original owner to get permission to repost the article verbatim here. At the time of this post, no replies had been received, and we can only assume the original source is no longer online. If you are the original source and would like this post removed, please contact us and we will take it down immediately.)

This entry was posted in Uncategorized by Site Admin.

About Site Admin

Website administrator for the WhyDontYou domain. Has maintained and developed a variety of sites, ranging from simple, plain HTML sites to full-blown e-commerce applications. Interested in philosophy, politics and science.