As mentioned in the last post, there is an excellent article available at http://www.autisticcuckoo.net/archive.php?id=2004/11/03/content-negotiation, but sadly its author has said he is no longer interested in continuing his blog. He may keep paying his hosting fees and renewing the domain name, but that is not certain, so to preserve at least this article we have copied it (verbatim) below.
Original Source – http://www.autisticcuckoo.net/archive.php?id=2004/11/03/content-negotiation
We have, for some time, tried to inform people about the fact that there is no point whatsoever in using XHTML as long as you serve the documents with a text/html media type. For those who still want to use XHTML and gain at least something for some users, we have recommended content negotiation. On several occasions people have asked us to publish a write-up on how to do that, but there hasn’t been time to sit down and write it. Now, finally, we have tried to whip something together that we hope can serve as a guide.
What Is Content Negotiation?
Content negotiation means that the server in one way or another negotiates with a user agent (browser, search engine, etc.) that requests a document. The negotiation means that the user agent announces which media types (also called content types or MIME types) it can handle and, optionally, which one it prefers. The server then serves the document in the way that best suits the user agent.
The user agent announces which media types it can handle through a header in the HTTP request it sends to the server. The header is called Accept and can look something like this:
Accept: text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png, image/jpeg, image/gif;q=0.2, */*;q=0.1
The example is what our instance of Mozilla sends. (We have inserted blanks between the media types so that the text will wrap.) Our interest now lies with application/xhtml+xml and text/html;q=0.9. The part after the semicolon, q=0.9, is called a quality value and is a value between 0 and 1, inclusive, with up to three decimal places. The higher the quality value, the more the user agent prefers that media type. If no quality value is specified for a particular media type, it means q=1.0. The example thus shows that Mozilla prefers application/xhtml+xml to text/html.
The usual meaning of content negotiation is that the HTTP server itself decides which media type the user agent prefers, and then automatically chooses between a number of different documents. Normally the file suffix is used to associate each document with a media type, so the server might choose between index.xhtml and index.html.
This article describes another type of content negotiation; one that is performed through a server-side script. Most web hosts offer some kind of server-side scripting, usually PHP or ASP. Our example uses PHP, since it is available for more platforms and is open source, while ASP is Microsoft-specific. We don’t delve into the finer details here, but presume that you are sufficiently familiar with PHP.
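To make the quality values described above concrete, here is a minimal sketch in PHP that splits an Accept header into media types and their quality values. The function name parseAccept is our own invention for illustration and is not part of the original article's code. The sketch assumes PHP 5.4 or later (for the short array syntax), ignores every media type parameter other than q, and does not check that the value stays within the range 0 to 1.

```php
<?php
// Hypothetical helper (not from the original article): returns an array
// mapping each media type in an Accept header to its quality value.
// A media type without a q parameter gets the default of 1.0.
function parseAccept($accept)
{
    $result = [];
    foreach (explode(',', $accept) as $item) {
        $parts = explode(';', $item);
        $type = strtolower(trim(array_shift($parts)));
        if ($type === '') {
            continue; // skip empty entries such as a trailing comma
        }
        $q = 1.0;
        foreach ($parts as $param) {
            // Accept "q=0.9" as well as "q=1", with optional whitespace.
            if (preg_match('/^\s*q\s*=\s*([01](?:\.\d{0,3})?)\s*$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        $result[$type] = $q;
    }
    return $result;
}

// The Mozilla header quoted earlier in the article.
$accept = 'text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, '
        . 'text/plain;q=0.8, image/png, image/jpeg, image/gif;q=0.2, */*;q=0.1';
$types = parseAccept($accept);

// XHTML has the default quality 1.0 and HTML has 0.9, so XHTML is preferred.
var_dump($types['application/xhtml+xml'] > $types['text/html']); // bool(true)
```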
To round off this explanation of what content negotiation means, we want to emphasise that it’s not merely an issue of deciding which media type to send. When you have chosen a media type, you should also serve the document with content that corresponds to the chosen media type. You either serve XHTML as application/xhtml+xml, or you serve HTML as text/html.
About the Examples
The code samples in this article are written for PHP 4.1.0 or higher. For older versions you need to replace $_SERVER with $HTTP_SERVER_VARS. If the code is executed in a function, you then need to declare the array as a global (global $HTTP_SERVER_VARS;).
This article presumes that the document’s content is marked up as XHTML 1.1, and that it doesn’t contain anything that cannot be converted into HTML 4.01 Strict, for instance elements from other XML namespaces, or CDATA sections.
Parsing the Accept Header
First of all we need to find out whether or not the user agent supports the application/xhtml+xml media type and, if so, whether it prefers that to text/html.
$xhtml = false;
if (preg_match('/application\/xhtml\+xml(;q=(\d+\.\d+))?/i', $_SERVER['HTTP_ACCEPT'], $matches)) {
    $xhtmlQ = isset($matches[2]) ? $matches[2] : 1;
    if (preg_match('/text\/html(;q=(\d+\.\d+))?/i', $_SERVER['HTTP_ACCEPT'], $matches)) {
        $htmlQ = isset($matches[2]) ? $matches[2] : 1;
        $xhtml = ($xhtmlQ >= $htmlQ);
    } else {
        $xhtml = true;
    }
}
The $xhtml variable indicates whether or not we will serve the document as XHTML. The initial value is false, since many older browsers lack support for XHTML.
On line 2 we check whether the Accept header contains application/xhtml+xml plus an optional quality value. This regular expression isn’t 100% fool-proof, since it doesn’t limit the value range to [0,1], nor does it limit the number of decimal places to 3. For all intents and purposes, however, it doesn’t matter.
On line 3 we extract the quality value, if present. If not, we set the quality value for application/xhtml+xml to 1.
On lines 4 and 5 we perform the corresponding check for text/html. Line 6 compares the quality values and sets $xhtml=true if the user agent prefers application/xhtml+xml to text/html (or rates the two equally). Line 8 handles the case of a user agent that specifies application/xhtml+xml in the Accept header, but not text/html.
After these lines of code we thus have a Boolean variable, $xhtml, which indicates whether the document will be served as XHTML.
Prepare HTML Conversion
If the user agent doesn’t support XHTML, or if it prefers HTML, we have to convert the document’s content from XHTML 1.1 to HTML 4.01. We do this with a simple function:
function xml2html($buffer)
{
    $xml = array('/>', 'xml:lang=');
    $html = array('>', 'lang=');
    return str_replace($xml, $html, $buffer);
}
Lines 3 and 4 declare two arrays, where the elements in the $xml array will be replaced by the corresponding element in the $html array.
On line 5 each occurrence of /> is replaced by > in the $buffer string. At the same time, each occurrence of xml:lang is replaced by lang.
And Finally…
Only a few details now remain. If the $xhtml variable is true, we need to write the document type declaration for XHTML 1.1 and a <html> element with the proper XML namespace. Most likely we also want to start with an XML declaration, and link to our style sheets through processing instructions.
If the user agent doesn’t want XHTML, we need to write a document type declaration for HTML 4.01 Strict and a <html> element without an XML namespace. Style sheets should be linked through ordinary <link> elements (or be imported in a <style> element). Furthermore, we need to instruct the PHP interpreter to buffer all output to the response stream, and to call our conversion function on the result before sending it back to the user agent.
Before we write anything at all, however, we must send a couple of HTTP headers: one that says which media type we use, and one that informs proxy servers that content negotiation has taken place so that they can consider that in their caching algorithms.
if ($xhtml) {
    header('Content-Type: application/xhtml+xml; charset=utf-8');
    header('Vary: Accept');
    echo '<?xml version="1.0" encoding="utf-8"?>', "\n";
    echo '<?xml-stylesheet type="text/css" href="/css/screen.css" media="screen"?>', "\n";
    echo '<?xml-stylesheet type="text/css" href="/css/print.css" media="print"?>', "\n";
    echo '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">', "\n";
    echo '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">', "\n";
} else {
    header('Content-Type: text/html; charset=utf-8');
    header('Vary: Accept');
    ob_start('xml2html');
    echo '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">', "\n";
    echo '<html lang="en">', "\n";
}
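To illustrate how the output buffering in the HTML branch behaves, here is a small sketch that runs a fragment of XHTML markup through the conversion callback. It repeats the xml2html function so that it is self-contained. The outer buffer and the sample markup are our own additions, there only so the converted result can be captured and inspected; a real page would simply let the buffer flush when the script ends.

```php
<?php
// Same conversion function as in the article, repeated for completeness.
function xml2html($buffer)
{
    $xml = array('/>', 'xml:lang=');
    $html = array('>', 'lang=');
    return str_replace($xml, $html, $buffer);
}

ob_start();            // outer buffer: demonstration only, captures the result
ob_start('xml2html');  // inner buffer: everything echoed passes through xml2html

echo '<p xml:lang="en">Line one<br />Line two</p>';

ob_end_flush();             // runs xml2html and hands the result to the outer buffer
$converted = ob_get_clean(); // fetch the converted markup and discard the outer buffer

echo $converted, "\n"; // <p lang="en">Line one<br >Line two</p>
```

Note that the replacement is purely textual, so the string /> would also be rewritten if it appeared inside text content or attribute values.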
Don’t forget to link to the style sheets in the <head> if the document is served as HTML.
There is a blatant shortcoming in the example shown in this article: the W3C validator. It doesn’t send application/xhtml+xml in its Accept header, so it’s impossible to validate the document as XHTML. It is trivial to let a query parameter control the choice of media type, but that is left as an exercise for the reader.
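As one possible take on that exercise, the sketch below wraps the decision in a function and lets a query parameter override the Accept header, so that a validator can request the XHTML version explicitly. The parameter name type, its values xhtml and html, and the function names qualityOf and wantsXhtml are all assumptions made for this illustration; none of them come from the original article. The header check follows the same idea as the earlier code, but as our own adjustment it tolerates whitespace and integer quality values such as q=1, and it treats q=0 as "not acceptable".

```php
<?php
// Hypothetical helper: returns the quality value given for $type in the
// Accept header, 1.0 if the type is listed without one, or null if the
// type is not listed. Simplification: other parameters are not handled.
function qualityOf($type, $accept)
{
    $re = '/' . preg_quote($type, '/') . '\s*(?:;\s*q\s*=\s*(\d+(?:\.\d+)?))?/i';
    if (!preg_match($re, $accept, $m)) {
        return null;
    }
    return isset($m[1]) ? (float) $m[1] : 1.0;
}

// Hypothetical helper: decides whether to serve XHTML. $override is the
// value of the assumed "type" query parameter, or null when it is absent.
function wantsXhtml($accept, $override = null)
{
    // An explicit override wins over the Accept header.
    if ($override === 'xhtml') {
        return true;
    }
    if ($override === 'html') {
        return false;
    }

    $xhtmlQ = qualityOf('application/xhtml+xml', $accept);
    if ($xhtmlQ === null || $xhtmlQ <= 0) {
        return false; // XHTML not listed, or explicitly refused with q=0
    }
    $htmlQ = qualityOf('text/html', $accept);
    if ($htmlQ === null) {
        return true;  // XHTML listed, HTML not listed
    }
    return $xhtmlQ >= $htmlQ;
}

// Usage: read both inputs defensively, since either may be missing.
$accept   = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
$override = isset($_GET['type']) ? $_GET['type'] : null;
$xhtml    = wantsXhtml($accept, $override);
```

With this in place, a validator could be pointed at the page's URL with ?type=xhtml appended, while ordinary visitors continue to be served according to their Accept header.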
(Note: We are aware of some possible copyright issues, and we have attempted to contact the original owner for permission to repost the article verbatim here. At the time of this post, no replies had been received and we can only assume the original source is no longer online. If you are the original source and would like this post removed, please contact us and we will take it down immediately.)