14 September 1999

              SRE-http and Content Negotiation

Content negotiation refers to the choosing of a best representation of
a web resource from several alternatives. SRE-http supports several
http/1.1 compliant mechanisms for performing content negotiation.

Contents:
    I) Introduction
   II) Specifying a negotiable resource
         II.1) Creating variants
         II.2) Creating a Variant List
         II.3) Identifying a negotiable resource
               II.3.a) Wildcarded form
         II.4) Wildcarded variants
  III) The content negotiation algorithm
         III.a) Using your own server-side content negotiation algorithms
               III.a.i) The Custom Procedure
               III.a.ii) Special procnames

  Appendix A) Sample Variants file
  Appendix B) Specifying a Remote Variant Selection Algorithm
  Appendix C) Example of a TCN Request and Response
  Appendix D) The SREF_NEGOTIATE_ procedures

---------------------
I) Introduction

Content negotiation refers to the choosing of a best representation of
a web resource from several alternatives. In general, content
negotiation is used to choose between resources created with one of
several languages, or one of several mimetypes. Content negotiation can
also be used to choose between resources using alternate character
sets, and to choose shorter documents. In the future, content
negotiation may also be used to choose documents using certain features
(such as documents using different versions of html).

There are basically two forms of content negotiation:

  1) Server-side.
     Server-side content negotiation (which is defined in http/1.0) is
     performed by the server -- the server uses ACCEPT request headers
     (provided by the client) to automatically choose and return one of
     several variants.

  2) Client-side.
     Client-side content negotiation (which is new to http/1.1) is
     accomplished by the client automatically requesting one of several
     variants: the choice is based on a "variants list" (that contains
     URIs and descriptive information) contained in response headers
     returned by the server.
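As a rough illustration of the server-side form, the following Python
sketch parses an Accept request header and picks the variant whose
mimetype the client rates highest. It is not SRE-http code: the function
names (parse_accept, media_q, choose_variant) and the sample variants are
hypothetical, and it only looks at mimetypes, ignoring language,
encoding, and charset.

```python
def parse_accept(header):
    """Return a list of (media_range, q) pairs from an Accept header value."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # q defaults to 1.0 when the client omits it
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "unacceptable"
        prefs.append((parts[0].lower(), q))
    return prefs


def media_q(mimetype, prefs):
    """q the client assigns to mimetype; the most specific range wins."""
    mtype = mimetype.lower()
    major = mtype.split("/")[0]
    for candidate in (mtype, major + "/*", "*/*"):
        for media_range, q in prefs:
            if media_range == candidate:
                return q
    return 0.0  # not mentioned at all: not acceptable


def choose_variant(variants, accept_header):
    """variants is a list of (uri, mimetype); return the best uri or None."""
    prefs = parse_accept(accept_header)
    best_uri, best_q = None, 0.0
    for uri, mimetype in variants:
        q = media_q(mimetype, prefs)
        if q > best_q:  # ties keep the earlier variant
            best_uri, best_q = uri, q
    return best_uri


# Hypothetical variants of one resource.
variants = [("logo.gif", "image/gif"), ("logo.png", "image/png")]
print(choose_variant(variants, "image/png, image/gif;q=0.5"))  # logo.png
```

A client that sends only "text/html" gets None from this sketch, which
corresponds to the "no acceptable variant" case.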
SRE-http supports both server-side and client-side content negotiation.

---------------------
II) Specifying a negotiable resource

Specifying a "negotiable" resource (a web resource with several
possible variants) requires 3 steps:

  1) Create several "variants" of a resource
  2) Create a file, in your web-space, containing a "variant list"
  3) Create a special NEGOTIATE alias pointing to this file

These steps are described in the next three sections.

Notes:

  * SRE-http's implementation of content negotiation is loosely based
    on Apache 1.3. You might want to examine
       http://www.apache.org/docs/content-negotiation.html
    for a different description.

  * For a technical discussion of content negotiation, see RFC 2295
    "Transparent Content Negotiation in HTTP" at
       http://gewis.win.tue.nl/~koen/conneg/

  * The rationale for client-side content negotiation is that the
    browser is best equipped to choose a variant (given descriptive
    information on each variant's mimetype, language, etc.).
    Furthermore, it is thought to be wasteful of bandwidth to provide
    the full range of Accept: headers on every request (since most
    resources will not be subject to content negotiation). However,
    client-side content negotiation does require two round trips (one
    to get an alternates list, and a second to get the desired
    variant).

  * If all you are interested in is GZIPping a response (as a transfer
    or content encoding), please see the description of CE_GZIP (in
    INITFILT.DOC).

-------------------------
II.1) Creating variants

Basically, a variant is any server resource. This includes documents,
images, and even cgi-bin scripts and SRE-http addons. The notion is
that variants all represent variations of the same information. For
example, you may have several translations of the same document (say,
an English, Finnish and Korean version) which you'd like to
automatically send to the appropriate clients.
There are a few constraints:

  1) Variants must be on the same server as the "variant list"
     (discussed next). In addition, SRE-http enforces the "good
     practice" (from a security standpoint) of requiring each variant
     to be in (or under) the same directory as the variant list.

  2) Variants must be retrievable via GET requests. That is, each
     variant should be accessible with a standard URL; which also means
     that content negotiation will NOT work with POST requests.

-------------------------
II.2) Creating a Variant List

The variant list is at the heart of SRE-http's implementation of
content negotiation. The variant list is a simple (text) file
containing several multi-line records. This file should be accessible
from the web. That is: the variant list must be in (or under) the
GoServe data directory, or in an SRE-http virtual directory.

Each record in the variant list must specify a URI (a selector), and
several pieces of identifying information. This identifying information
is used to specify up to six "dimensions of negotiation": the mimetype,
charset, language, encoding, features, and length.

The syntax of these multi-line records is (note that a wildcarded form
of these records can also be specified, as discussed in section II.4
below):

     URI: a_selector
     Content-type: type/subtype ; charset=a_charset ; qs=m.mm
     Content-language: l1, l2
     Content-encoding: enctype
     Content-length: nnnn
     Option: an option

Where:

  URI:
      URI is required. URI should be a valid selector; that is, site
      information can not be included -- the resource must be on the
      same site as the variant list. Note that relative URIs are
      interpreted relative to the location of the "variant list".

      ** Since variants (as identified by these URI entries) MUST be in
      ** (or under) the directory containing the variant list file, use
      ** of "relative" URIs is recommended.

  Content-type:
      Content-type is required. It contains 3 sub-fields.
      type/subtype: Only the type/subtype subfield is required.
          It identifies the mime-type of this variant.
      charset: Optional. Identifies the character set. If not
          specified, ISO-8859-1 (latin1) is assumed.
      qs: Optional. The "selection quality". Must be between 0.0 and
          1.0. 0.0 means "unacceptable", 1.0 means "perfect
          representation". If not specified, a value of 1.0 is assumed.
          All else equal, variants with higher values are
          preferentially chosen.

  Content-language:
      Optional. A comma delimited list of 2 character language codes
      (e.g., EN for English, FR for French, DE for German). Note that
      the 2-2 letter codes, such as En-US, are shortened (only the
      first two characters are used).

  Content-encoding:
      Optional. A comma delimited list of content encoding types
      (e.g., identity, gzip, and compress).
      Note: if a variant with GZIP content-encoding is chosen, then
      on-the-fly GZIP content encoding (as controlled by the CE_GZIP
      parameter in INIT_STA.80) will be suppressed.

  Content-length:
      Optional. The length of the resource. If not specified, a length
      of 0 is assumed. Note that "longer" resources are less likely to
      be chosen (all else equal).

  Description:
      Optional. A description of this variant.

  Features:
      Optional. Features allows you to note special "features" of the
      variant. Although not widely supported, in the future browsers
      may use such information to choose a variant (SRE-http does NOT
      use features in its default variant selection algorithm). For
      details on "features", please see RFC 2295.

  Option:
      Optional. You can specify "selector specific" advanced options to
      be assigned to this variant. These complement the selector
      specific advanced options that are assigned to the original
      selector (that is, the selector that points to the variant list).
      You can have multiple Option: lines in a given entry. Note that
      Option: entries are NOT used by SRE-http's default variant
      selection algorithm.

Notes:

  * The "qs" quality specified in content-type applies to the entire
    variant.
    You should NOT enter "quality" terms for the other Content-
    entries (such as Content-language and Content-encoding).

  * Appendix A contains an example of a variant list file.

  * When a variant is selected, its Content-Type, Content-Language, and
    Content-Encoding are returned as response headers. Thus, in most
    cases the Content-Type specified in a variant list will override
    the "default" Content-Type (i.e., the mimetype derived from the
    file's extension). The exception to this rule is when a mime-type
    is specified as an advanced option, or when the variant is a CGI
    script or an SRE-http addon (in which case the script/addon should
    provide the content-type information).

  * If the last variant in a variant list only has a URI: field, then
    it is treated as a "fall back" variant, and will (typically) be
    used only if there is no best match.

-------------------------
II.3) Identifying a negotiable resource.

SRE-http uses a special "alias" to identify variant lists. Unless an
alias is used to explicitly identify a variant list, a request for a
variant list will be treated in the normal fashion. That is, the
variant list file would be returned verbatim (say, as a text/plain
response).

To identify a variant list, you can either:

  a) add an entry to your ALIASES file, using:
        sel !NEGOTIATE target

  b) define a realm in ATTRIBS.CFG using:
        rule: sel
        redirect: NEGOTIATE= target

where:

   sel = a selector that is to be identified as a negotiable resource.
         Alternatively, you can use a wildcarded selector.

   target = optional -- either a fully qualified file name or a
         selector.
         Typically, target is not specified; in which case SRE-http
         uses its normal rules to map sel to a file containing a
         variant list (that is, sel is assumed to be under the GoServe
         data directory or under a virtual directory).
         If specified, target is used as the variant list. If target is
         not a fully qualified file name, it should be relative to the
         GoServe data directory (or to a virtual directory).
In all cases, the variants must be "neighboring", which means they MUST
be relative to the directory that contains the variant list.

  ** If you use a wildcarded selector (containing *), you MUST specify
  ** a target -- see section II.3.a for the details.

Some examples:

  ALIASES.IN example:
      /VARTEST/VAR1.LST !NEGOTIATE

  ATTRIBS.CFG example:
      realm: negot1
        rule: /VARTEST/VAR1.LST
        redirect: Negotiate=

Assuming that your GoServe data directory is E:\WWW, this means:
   /VARTEST/VAR1.LST is a negotiable resource, with a variant list
   defined in E:\WWW\VARTEST\VAR1.LST.

Notes:

  * As noted, "resources" pointed to by a variant list must be
    "neighboring"; they must be in or under the same directory as the
    variant list file.

  * Unlike most SRE-http aliases, there is no explicit "replacement".
    Actually, you can think of the variant list itself as an extension
    of SRE-http's aliasing -- it contains instructions used to decide
    which (of several) replacements to use.

II.3.a) Wildcarded form

When sel contains a wildcard (an *), you MUST use target to specify the
file containing the variant list (you can use either a fully qualified
file name, or another selector).

For example, using ALIASES.IN:
      /MANUALS/*.HTM !NEGOTIATE /MANUALS/DOCS.LST

or, using ATTRIBS.CFG:
      Realm: negot2
        Rule: /MANUALS/*.HTM
        Redirect: Negotiate=/MANUALS/DOCS.LST

In this case, all request selectors that match MANUALS/*.HTM (note that
the leading / is ignored) will use the variant list specified in
/MANUALS/DOCS.LST. Please see the next section for details on how
variants are resolved when this wild_sel form is used.

Once you've accomplished these steps, all you need to do is provide a
URL that points to sel (or that will wildcard-match sel), and hope that
your client's browser either provides useful ACCEPT headers, or knows
how to do client-side content negotiation.

-------------------------
II.4) Wildcarded variants

As outlined above, one creates a unique variant list (and a unique
alias) for each negotiable resource.
This may become quite tedious, especially when you have multiple sets
of documents. For example, if you have a 10 chapter manual in 3
languages (hence, 30 files), it could be advantageous (that is, a lot
less trouble) to use one wildcarded "variant list" for all 10 chapters.

In recognition of this possibility, SRE-http supports a special form of
variant list that supports such "multiple sets of negotiable
resources". The specification of these sets requires two changes to the
simple case. The first difference is discussed above -- the use of the
"wildcarded form" of sel. The second involves modifications to the
variant list file.

Recall that the wildcarded form directs many possible "request
selectors" to a single variant list. Thus, the variant list should
contain information that allows the request selector to influence the
value of the URI: field of each record in the variant list. To do this,
two steps are required.

  a) A (case insensitive) PATTERN: wild_sel entry should be put at the
     top of the variant list. For example:
         Pattern: /manuals/*.HTM
     (note that a leading / is ignored)

     ** The value of "wild_sel" used in a PATTERN: entry should be the
     ** same as the wildcarded "sel" portion of the wildcarded alias
     ** (that is used to identify the variant list).

  b) The URI: fields may contain * characters. SRE-http will replace
     these * (in the URI: entries) with corresponding portions of the
     request selector.
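The substitution described in step b can be sketched in Python as
follows. This is only an illustration, not SRE-http's own code: the
function name expand_uri is hypothetical, and the case-insensitive
matching, the greedy matching of each *, and the handling of surplus *
characters are all assumptions.

```python
import re


def expand_uri(pattern, selector, uri_template):
    """Fill the * in uri_template with what * matched in the selector.

    Returns None when the selector does not match the pattern.
    """
    # A leading / is ignored on both the pattern and the selector.
    pattern = pattern.lstrip("/")
    selector = selector.lstrip("/")
    # Turn each * of the pattern into a capturing group.
    regex = "(.*)".join(re.escape(piece) for piece in pattern.split("*"))
    # Assumption: selectors are compared without regard to case.
    match = re.fullmatch(regex, selector, flags=re.IGNORECASE)
    if match is None:
        return None
    captured = iter(match.groups())
    # Replace each * in the URI: entry with the next captured piece.
    # Assumption: a * with no captured piece left is kept unchanged.
    pieces = uri_template.split("*")
    result = pieces[0]
    for piece in pieces[1:]:
        result += next(captured, "*") + piece
    return result


# Pattern manual/* and request selector manual/chap1.htm:
print(expand_uri("manual/*", "manual/chap1.htm", "en/*"))  # en/chap1.htm
```

The resulting URI is still relative, so it would be resolved against the
directory that holds the variant list.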
For example:

    i) the request selector is:  manual/chap1.htm
   ii) the alias is:  manual/* !negotiate manual/docs.lst
  iii) manual/docs.lst contains:

          pattern: manual/*

          URI: de/*
          Content-type: text/html
          Content-language: de

          URI: en/*
          Content-type: text/html
          Content-language: en

then:

  a) If the request contains an accept-language: en request header,
     then en/chap1.htm would be used.
  b) If the request contains an accept-language: de request header,
     then de/chap1.htm would be used.

That is, the * in the Pattern: (in manual/*) "corresponds to"
chap1.htm, which is then used as a substitute for the * in the various
URI: entries.

-------------------------
III) The content negotiation algorithm.

The following sketches the default content negotiation algorithm used
by SRE-http. Note that this is used both for "server side negotiation"
(when the client does not include a Negotiate: request header), and as
a "remote variant selection algorithm" (when the client includes a
Negotiate: * request header).

Alternatively, you can specify your own "server side" content
negotiation algorithm on a selector specific basis -- see section III.a
for the details!

  1) First, SRE-http checks for a "Negotiate:" request header. If no
     such header exists, then server-side negotiation is always
     attempted. If this header does exist, then server-side negotiation
     may be attempted (see the notes for details).

  2) If server-side negotiation is to be attempted, by default the
     following selection algorithm is used. If the client allows a
     "remote variant selection algorithm" (by including a
     Negotiate: n.n request header), then a custom procedure can be
     used instead (see Appendix B for the details).

     Note that this is a "leave as soon as a definitive answer is
     found" method -- later steps are only used if earlier steps yield
     ties. Furthermore, variants eliminated in earlier steps are NOT
     available -- they are NOT considered in later steps.
     Lastly, if all variants are eliminated, a suitable "could not find
     representation" response is immediately returned.

     a) Accept: headers are read. Accept: headers contain information
        on acceptable mime-types. This information can contain
        "selection quality" (q) information. The variant with the best
        "combined" quality is used. Combined quality is determined by
        multiplying the variant-list qs (quality) by the Accept: header
        "q" factor. If there are ties (e.g., several mimetypes have a
        combined quality of 1.0), then move to step b.
        If there is no Accept: header (most browsers send some form of
        Accept: header), then skip this step.

     b) Accept-language: headers are read. These request headers can
        contain "language specific q modifiers". The variant (of those
        surviving step a) with the highest language "q" factor is used.
        If there are ties, move to step c.
        If there is no Accept-language header, this step is skipped.
        Note that the content-language entries in the variant list
        should NOT include "q" factors.

     c) Accept-Encoding: headers are read. These headers can also
        contain "encoding specific q modifiers". The variant (of those
        surviving step b) with the highest encoding "q" factor is used.
        If there are ties, move to step d.
        If there is no Accept-encoding header, this step is skipped.
        Note that the content-encoding entries in the variant list
        should NOT include "q" factors.

     d) Accept-charset: headers are read. Variants that do not match
        this charset are dropped. If there are ties, then move to
        step e.
        If there is no Accept-charset header, then skip this step.

     e) Use the variant with the smallest content-length (as pulled
        from the variant list). If there are ties, move to step f.

     f) Use the first of the remaining variants.

     Note: If all variants are removed at any step in this process
           (say, no variants have a content-language, and an explicit
           Accept-language was specified), then the algorithm exits.
           In this case, if a default variant was specified, it will be
           used.
           Otherwise, either a 406 (for http/1.1 clients) or a 404 (for
           http/1.0 clients) response is returned.

  3) If client-side negotiation is to be used, SRE-http returns a
     special "300" return code. The body of the response contains an