The SRE2002 http server: IM Modules

5 March 2002

Creating and Using SRE2002 IM Modules

Contents:

Introduction
Installing an IM module
Using an IM module
Working with IM modules
Creating IM Modules
Information provided to IM modules
Information provided by IM modules
IM Module Initialization
Information that can be used by IM modules
Example: the XRANGE.RXX eXtended Range IM Module

Introduction

Instance Manipulation (IM), a term first devised in the context of the delta-encoding standard, refers to a content-preserving manipulation of a server's response to a client. Although IM may be done for a number of reasons, the most common is to shrink the size of the message body transmitted by the server -- thereby speeding up response time over limited bandwidth.

Note that what the client eventually ends up with must not depend on which Instance Manipulations have been applied to the content of the server's response, provided that the client can properly handle whatever IMs were applied.
A simple example of IM is compression (such as GZIP compression) -- the original content can be readily recovered by the client.

The knowledgeable reader may wonder how content-encoding and transfer-encoding differ from IM. For example, GZIP encoding can be applied as either a content-encoding or as a transfer-encoding. Hence, why the need for more terminology?

To clarify why IM should be considered as a distinct step in the process of fulfilling an http request, consider the following sequence:

1. Upon receiving a GET request, the server uses the URI in the request to identify the requested resource.
2. Optionally, it uses information from the request (and perhaps additional information) to select a variant of that resource.
3. At this point, the server may apply a non-identity content-coding to the content generated in steps 1 and 2 (or one might have been inherent in its generation). This also results in a Content-Encoding header.
   In the context of SRE2002 -- steps 1 to 3 are handled by the filter.
4. The result of the first three steps, at the time when the request is processed, is an instance. The instance includes a body (possibly empty) and possibly some instance headers. The entity tag, if any, is assigned at this point. That is, an entity tag is associated with an instance, NOT an entity.
   In the context of SRE2002 -- the ETAG option of SRE_COMMAND can be used to automatically create an entity tag.
5. The server may then apply an instance-manipulation. For example, if the request included a Range header, the server may optionally produce a range response, consisting of the original set of headers, a Content-Range header, and the appropriate range(s) from the (possibly encoded) body.
   In the context of SRE2002 -- step 5 is handled by an Instance Manipulation Module.
6. The result of the fifth step becomes the entity, consisting of entity headers and an entity body.
7. The server may then apply a non-identity transfer-coding; on-the-fly compression could be done in this step. If so, a Transfer-Encoding header is added to the message.
   In the context of SRE2002 -- on-the-fly GZIP compression as a transfer-encoding, and computation of Content-MD5 headers, are done in this step.
8. The result of the seventh step is the message, consisting of a message body (the transfer-coded version of the entity body), the entity headers, and additional response and general headers.

More formally, we define
instance manipulation as: An operation on one or more instances which may result in an instance being conveyed from server to client in parts, in more than one response message, or in a compressed format. For example, a range selection or a delta encoding, or a GZIP compression. Instance manipulations are end-to-end, and often involve the use of a cache at the client.

In some ways, IM is similar to transfer-encoding, in that it does not involve permanently modifying the content. A significant difference is that transfer-encoding is hop by hop -- it is only meant for the next stop in a potentially long chain of actors involved in delivery of the response. IM (like content-encoding) is server-to-client.

If range extraction were the only kind of IM available, it would be unnecessary to carefully define IM as a step within the process of fulfilling an http request. However, as mentioned above, there are other kinds of IM (such as delta encoding). Given the possibility that different kinds of instance manipulation may become beneficial, SRE2002 implements IM through optional "IM Modules". These modules are "plugged into" SRE2002 at run-time -- hence they do not require any modification to the SRE2002 software. Moreover, they can be unplugged just as easily!

For example, SRE2002 comes with an eXtended Range module that supports string-based range extraction (as well as the standard byte-based range extraction).

By default, SRE2002 supports standard range extraction, using a built-in IM module.

In the next section we describe how to install an IM module. It's quite simple.

Installing an IM module

First, you'll need to obtain an IM module. Sometimes this entails creating a few subdirectories, or copying files to specific locations. For example, the delta encoding IM module uses a directory for storing cached versions of prior responses. In general, we recommend installing an IM module's software in (or under) the BIN\IM subdirectory of SRE2002.

Given that the files, etc. comprising the desired IM module are where they should be, installing an IM module to SRE2002 simply requires modifying the IM_TYPES parameter in SRE2002.CFG. You can do this by hand, or with one of the on-line configuration tools.

IM_TYPES Syntax: IM_TYPES= im_name1 im_file1 , im_name2 im_file2 , ....
where the (case insensitive) im_nameN im_fileN pairs are separated by commas:

im_nameN = a short name identifying this module.
im_fileN = a filename of the rexx program that implements this module. This should either be a fully qualified filename, or a filename relative to the SRE2002 directory.
Example:
IM_TYPES=xrange bin\im\xrange.rxx , deltas D:\TEST\DELTA\delta.rxx

That's it -- the next time you start up SRE2002, the IM modules you specify in IM_TYPES will be available to SRE2002. The next step is to notify SRE2002 that you want to use one of them -- you can do this for all requests, or only for specific requests. That is, you can use one IM module for all requests, except for those requests for which you want to use a different one!

Using an IM module

There are two ways by which an IM module may be used:

You can define an IM module to apply to all requests.
This is done by specifying the IM_DEFAULT configuration parameter.

IM_DEFAULT

Syntax: IM_DEFAULT= im_name

where the (case insensitive) im_name is one of the im_names specified in IM_TYPES,
or one of two special values:

0 = By default, do not attempt an Instance Manipulation.
RANGE = By default, use SRE2002's built-in range extraction.

Examples:
  IM_DEFAULT=DELTAS
  IM_DEFAULT=RANGE
  IM_DEFAULT=XRANGE
  IM_DEFAULT=0

Alternatively, you can override the default IM type on a request specific basis by using the IM option of SRE_COMMAND.

IM option of SRE_COMMAND

Syntax: IM im_name

where the (case insensitive) im_name is 0, RANGE, or one of the im_names specified in IM_TYPES.

Reminder: If IM is not specified, the IM type specified in IM_DEFAULT is used.

Examples:
    rcode=sre_command('file type text/plain etag_auto IM RANGE name F:\www\hello.txt')
    rcode=sre_command('VAR type text/html IM DELTAS name ',STUFF)
    rcode=sre_command('FILE type image/gif IM 0 name D:\Pictures\myhouse.gif')

Thus, the use of the IM option allows you to pick which IM module to apply to specific requests. This gives the ambitious filter author great flexibility.

Warning: if the selected IM module is not available, a 501 error response will be returned.

Working with IM modules

This section is meant for filter authors, or for those writing addons or scripts to run under SRE2002.

IM modules use several sources of information when deciding what to do:

The client address and host nickname
The request line (the URI)
The request headers
The content-type
The optional response headers (not including automatically generated response headers, such as the date and server software name)
Extra information provided by the filter (say, by an addon or script running under the filter)

This extra information is provided to IM modules via the IMINFO subcommand of SRE_COMMAND.

IMINFO subcommand of SRE_COMMAND

Syntax:

vv=sre_command('IMINFO [CLEAR | PLACE | READ] ',im_message)
where at most one of the options may be used:

CLEAR = remove prior message(s) and replace them with this one. If im_message= ' ', then just clear the prior message(s)
PLACE = add this message only if no prior message has been specified
READ = read the current set of messages to be sent to the IM module. If READ is used, the im_message argument is ignored
otherwise (if no option is specified), add this message (on a new line) after the existing messages
Examples:
bybbye=sre_command('IMINFO CLEAR')
foo=sre_command('IMINFO ','Strings=Section+1 - Section+2')
foo=sre_command('IMINFO PLACE ',' Strings=Section+1.5 - Section+1.8')
Note: Each IMINFO invocation generates a new line of information -- yet this new line may contain CRLFs. Thus, it is up to the user of IMINFO to be sure to structure its usage in a manner that their IM module(s) can understand.
We recommend that filter (or addon) authors use a "header" like format, with a varname: varvalue construct. Note that any IM module that is called will get the same extra information.
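
As an illustration of the recommended "header" like format, the following is a minimal sketch of how an IM module might look up one entry in the extra information. It assumes that lines are separated by CRLF; the get_info routine and the Strings and Mode entries are hypothetical names used only for this example.

```rexx
/* Sketch: look up one "varname: varvalue" entry in the extra information. */
crlf = '0d0a'x
extrainfo = 'Strings: Section+1 - Section+2' || crlf || 'Mode: loose'
say get_info('mode', extrainfo)       /* the lookup is case insensitive */
exit 0

/* get_info (hypothetical helper): return the value of the first line */
/* whose varname matches name, or an empty string if there is none    */
get_info: procedure
  parse arg name, info
  crlf = '0d0a'x                      /* assumed line separator */
  do while info \== ''
    parse var info aline (crlf) info  /* peel off the next line */
    parse var aline vname ':' vvalue
    if translate(strip(vname)) == translate(strip(name)) then
      return strip(vvalue)
  end
  return ''
```

Since every IM module receives the same extra information, a lookup by name lets each module pick out its own entries and ignore the rest.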

Creating IM Modules

This section is meant for authors of IM modules

As noted in the introduction, SRE2002 calls the IM module after receiving some contents from the filter. These contents may be encoded (they may have a Content-Encoding applied). For example, they may have been compressed with GZIP. Although this may temper what IM should be done, the exact content-coding is more-or-less independent of the kinds of IM you can apply.

After checking some details (such as creating an ETAG, or checking for an If- condition), SRE2002 calls the IM module. The IM module will then do one of the following:

Do nothing. SRE2002 will return the contents unmodified (though on-the-fly GZIP may be applied as a transfer-encoding).
Return a short error message. If a serious problem is encountered whilst trying to do the instance manipulation (say, an impossible byte range is specified), then the IM module can instruct SRE2002 to return a short error response.
Modify the contents (again, in a 100% recoverable fashion). SRE2002 will then send this instance-manipulated message to the client (a few extra response headers may also be added).

Thus, SRE2002 provides some information to an IM module, and expects some information back.

Information provided to IM modules

Basically, SRE2002 calls the (possibly request specific) IM module as an external REXX procedure. The following arguments are used:

`len_file`	`0` = fname is a variable; `>0` = fname is a filename, with size len_file
`fname`	either the contents of a variable, or a file name
`clientaddr`	the client's IP address
`hostnick`	the host nickname to whom this request is addressed
`uri`	the full request line WITHOUT url encoding
`ctype`	the content-type (if explicitly specified)
`reqheaders`	request headers, one per line, formatted as hdr_name: hdr_value. Note that if the client sends multiple request headers with the same hdr_name, they will be concatenated onto one line (with commas separating each new instance of the header).
`respheaders`	currently specified response headers. Same format as reqheaders
`extrainfo`	The extra information specified in calls to `SRE_COMMAND('IMINFO ...')`. Each message sent by (multiple) calls to `SRE_COMMAND('IMINFO ...')` starts on a new line of extrainfo

Example: how to retrieve these arguments with parse arg:
parse arg len_file,fname,clientaddr,hostnick,uri,ctype,reqheaders,respheaders,extrainfo
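
The len_file and fname arguments together tell the module where the instance body is. A minimal sketch of reading it follows; get_body is a hypothetical helper, and the demo values are only there so that the fragment can be run on its own, outside of SRE2002.

```rexx
/* Sketch: obtain the instance body from the first two arguments. */
parse arg len_file, fname
if len_file = '' then do              /* demo values for a stand-alone run */
  len_file = 0
  fname = 'Hello, world'
end
body = get_body(len_file, fname)
say length(body) 'bytes'
exit 0

/* get_body (hypothetical helper): return the contents to be manipulated */
get_body: procedure
  parse arg len_file, fname
  if len_file = 0 then return fname   /* fname holds the contents itself */
  body = charin(fname, 1, len_file)   /* fname is a file of len_file bytes */
  call stream fname, 'c', 'close'
  return body
```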

Information provided by IM modules

The IM module (running as an external REXX procedure) must return its results as its return value. These results should have one of the following structures:

DEF opt_respheaders This IM procedure is not applicable, so try the default (RANGE extraction). The opt_respheaders are optional response headers (see below for details on the structure of these response headers).
0 No change -- SRE2002 should use the contents as is.
Note that 0 is stronger than DEF -- it means that this IM process was not applicable, and neither is the default (RANGE extraction).
1 respline respheaders empty line modified_contents. The respline should be a legitimate http response line.
For example: HTTP/1.1 206 Partial Content
The respheaders should consist of zero, one, or more response headers -- these will be added to, replace, or be removed from the already specified response headers. Note that an empty line signals the end of the response headers.
Immediately following this empty line should be the modified contents (which can be arbitrarily long, and span multiple lines). The modified contents will be sent to the client as is (well, perhaps with on-the-fly GZIP as a transfer-encoding).
Note that each line of the respheaders should be structured like calls to SRE_COMMAND('HEADER ...').
For example, respheaders could look like:
    X-Mod1: method loose
    Drop Range:
    Add Cache-Control: Retain=100
    Place Etag: asde315
2 respline errmessage
Send an error message to the client. The respline is used as the response line, and should be a legitimate response line.
For example: HTTP/1.1 416 Requested Range Not Satisfiable
The errmessage can be an arbitrarily long text message (which can include HTML). It will be sent as the body of a short error response (using the respline as a response line) to the client.
Note that no additional response headers can be specified!
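
The type 1 and type 2 results can be assembled with a pair of small routines. This is only a sketch: it assumes that the pieces are separated by CRLF and that the respline shares the first line with the leading 1 or 2, which should be checked against XRANGE.RXX. The im_modified and im_error names are hypothetical.

```rexx
/* Sketch: assemble type 1 and type 2 results (CRLF layout is assumed). */
hdrs = 'Content-Range: bytes 0-4/12'
say im_modified('HTTP/1.1 206 Partial Content', hdrs, 'Hello')
say im_error('HTTP/1.1 416 Requested Range Not Satisfiable', 'Bad range')
exit 0

/* im_modified (hypothetical): respline, headers, empty line, contents */
im_modified: procedure
  parse arg respline, respheaders, contents
  crlf = '0d0a'x
  /* zero headers is allowed; the empty line must still be present */
  if respheaders \== '' then respheaders = respheaders || crlf
  return '1' respline || crlf || respheaders || crlf || contents

/* im_error (hypothetical): respline followed by the error text */
im_error: procedure
  parse arg respline, errmessage
  return '2' respline || '0d0a'x || errmessage
```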

IM Module Initialization

When SRE2002 starts, it will load the selected IM modules. Part of this loading is a special "initialization" call to each IM module. The initialization call consists of a procedure call to each IM module using two arguments: !INIT, and the filename of the IM module.

Thus, when writing your IM module code, be sure to check for a value of !INIT in the first argument (the len_file argument). When you see it, you can use the second argument (the fname argument) as the fully qualified name of the IM module's file. Your IM procedure should then perform any first-time initializations it may require (such as checking for the existence of working directories and databases, or launching tasks). Upon successful initialization, your IM procedure must return a 1 -- any other return value is interpreted as an error, and will cause SRE2002 to shut down immediately.
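
Putting the argument list and the initialization call together, the entry point of an IM module might be laid out as in the sketch below. It is a bare skeleton, not the layout of any shipped module: it does no real work and always answers 0 for ordinary requests.

```rexx
/* Sketch: entry point of an IM module, including the !INIT call. */
parse arg len_file,fname,clientaddr,hostnick,uri,ctype,reqheaders,respheaders,extrainfo

if len_file = '!INIT' then do
  /* fname is the name of this module's file; do any one-time setup */
  /* here (working directories, databases, ...)                      */
  return 1                  /* any other value makes SRE2002 shut down */
end

/* ... examine reqheaders, extrainfo, etc. and decide what to do ... */
return 0                    /* 0 = SRE2002 should use the contents as is */
```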

Information that can be used by IM modules

In addition to the information provided in the arguments to the IM procedure, you can use many of the SRE2002 function calls. For example, you can use ....

SRE_REQUEST_INFO to obtain a few additional request specific variables
SRE_VALUE to obtain system and global variables.
The string manipulation procedures (such as SRE_REPLACEWILD and SRE_PACK64)
The time procedures (such as SRE_GMT and SRE_DATESTAMP)
SRE_EXTRACT_USERNAME

NOTICE:

You cannot use any SRE2002 procedure that provides request specific values.
For example, do NOT use:

any of the SRE_COMMAND variants,

the SRE_REQFIELD, SRE_CHECK_USERNAME, and SRE_CLIENTNAME functions,

the REQ environment of SRE_VALUE

many of the SRE_EXTRACT and SRE_CONTROL variants.

any of the _RESPONSE procedures (such as SRE_ERROR_RESPONSE)

If you use these verboten procedures -- you will hang the thread. To kill this thread, you'll either have to SNIPE the thread, or shut down SRE2002. Note that the thread is transaction specific, so other transactions should proceed normally.

Example: the XRANGE.RXX eXtended Range IM Module

SRE2002 comes with a simple example of an IM module -- the eXtended Range (XRANGE) module. By default, XRANGE bin\im\xrange.rxx is included in the IM_TYPES parameter (although the IM_DEFAULT is RANGE -- which is SRE2002's built-in range extraction IM procedure).

XRANGE is an extension of standard http/1.1 range extraction. Its major difference is that it recognizes a Range: option of:
    strings=string1 - string2
    or, for multiple ranges: strings=string1 - string2, string1a - string2a , ...
as well as the traditional
    bytes=n1-n2

Each string should be a URL-encoded case-sensitive string. In particular, it must not contain spaces, hyphens, or characters not permitted in URIs. Instead, use URL-encoding of hyphens, etc. (and use + for spaces).

XRANGE searches for these strings in the contents provided by SRE2002. It will convert these to byte ranges, returning the (inclusive) range of the contents between the first occurrences of these two strings (with the second string in a hyphenated pair always assumed to be after the first string of the pair).
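
The conversion from a string pair to a byte range can be sketched as follows. This is not the code in XRANGE.RXX: string_range is a hypothetical routine, it only decodes + as a space (no %xx decoding), and it assumes that the range runs from the start of the first string through the end of the second, counted from byte 0 as in a bytes= range.

```rexx
/* Sketch: map a "string1 - string2" pair onto an inclusive byte range. */
body = 'Intro. Section 1: alpha. Section 2: beta. End.'
say string_range('Section+1', 'Section+2', body)
exit 0

/* string_range (hypothetical): return "first last", or '' if no match */
string_range: procedure
  parse arg s1, s2, body
  s1 = translate(s1, ' ', '+')          /* + stands for a space */
  s2 = translate(s2, ' ', '+')
  p1 = pos(s1, body)                    /* first occurrence of string1 */
  if p1 = 0 then return ''
  p2 = pos(s2, body, p1 + length(s1))   /* string2 must come after string1 */
  if p2 = 0 then return ''
  /* pos() counts from 1; byte ranges count from 0 */
  return (p1 - 1) (p2 + length(s2) - 2)
```

When either string is missing the routine returns an empty string, which a module could translate into a 416 error result or a DEF result, depending on how strict it wants to be.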

Curious programmers are invited to examine XRANGE.RXX (in BIN\IM). Not only does it illustrate how to code an IM module, it contains a few useful procedures for "unpacking" the request and response headers.
