5 March 2002
Creating and Using SRE2002 IM Modules
Introduction
Instance Manipulation (IM), a term first devised in the context of the
delta-encoding standard,
refers to a content-preserving manipulation of a server's response to a client.
Although IM may be done for a number of reasons, the most common is to shrink the size of the message
body transmitted by the server -- thereby speeding up response time over limited bandwidth.
Note that what the client eventually ends up with must not depend
on which instance manipulations have been applied to the content of the server's response,
provided that the client can properly handle whatever IMs were applied.
A simple example of IM is compression (such as GZIP compression) -- the original
content can be readily recovered by the client.
The knowledgeable reader may wonder how content-encoding and transfer-encoding differ from
IM. For example, GZIP encoding can be applied as either a content-encoding or as a transfer-encoding.
Hence, why the need for more terminology?
To clarify why IM should be considered as
a distinct step in the process of fulfilling an http request,
consider the following sequence:
1. Upon receiving a GET request, the server uses the URI in
   the request to identify the requested resource.
2. Optionally, it uses information from the request (and
   perhaps additional information) to select a variant of
   that resource.
3. At this point, the server may apply a non-identity
   content-coding to the content generated in steps 1 and 2 (or one might have been
   inherent in its generation). This also results in a
   Content-Encoding header.
   In the context of SRE2002 -- steps 1 to 3 are handled by the filter.
4. The result of the first three steps, at the time when the
   request is processed, is an instance. The instance
   includes a body (possibly empty) and possibly some
   instance headers. The entity tag, if any, is assigned at
   this point. That is, an entity tag is associated with an
   instance, NOT an entity.
   In the context of SRE2002 -- the ETAG option of SRE_COMMAND can be used to automatically
   create an entity tag.
5. The server may then apply an instance-manipulation. For
   example, if the request included a Range header, the
   server may optionally produce a range response, consisting
   of the original set of headers, a Content-Range header,
   and the appropriate range(s) from the (possibly encoded)
   body.
   In the context of SRE2002 -- step 5 is handled by an Instance Manipulation Module.
6. The result of the fifth step becomes the entity,
   consisting of entity headers and an entity body.
7. The server may then apply a non-identity transfer-coding;
   on-the-fly compression could be done in this step. If so,
   a Transfer-Encoding header is added to the message.
   In the context of SRE2002 -- on-the-fly GZIP compression as a transfer-encoding, and
   computation of Content-MD5 headers, are done in this step.
8. The result of the seventh step is the message, consisting
   of a message body (the transfer-coded version of the
   entity body), the entity headers, and additional response
   and general headers.
More formally, we define instance manipulation as:
   An operation on one or more instances which may
   result in an instance being conveyed from server to
   client in parts, in more than one response
   message, or in a compressed format.
   For example, a range selection or a delta
   encoding, or a GZIP compression.
   Instance manipulations are end-to-end, and
   often involve the use of a cache at the client.
In some ways, IM is similar to transfer-encoding, in that it does not involve permanently modifying the content.
A significant difference is that transfer-encoding is hop by hop -- it is only meant for the
next stop in a potentially long chain of actors involved in delivery of the response. IM (like content-encoding)
is server-to-client.
If range extraction were the only kind of IM available, it would be unnecessary to carefully define IM as
a step within the process of fulfilling an http request. However, as mentioned above,
there are other kinds of IM (such as delta encoding). Given the possibility
that different kinds of instance manipulation may become beneficial, SRE2002 implements IM through optional
"IM Modules". These modules are "plugged into" SRE2002 at run-time -- hence they do not require any
modification to the SRE2002 software. Moreover, they can be unplugged just as easily!
For example, SRE2002 comes with an eXtended Range module that supports a String based
range extraction (as well as the standard byte based range extraction).
By default, SRE2002 supports standard range extraction, using a built-in IM module.
In the next section we describe how to install an IM module. It's quite simple.
Installing an IM module
First, you'll need to obtain an IM module. Sometimes this entails creating a few subdirectories, or copying files
to specific locations. For example, the delta encoding IM module uses a directory for storing cached versions
of prior responses. In general, we recommend installing an IM module's software in (or under) the BIN\IM subdirectory
of SRE2002.
Given that the files, etc. comprising the desired IM module are where they should be, installing an IM module
to SRE2002 simply requires modifying the IM_TYPES parameter in SRE2002.CFG. You can do this by hand, or with one of the
on-line configuration tools.
IM_TYPES
   Syntax: IM_TYPES= im_name1 im_file1 , im_name2 im_file2 , ....
   where each pair of a (case insensitive) im_name and its im_file is separated from the next pair by a comma.
   Example:
      IM_TYPES=xrange bin\im\xrange.rxx , deltas D:\TEST\DELTA\delta.rxx
That's it -- next time you start up SRE2002, the IM modules you specify in IM_TYPES will be available to
SRE2002. The next step is to notify SRE2002 that you want to use one of them -- you can do this for
all requests, or only for specific responses. That is, you can use one IM module
for all requests, excepting requests for which you want to use a different one!
Using an IM module
There are two ways by which an IM module may be used:
- You can define an IM module to apply to all requests.
This is done by specifying the IM_DEFAULT configuration parameter.
IM_DEFAULT
   Syntax: IM_DEFAULT= im_name
   where the (case insensitive) im_name is one of the im_names specified in IM_TYPES,
   or one of two special values:
   - 0 = By default, do not attempt an Instance Manipulation.
   - RANGE = By default, use SRE2002's built-in range extraction.
   Examples:
      IM_DEFAULT=DELTAS
      IM_DEFAULT=RANGE
      IM_DEFAULT=XRANGE
      IM_DEFAULT=0
- Alternatively, you can override the default IM type on a request specific basis by using the
IM option of SRE_COMMAND.
IM option of SRE_COMMAND
   Syntax: IM im_name
   where the (case insensitive) im_name is 0, RANGE,
   or one of the im_names specified in IM_TYPES.
   Reminder: If IM is not specified, the IM type specified in IM_DEFAULT is used.
   Examples:
      rcode=sre_command('file type text/plain etag_auto IM RANGE name F:\www\hello.txt')
      rcode=sre_command('VAR type text/html IM DELTA name ',STUFF)
      rcode=sre_command('FILE type image/gif IM 0 name D:\Pictures\myhouse.gif')
Thus, the use of the IM option allows you to pick which IM module to apply to specific requests.
This gives the ambitious filter author great flexibility.
Warning: if the selected IM module is not available,
a 501 error response will be returned.
Working with IM modules
This section is meant for filter authors, or for those writing addons or scripts to run under SRE2002.
IM modules use several sources of information when deciding what to do:
- The client address and host nickname
- The request line (the URI)
- The request headers
- The content-type
- The optional response headers (not including automatically generated response headers, such as the date and server
software name)
- Extra information provided by the filter (say, by an addon or script running under the filter)
This extra information is provided to IM modules via the IMINFO subcommand of SRE_COMMAND.
IMINFO subcommand of SRE_COMMAND
   Syntax:
      vv=sre_command('IMINFO [CLEAR | PLACE | READ] ',im_message)
   where one (or none) of the options may be used.
   Examples:
      bybbye=sre_command('IMINFO CLEAR')
      foo=sre_command('IMINFO ','Strings=Section+1 - Section+2')
      foo=sre_command('IMINFO PLACE ',' Strings=Section+1.5 - Section+1.8')
   Note:
      Each IMINFO invocation generates a new line of information -- yet this new line
      may itself contain CRLFs. Thus, it is up to the user of IMINFO to structure
      the information in a manner that the IM module(s) can understand.
      We recommend that filter (or addon) authors use a "header" like format,
      with a varname: varvalue construct. Note that any IM module that is
      called will get the same extra information.
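As a rough illustration of the recommended "header" like format, the following REXX sketch shows how an IM module might pull one value out of the extra information it receives. It assumes that entries are separated by CRLFs and follow the varname: varvalue construct; get_info is a hypothetical helper, not an SRE2002 procedure.

```rexx
/* Sketch: look up a "varname: varvalue" entry in the extra information. */
/* get_info is a hypothetical helper; it is not part of SRE2002.         */
crlf = '0d0a'x
extrainfo = 'Strings: Section+1 - Section+2' || crlf || 'Mode: loose'
say get_info('mode', extrainfo)            /* displays: loose */
exit 0

get_info: procedure
  parse arg wanted, info
  crlf = '0d0a'x
  wanted = translate(strip(wanted))        /* compare names in upper case */
  do while info \== ''
    parse var info aline (crlf) info       /* peel off one line           */
    parse var aline vname ':' vvalue       /* split at the first colon    */
    if translate(strip(vname)) == wanted then return strip(vvalue)
  end
  return ''                                /* no such entry               */
```

Returning an empty string for a missing entry keeps the caller simple, at the cost of not distinguishing "absent" from "present but empty".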
Creating IM Modules
This section is meant for authors of IM modules.
As noted in the introduction, SRE2002 calls the IM module after receiving some content
from the filter. This content may be encoded (it may have a Content-Encoding applied). For example,
it may have been compressed with GZIP. Although this may temper what IM should be done, the exact
content-coding is more-or-less independent of the kinds of IM you can apply.
After checking some details (such as creating an ETAG, or checking for an If- condition), SRE2002 calls
the IM module. The IM module will then do one of the following:
- Do nothing. SRE2002 will return the contents unmodified (though on-the-fly GZIP may be applied as
a transfer-encoding).
- Return a short error message. If a serious problem is encountered while trying to do the
instance manipulation (say, an impossible byte range is specified), then the IM module can instruct
SRE2002 to return a short error response.
- Modify the contents (again, in a 100% recoverable fashion). SRE2002 will then send this
instance-manipulated message to the client (a few extra response headers may also be added).
Thus, SRE2002 provides some information to an IM module, and expects some information back.
Information provided to IM modules
Basically, SRE2002 calls the (possibly request specific) IM module as an external REXX procedure.
The following arguments are used:
len_file
   0 = fname is a variable
   >0 = fname is a filename, with size len_file
fname
   either the contents of a variable, or a file name
clientaddr
   the client's IP address
hostnick
   the host nickname to which this request is addressed
uri
   the full request line WITHOUT url encoding
ctype
   the content-type (if explicitly specified)
reqheaders
   request headers, one per line, formatted as hdr_name: hdr_value
   Note that if the client sends multiple request headers with the same hdr_name, they will be
   concatenated onto one line (with commas separating each new instance of the header).
respheaders
   currently specified response headers. Same format as reqheaders
extrainfo
   The extra information specified in calls to SRE_COMMAND('IMINFO ...').
   Each message sent by (multiple calls to) SRE_COMMAND('IMINFO ...') starts on a new line of extrainfo
Example: how to parse these arguments:
parse arg len_file,fname,clientaddr,hostnick,uri,ctype,reqheaders,respheaders,extrainfo
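Since the first two arguments can describe either a variable or a file, an IM module typically needs to resolve them into the actual contents before doing anything else. The following is a minimal sketch of one way to do that; get_contents is a hypothetical helper name, and reading the whole file in a single CHARIN call is an assumption that suits small files only.

```rexx
/* Sketch: obtain the instance body from the len_file and fname arguments. */
/* get_contents is a hypothetical helper; it is not part of SRE2002.       */
say length(get_contents(0, 'Hello, world'))   /* displays: 12 */
exit 0

get_contents: procedure
  parse arg len_file, fname
  if len_file = 0 then return fname           /* fname holds the contents */
  contents = charin(fname, 1, len_file)       /* fname is a file name     */
  call stream fname, 'C', 'CLOSE'             /* release the file         */
  return contents
```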
Information provided by IM modules
The IM module (running as an external REXX procedure) must return its results as its return value.
These results should have one of the following structures:
DEF opt_respheaders
   This IM procedure is not applicable, so try the default (RANGE extraction).
   The opt_respheaders are optional response headers (see below
   for details on the structure of these response headers).
0
   No change -- SRE2002 should use the contents as is.
   Note that 0 is stronger than DEF -- it means
   this IM process was not applicable, and neither is the default
   (RANGE extraction).
1
respline
respheaders
empty line
modified_contents
   The respline should be a legitimate http response line.
   For example: HTTP/1.1 206 Partial Content
   The respheaders should consist of zero, one, or more response headers -- these will
   be added to, replace, or be removed from the already specified response headers. Note that an
   empty line signals the end of the response headers.
   Immediately following this empty line should be the modified contents (which can
   be arbitrarily long, and span multiple lines). These modified contents
   will be sent to the client as is (well, perhaps with on-the-fly GZIP as a transfer-encoding).
   Note that each line of the respheaders should be structured like calls to SRE_COMMAND('HEADER ...').
   For example, respheaders could look like:
      X-Mod1: method loose
      Drop Range:
      Add Cache-Control: Retain=100
      Place Etag: asde315
2
respline
errmessage
   Send an error message to the client. The respline is used as the response line, and
   should be a legitimate response line.
   For example: HTTP/1.1 416 Requested Range Not Satisfiable
   The errmessage can be an arbitrarily long text message (which can include HTML). It will
   be sent as the body of a short error response (using the respline as a response line) to the client.
   Note that no additional response headers can be specified!
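To make the "1" structure more concrete, here is a hedged REXX sketch that assembles such a return value. It assumes that the leading 1, the respline, and each response header sit on their own CRLF-terminated lines; the description above does not spell out the separators, so check XRANGE.RXX for the exact layout before relying on this. build_modified is a hypothetical helper name, and the Content-Range header is only an illustrative choice.

```rexx
/* Sketch: assemble a "modified contents" (type 1) return value.          */
/* ASSUMPTION: the parts are separated by CRLFs; verify against           */
/* XRANGE.RXX. build_modified is hypothetical, not part of SRE2002.       */
body = 'Hello, world'
hdrs = 'Add Content-Range: bytes 0-4/' || length(body)
reply = build_modified('HTTP/1.1 206 Partial Content', hdrs, substr(body, 1, 5))
say reply
exit 0

build_modified: procedure
  parse arg respline, respheaders, contents
  crlf = '0d0a'x
  /* terminate the header block only if there are headers */
  if respheaders \== '' then respheaders = respheaders || crlf
  /* the lone crlf after the headers is the empty line    */
  return '1' || crlf || respline || crlf || respheaders || crlf || contents
```

In a real module the final statement would be a return of this string from the main procedure, rather than a say.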
IM Module Initialization
When SRE2002 starts, it will load the selected IM modules. Part of this loading is
a special "initialization" call to each IM module. The initialization call
consists of a procedure call to
each IM module using two arguments: !INIT, and the filename of the IM module.
Thus, when writing your IM module code, be sure to check for a value of !INIT in the first
argument (the LEN_FILE argument). When you see it, you can then use the 2nd argument (the fname
parameter) as the fully-qualified name of this file. Your IM procedure
should perform any first-time initializations
it may require (such as checking for the existence of working directories and databases, or
launching tasks). Upon successful initialization, your IM procedure must return a
1 -- any other return is interpreted as an error,
and will cause SRE2002 to immediately shut down.
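The initialization check can be sketched as the opening lines of an IM module. This is a bare skeleton under the conventions described above, not the code of any module shipped with SRE2002: it answers the !INIT call with 1 and otherwise returns 0 (no change), leaving the actual manipulation to be filled in.

```rexx
/* Skeleton IM module (sketch only): handles !INIT, otherwise no change. */
parse arg len_file, fname, clientaddr, hostnick, uri, ctype, ,
          reqheaders, respheaders, extrainfo

if len_file == '!INIT' then do
   /* fname is the fully-qualified name of this module's file;   */
   /* first-time set-up (working directories, etc.) would go here */
   return 1                     /* 1 = initialization succeeded    */
end

/* ... examine reqheaders, extrainfo, etc. and build a result ... */
return 0                        /* 0 = use the contents as is      */
```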
Information that can be used by IM modules
In addition to the information provided in the arguments to the IM procedure, you can use many of the
SRE2002 function calls. For example, you can use ....
- SRE_REQUEST_INFO to obtain a few additional request specific variables
- SRE_VALUE to obtain system and global variables.
- The string manipulation procedures (such as SRE_REPLACEWILD and SRE_PACK64)
- The time procedures (such as SRE_GMT and SRE_DATESTAMP)
- SRE_EXTRACT_USERNAME
NOTICE:
   You cannot use any SRE2002 procedure that provides
   request specific values.
   For example, do NOT use:
   If you use these verboten procedures -- you will hang the thread. To kill this thread,
   you'll either have to SNIPE the thread, or shut down SRE2002. Note that the thread
   is transaction specific, so other transactions should proceed normally.
Example: the XRANGE.RXX eXtended Range IM Module
SRE2002 comes with a simple example of an IM module -- the eXtended Range (XRANGE) module.
By default, XRANGE bin\im\xrange.rxx is included in the IM_TYPES parameter (although the IM_DEFAULT
is RANGE -- which is SRE2002's built-in range extraction IM procedure).
XRANGE is an extension of standard http/1.1 range extraction.
Its major difference is that it recognizes a Range: option of:
   strings=string1 - string2
or, for multiple ranges:
   strings=string1 - string2, string1a- string2a , ...
as well as the traditional
   bytes=n1-n2
Each string should be a URL-encoded case-sensitive string. In particular, it must not
contain spaces, hyphens, or characters not permitted in URIs. Instead, use URL-encoding of hyphens, etc. (and use
+ for spaces).
XRANGE searches for these strings in the contents provided by SRE2002. It will convert these to
byte ranges, returning the (inclusive)
range of the contents between the first occurrences of these two strings (with the second string in a
hyphenated pair always assumed to be after the first string of the pair).
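The conversion from a string pair to a byte range can be sketched in a few lines of REXX. This is an illustration of the idea rather than the code in XRANGE.RXX: it assumes the two strings have already been URL-decoded, that "inclusive" means both marker strings are part of the range, and that the second string is searched for after the end of the first. string_range is a hypothetical helper name.

```rexx
/* Sketch: convert a string pair into a zero-based, inclusive byte range. */
/* string_range is hypothetical; XRANGE.RXX may differ in its details.    */
contents = 'Intro. Section 1 text. Section 2 text.'
say string_range(contents, 'Section 1', 'Section 2')   /* displays: 7 31 */
exit 0

string_range: procedure
  parse arg contents, s1, s2
  p1 = pos(s1, contents)                    /* first occurrence of s1     */
  if p1 = 0 then return ''
  p2 = pos(s2, contents, p1 + length(s1))   /* first s2 after the s1 text */
  if p2 = 0 then return ''
  /* POS is 1-based; http byte ranges are 0-based and inclusive */
  return (p1 - 1) (p2 + length(s2) - 2)
```

An empty return signals that one of the strings was not found, which a module could map to a 416 error response or to "no change".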
Curious programmers are invited to examine XRANGE.RXX (in BIN\IM). Not only does it illustrate how to
code an IM module, it contains a few useful procedures for "unpacking" the request and response headers.