23 November 1999. Daniel Hellerstein (danieh@crosslink.net)

SRE-http and delta encoding.

Abstract: SRE-http ver 1.3h provides limited support for the proposed
          delta encoding protocol.

--------------------------------------------
Contents:
   1) Introduction
   2) Configuring SRE-http's support for delta encoding
   3) On whether to enable delta-encoding on a selector specific basis
   4) Technical details
   5) Various notes

--------------------------------------------
1) Introduction

One of the most significant advances of the http/1.1 standard is increased
support for caching. Caching is the act of using a locally available
version of a resource instead of re-obtaining the resource from its
originating server. To the extent that caching can be enhanced, overall
internet traffic will be reduced, with a concomitant increase in delivery
speeds for what remains.

Unfortunately, the improved (but still relatively simple) caching schemes
supported by http/1.1 are not well suited to dynamic content. Since a large
(and probably growing) share of web resources are dynamic (that is, they
change from day to day), this weakness may seriously undermine the
potential advantages of caching.

One strategy for dealing with this problem is the use of "deltas". Current
cache schemes require a server to instruct a client to either use a cached
item as is, or not at all. Delta caching schemes represent a compromise --
the server can tell a client to use its cached version as a "base", and
send a list of "differences". This list of differences, which we also refer
to as "deltas", may often be much smaller than the full contents of the
resource.

In recognition of this possibility, an http working group has drafted a set
of http standards for the use of "deltas" (see
http://info.internet.isi.edu:80/in-drafts/files/draft-mogul-http-delta-02.txt).
SRE-http implements much of this proposed standard.
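To make the "base plus differences" idea concrete, here is a minimal Python sketch. It is not the GDIFF or DIFF-E format, and it is not how SRE-http computes deltas; the names make_delta and apply_delta, and the copy/insert operation list, are illustrative assumptions built on Python's standard difflib module.

```python
from difflib import SequenceMatcher

def make_delta(base, current):
    """Describe `current` as copy/insert operations against `base`.

    Both arguments are lists of lines. This is a toy encoding
    (an assumption for illustration), not GDIFF or DIFF-E.
    """
    ops = []
    matcher = SequenceMatcher(a=base, b=current, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # The client already has these lines in its cached base.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Only new material is actually sent.
            ops.append(("insert", current[j1:j2]))
        # "delete" needs no operation: the lines are simply not copied.
    return ops

def apply_delta(base, ops):
    """Rebuild the current instance from the cached base and the delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(base[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

# A page whose hit counter changed between two requests.
base = ["<html>", "Hits: 41", "</html>"]
current = ["<html>", "Hits: 42", "</html>"]
delta = make_delta(base, current)
rebuilt = apply_delta(base, delta)
```

In this sketch the server would send `delta` instead of `current`, and the client would call apply_delta on its cached copy. The saving grows with the share of the resource that is unchanged.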
--------------------------------------------
2) Configuring SRE-http's support for delta encoding

In order to enable SRE-http's support for delta encoding, you'll need to
do the following:

a) Install the (free) GNU DIFF software, and the (free) rxGDIFF dynamic
   link library.

   GNU DIFF can be downloaded from Hobbes (http://hobbes.nmsu.edu), and
   GDIFF (which is NOT the same thing as GNU DIFF) can be obtained from
   http://www.srehttp.org/apps/gdiff/.
   Or, you can get both in one package at
   http://www.srehttp.org/pubfiles/srediff.zip.

   After downloading srediff.zip, unzip it to your GoServe directory and
   look at the DIFF_ME.TXT file for information on conditions of use.

   Note that DIFF requires emx 0.9d. EMX can be found at
   http://hobbes.nmsu.edu (search for EMXRT).

b) Make sure that your SRE-http "TEMPDATA_DIR" directory has a DELTAS\
   subdirectory (if you used SRE-http's install program, it will have been
   created automatically).

   This DELTAS_DIR directory MUST be on an HPFS drive. If it is not,
   delta-encoding will NOT be enabled.

c) Set a few parameters in INIT_STA.80.

   Two parameters must be set:

   delta_encoding_enable
       delta_encoding_enable can take values of:
           0 = disable
           1 = allow sel-specific support for delta encoding
           2 = allow delta encoding for all requests
       Examples: delta_encoding_enable=0
                 delta_encoding_enable=2

   delta_encoding_maxsize
       delta_encoding_maxsize is the size (in Kbytes) of your
       delta-encoding cache. SRE-http will monitor the size of the
       DELTAS_DIR directory, and will remove "least recently used" files
       when this size is exceeded.
       Example: delta_encoding_maxsize=5000

d) Possibly set some selector specific advanced options.

   If you've set delta_encoding_enable=1, then delta-encoding is only
   attempted when a selector has a "delta-encoding" advanced option.
   For example, you can create a selector specific advanced options file
   that contains the line
       SET DELTA_ENCODING 1
   (where 1 means enable, and 0 means disable).
   Please see ADV_OPTS.DOC for details on how to set advanced options.

e) You may also wish to set the DCluster and DTemplate advanced options.
   These options extend the capabilities of delta caching to families of
   resources, and can greatly increase the effectiveness of delta caching.
   See ADV_OPTS.DOC, and the proposed delta encoding standard, for further
   details on these extensions to delta encoding.

Once you've done these steps, SRE-http will automatically handle requests
from delta-encoding aware clients.

--------------------------------------------
3) On whether to enable delta-encoding on a selector specific basis

Delta encoding is a grand idea, but it comes at some cost to your server --
the time required to compute differences, and the space (and time) required
to store "prior instances". In many cases, it's not worth it -- such as for
resources that are rarely re-requested, for resources that rarely change,
or for small resources that contain many possible changes.

Thus, we recommend that the "selector specific" version of delta-encoding
be used (delta_encoding_enable=1). This does require more work on the
server administrator's part, though you can use wildcarded selector
specific advanced options to refer to broad sets of resources.

Conversely, the "allow for all requests" mode (delta_encoding_enable=2)
means delta encoding is attempted for all resources, which will probably
lead to an overall reduction in throughput speeds (depending on how fast
your server is, how slow the lines to your clients are, and how extensively
delta-encoding is understood).

If you want to allow delta-encoding for most of your web resources, you
could set delta_encoding_enable=2, and then use SET DELTA_ENCODING 0
parameters for all the resources that should NOT be subject to delta
encoding (say, all your .ZIP files).
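The interaction between delta_encoding_enable and the selector specific SET DELTA_ENCODING option, as described in sections 2 and 3, can be sketched in Python as follows. This is only an illustration of the decision rule, not SRE-http's code: the function name delta_allowed, the (pattern, value) list, the "first match wins" rule, and the use of fnmatch-style wildcards are all assumptions. It also assumes that SET DELTA_ENCODING 0 switches delta encoding off for a selector when delta encoding is otherwise allowed for all requests.

```python
from fnmatch import fnmatchcase

def delta_allowed(enable, selector, options):
    """Decide whether delta encoding should be attempted for a selector.

    enable   -- value of delta_encoding_enable (0, 1 or 2)
    selector -- the request selector, e.g. "news/today.htm"
    options  -- hypothetical list of (wildcard_pattern, value) pairs that
                stand in for SET DELTA_ENCODING lines; first match wins
    """
    if enable == 0:
        return False                      # disabled globally

    setting = None                        # no selector specific option found
    for pattern, value in options:
        if fnmatchcase(selector.lower(), pattern.lower()):
            setting = value
            break

    if enable == 1:
        return setting == 1               # only when explicitly enabled
    return setting != 0                   # enable == 2: unless explicitly disabled

# Selector specific mode: only the news pages get delta encoding.
news_only = [("news/*", 1)]
# "All requests" mode, with .ZIP files excluded.
no_zips = [("*.zip", 0)]
```

With these example lists, delta_allowed(1, "news/today.htm", news_only) is true, while delta_allowed(2, "files/a.zip", no_zips) is false.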
--------------------------------------------
4) Technical details

Although you do not need them in order to use delta encoding, you may be
interested in some of the details of SRE-http's support for it. We strongly
suggest reading the proposed standard -- it's generally well-written,
especially if you don't try to grok all the details.

First off, let's define "instances".

  * An instance is the state of a web resource at a given instant.
  * When asked for a resource, the server returns a "current instance".
  * A client may have one or several prior instances of a resource (say,
    from earlier requests for the same URI).
  * A server may also retain copies of some of these prior instances.

If the server and client can ascertain that both have the same copy of a
prior instance, then the server can compute the difference between this
"commonly owned" prior instance and the current instance. One can think of
this difference as being an "encoding" of the current instance, in much the
same way that GZIP compression is an encoding.

Upon receipt of this "difference", the client can create a duplicate of the
new version by combining the "difference" with the "old version".

Delta-encoding is supported either as a "content-encoding" or a
"transfer-encoding" (but NEVER as both). For practical purposes, the
differences mostly affect how "range" requests are dealt with.

A "range request with delta content-encoding" will result in the requested
range of the "delta" being returned. That is, the server first computes a
"difference" between the current and "commonly owned" prior instance, and
the requested range of this difference is returned.

A "range request with delta transfer-encoding" will result in a difference
between the corresponding ranges of the two instances. That is, first the
desired range of the current instance is extracted, then the same range is
extracted from the prior instance, and then a difference is computed
between these two ranges.
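The order of operations in the two kinds of range request can be sketched in Python. The toy_delta function below is a made-up stand-in for a real differencing tool (it only records the lengths of the shared prefix and suffix, plus the changed middle), and the function names are illustrative assumptions. The point is only where the byte range is applied.

```python
def toy_delta(prior: bytes, current: bytes) -> bytes:
    """Toy delta (not GDIFF or DIFF-E): 'prefix_len,suffix_len:' + changed bytes."""
    limit = min(len(prior), len(current))
    p = 0
    while p < limit and prior[p] == current[p]:
        p += 1                                  # length of the shared prefix
    s = 0
    while s < limit - p and prior[len(prior) - 1 - s] == current[len(current) - 1 - s]:
        s += 1                                  # length of the shared suffix
    return b"%d,%d:" % (p, s) + current[p:len(current) - s]

def range_with_content_encoding(prior, current, first, last):
    # Delta is computed over the whole instances, then the byte range
    # is cut out of the delta itself.
    return toy_delta(prior, current)[first:last + 1]

def range_with_transfer_encoding(prior, current, first, last):
    # The same byte range is cut out of both instances, then the delta
    # is computed between the two extracted ranges.
    return toy_delta(prior[first:last + 1], current[first:last + 1])

prior = b"Visitors so far: 0041. Thanks!"
current = b"Visitors so far: 0042. Thanks!"
as_content = range_with_content_encoding(prior, current, 0, 3)
as_transfer = range_with_transfer_encoding(prior, current, 17, 22)
```

In the content-encoding case the client receives a fragment of the delta, which is only useful once the rest of the delta has been fetched. In the transfer-encoding case it receives a complete delta for just the requested bytes.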
Because of the complexities of extracting ranges from both instances,
delta transfer encoding is not supported when multiple byte ranges are
requested.

Likewise, because of the complexities of dealing with content encodings
that are not dynamically generated, delta content encoding is never used
when an explicit content encoding is desired. In particular, if content
negotiation (as described in NEGOTIAT.DOC) is used to determine a content
encoding, then delta content encoding is not attempted.

--------------------------------------------
5) Various Notes

* For several reasons, the delta encoding proposal recommends that delta
  content encoding be used in preference to delta transfer encoding.
  However, due to implementation hassles, in SRE-http transfer encoding is
  somewhat better supported. In particular, you can combine GZIP and delta
  encoding when they are used as transfer encodings, but not when they are
  used as content encodings.

* Currently, SRE-http supports the GDIFF and DIFF-E "delta encoding types".
  DIFF software is readily available, and free; a free OS/2 version of
  GDIFF can be found at http://www.srehttp.org/apps/gdiff.
  DIFF does NOT work with binary files, but tends to be better than GDIFF
  for non-binary (text) files.

  The delta encoding spec also mentions VCDIFF. If OS/2 implementations of
  this (or of other common) differencing algorithms become available, we
  will be happy to add support (code donations are gratefully accepted!).

* To avoid some odd definitional problems, and implementation hassles,
  SRE-http places the following restrictions on gzip and delta-encoding:

  Given:
     i) gzip and a delta-coding-type (such as vcdiff or diff-e) appear in
        a TE request header
    ii) gzip precedes the delta-content-encoding.
        For example:
           Accept-Encoding: gzip, diff-e
   iii) a successful (smaller sized) delta-encoding can be computed
        between the current instance and a base instance the client owns
        (as referred to by the etag in an If-None-Match: etag request
        header)
  Then GZIP will not be used. That is, the delta-encoded response will be
  returned as is (without compression).

  Conversely,
   iia) if GZIP follows the delta-content-encoding -- for example:
           Accept-Encoding: diff-e,gzip
  Then GZIP will be used. That is, the delta-encoded response will be
  compressed.

* If your TEMPDATA_DIR is NOT on an HPFS drive, you can explicitly set the
  DELTAS_DIR to point to a directory on a different, HPFS, drive. See
  SREFMON.CMD for the details.

* If selector specific advanced options are used to define a
  CONTENT-ENCODING response header, then delta content encoding will not
  be attempted.

* If selector specific advanced options are used to define an ETAG
  response header, then delta encoding is NOT attempted.

* As an alternative to delta encoding, consider the use of the "SRErsync"
  pre-reply procedure (see SRERSYNC.DOC).

* Keep in mind that for dynamic resources (say, a web page that contains a
  hit counter, or a clock), every single request results in a different
  instance, so that each request results in the server retaining a copy of
  the response (for use as a "prior instance" in future requests). Hence
  the need for the delta_encoding_maxsize parameter!

* DoGET.CMD, which comes with SRE-http, contains a "delta encoding" option
  that makes it (relatively) easy to issue a request with the necessary
  delta transfer encoding headers. DoGET.CMD can also "unDIFF" and
  "unGDIFF" a delta-encoded response.

* When DELTA_ENCODING_ENABLE=1, SRE-http "addons" will not attempt delta
  encoding (that is, you can't use a selector specific advanced option to
  enable delta encoding for "addons").

* For now, delta-encoded responses will be set to be non-cacheable.
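The gzip ordering restriction described in the notes above can be sketched in Python as follows. This is only an illustration of the rule, not SRE-http's code: the function name compress_delta_response is made up, the handling of ";q=" parameters is an assumption, and so is the choice to compare gzip against the first delta type listed when several appear. The sketch assumes a smaller delta-encoded response has already been computed successfully.

```python
# Delta encoding types that this document says SRE-http supports.
DELTA_TYPES = ("gdiff", "diff-e")

def compress_delta_response(accepted: str) -> bool:
    """Return True if a successful delta-encoded response should also be gzipped.

    `accepted` is the value of the request header that lists the encodings,
    for example "diff-e,gzip".
    """
    # Normalize: drop any ";q=..." parameters, whitespace and case.
    names = [item.split(";")[0].strip().lower() for item in accepted.split(",")]

    if "gzip" not in names:
        return False                      # client did not ask for gzip
    delta_positions = [names.index(t) for t in DELTA_TYPES if t in names]
    if not delta_positions:
        return False                      # no supported delta type listed

    # gzip is applied only when it FOLLOWS the delta type in the header.
    return names.index("gzip") > min(delta_positions)

gzip_first = compress_delta_response("gzip, diff-e")    # gzip precedes the delta type
gzip_last = compress_delta_response("diff-e,gzip")      # gzip follows the delta type
```

Here gzip_first is false (the delta is returned uncompressed) and gzip_last is true (the delta is compressed), matching the two header examples in the notes.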