27 June 1999: SRE-http and caching.

SRE-http ver 1.3h supports several forms of caching. This document outlines
which levels of caching may apply to a request, and what you can do to
increase (or decrease) the extent to which caches answer requests.

Hint: The appendix discusses how to configure the "cache relevant"
      response headers used by SRE-http.

Several different sorts of caches may apply. In decreasing universality,
these include:

1) Proxy server caches.
   For purposes of this discussion, a "proxy server" is any intermediate
   site, somewhere on the web, that may handle a request issued by a
   client. These sites may store responses, and use these cached responses
   the next time the same request is received. When such a stored response
   is used, the origin server is typically not contacted (the origin
   server does not know that the proxy delivered content to a client).

     ** Perhaps the principal advantage of http/1.1 (over http/1.0) is the **
     ** attention given to making the web proxy-cache friendly.            **

   The appendix discusses how to configure the response headers used by
   SRE-http to "talk to proxy servers".

2) The GoServe cache.
   The GoServe cache consists of a list that matches selectors (the local
   portion of a URI) to filenames. When a request arrives for a selector
   that is in this list, GoServe can resolve the request by sending the
   matched file (and a few http/1.0 response headers). As an option, the
   GoServe cache can "run the filter anyways", which allows the filter to
   perform post-filter actions (such as auditing).

3) The SREPROXY cache.
   SREPROXY is a front-end to SRE-http. SREPROXY maintains a cache that
   matches selectors to files. These files may be temporary files (say, as
   generated by adding SSIs to an HTML document). In addition, SREPROXY
   can resolve a few "dynamic" SSIs (such as the current time), and can do
   a limited amount of access control.

4) The SSI and !DIR caches.
   SREFILTR (the main filter) maintains a cache for SSI documents (that
   contains "partially compiled" server side includes) and a cache for
   !DIR requests (that contains directory listings). These are used when a
   matching selector is received. Note that the SSI cache is often used as
   a base to which dynamic SSIs are added, where "dynamic SSIs" refers to
   information that changes on a request-specific basis (e.g., the current
   time, the client's IP address, and output from INTERPRET SSIs).

The basic notion behind the use of a cache is to reduce processing
requirements and bandwidth demands. Proxy caches are highly effective at
both -- when successful, no communication with the origin server is
necessary. The GoServe cache does not save bandwidth, but can reduce server
load considerably (by skipping the "call the filter to resolve this
request" step). SREPROXY is similar -- although it is a filter that has to
be called, it is much smaller and faster than the regular (SREFILTR)
filter. Lastly, the SSI and !DIR caches can save a lot of processing for
"processor intensive" resources that include SSIs or list directories.

Each of these caches has advantages and disadvantages.

Proxy caches:
   Advantages:
     * Very fast response times
     * Can completely eliminate load on your server
     * Helps reduce internet traffic
   Disadvantages:
     * Should not be used with actively changing, or access controlled,
       resources
     * Should not be used when accurate auditing is important

GoServe cache:
   Advantages:
     * Response times are very fast (compared with SREFILTR)
     * Minimizes load on your server
   Disadvantages:
     * Should not be used with actively changing, or access controlled,
       resources
     * Currently, the GoServe cache is http/1.0, but not http/1.1,
       compliant.

SREPROXY cache:
   Advantages:
     * Response times are fast
     * Can reduce load (since SREPROXY is smaller than SREFILTR)
     * Can be used with changing and access controlled resources
     * No loss of functionality -- when in doubt, SREFILTR is used
   Disadvantages:
     * Introduces another round of processing -- if a request does not
       match a cached entry, the net result is a slower response.
     * On occasion, a stale response may be returned

SSI and !DIR caches:
   Advantages:
     * Fully functional -- changes are immediately detected
     * Greatly reduces processing for a subset of otherwise processing
       intensive requests.
   Disadvantages:
     * On rare occasions, stale responses may be returned

It should be stressed that these caches are not mutually exclusive. In
fact, a typical scenario would have the three higher caches (proxy servers,
GoServe, and SREPROXY) examining a request, which may then be resolved via
the use of the SSI (or !DIR) cache. Thus, optimal performance is achieved
by using each cache in a complementary fashion.

The following discusses some tricks and techniques you can use. In
addition, the appendix discusses the "cache relevant" response headers used
by SRE-http.

Proxy Servers:

  * If you have a very dynamic site of non-access controlled resources,
    transparency concerns may override the desire for faster throughput.
    That is, you might want to suppress all proxy caching. This can be
    accomplished by setting proxy_cache=0 (in INIT_STA.80).
    Alternatively, you can use proxy_cache to "force revalidation."
    See the appendix for more details, or see the description of the
    PROXY_CACHE variable in INITFILT.DOC.

  * SRE-http will automatically suppress proxy caching whenever access
    controls (such as CHECKLOG and ALLOW_ACCESS), or dynamic SSIs, apply to
    the resource. If desired, you can explicitly allow these resources to
    be cached -- just include a CACHE (or CACHE*) "permission" in a
    selector-specific entry in ACCESS.IN (or in ATTRIBS.CFG).
    Alternatively, resources listed as PUBLIC URLS (using PUBURLS.IN or
    ATTRIBS.CFG) are assumed to be cacheable by proxy caches.

  * See HITMETER.DOC for hints on how to resolve problems associated with
    accurate metering of hits when proxy servers may be active.

GoServe cache:

  * If you do enable the GoServe cache, be aware that it uses an http/1.0
    response algorithm. Thus, your site will sometimes return http/1.1
    responses, and sometimes http/1.0 responses. Although this is not
    fatal, it may have strange impacts (and it is somewhat aesthetically
    displeasing).
    Therefore, SRE-http will only use the GoServe cache (that is, allow a
    request to be cached by GoServe) when a CACHE* permission exists.
    Alternatively, resources listed as LITERAL_NORECORD PUBLIC URLS (in
    PUBURLS.IN) are assumed to be cacheable by the GoServe cache.

  * In general, we recommend using the GoServe cache only for resources
    that you do not care to audit (such as backgrounds and icons). In this
    vein, we recommend checking the "do not call filter" GoServe caching
    option.

  * Future releases of GoServe may upgrade the GoServe cache, so that it
    returns appropriate http/1.1 response headers.

  * The GoServe cache ignores TE: request headers.

SREPROXY:

  * If your site is highly access controlled, or consists primarily of
    dynamic HTML documents (with lots of SSIs) or addons/cgi-bin scripts,
    then use of SREPROXY may hurt (increase) response times.

  * NUSTATUS contains an option that will display simple statistics on the
    proportion of requests satisfied by SREPROXY.

  * SREPROXY.DOC contains a detailed discussion on how to use SREPROXY.

  * If SREPROXY detects a TE: GZIP request header, it will NOT resolve the
    request.

SSI and !DIR caches:

  * There is almost no reason not to use these caches. The exceptions are:
      i) You have lots of HTML documents, and not much extra disk space.
     ii) Your documents change rapidly (have lots of dynamic SSIs).
    iii) HTML files are constantly being edited, added, and removed.

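As noted earlier, the server-side caches are not mutually exclusive: a
request that gets past any proxy servers is examined by each layer in turn.
The following Python fragment is only a toy sketch of that ordering, not
SRE-http code. The function name, the dictionaries standing in for each
cache, and the compile_ssi callback are all hypothetical, and the real
filters apply many more checks (access control, staleness, dynamic SSIs).

```python
def resolve(selector, goserve_cache, sreproxy_cache, ssi_cache, compile_ssi):
    """Return (layer, body) for the first layer able to answer `selector`.

    Each cache is modelled as a plain dict keyed by selector (an assumption
    made for illustration; the real caches map selectors to files).
    """
    # GoServe answers first, without calling any filter.
    if selector in goserve_cache:
        return "GoServe cache", goserve_cache[selector]
    # SREPROXY, the small front-end filter, is tried next.
    if selector in sreproxy_cache:
        return "SREPROXY cache", sreproxy_cache[selector]
    # SREFILTR is the fallback; it may still reuse its own SSI cache.
    if selector in ssi_cache:
        return "SSI cache", ssi_cache[selector]
    # Nothing cached: do the full work and remember the result.
    body = compile_ssi(selector)
    ssi_cache[selector] = body
    return "SREFILTR", body
```

In this sketch, a selector that misses every layer costs one lookup per
layer before the full work is done, which mirrors the "another round of
processing" disadvantage listed for SREPROXY.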
-----------------------------
Appendix: Cache relevant response headers used by SRE-http

There are several ways to affect the "cache relevant" response headers
returned by SRE-http:
   a) the setting of the PROXY_CACHE variable (in INIT_STA.80)
   b) the setting of the FIX_EXPIRE variable (in INITFILT.80)
   c) the use of PUBLIC_URLS
   d) the use of the NOCACHE, CACHE and CACHE* selector-specific permissions
   e) the use of selector-specific advanced options to specify an explicit
      response header.

Note that these are used for "normal" responses -- cgi scripts and addons
may override these rules, and provide their own headers.

This listing goes from general to more specific -- the setting of
PROXY_CACHE controls default behavior, whereas selector-specific advanced
options can be used to override these defaults.

I) PROXY_CACHE:
   PROXY_CACHE can take 4 basic values, which yield the following
   "default" response headers.

     0= disallow caching
          If this is a dynamic file (i.e., contains dynamic SSIs):
             Cache-control: no-cache
          otherwise:
             Cache-control: private

     1= allow caching
             Cache-control: public

     2= allow caching, with revalidation
             Cache-control: public,max-age=0
          If this is a dynamic resource (i.e., an HTML document with
          SSIs), the following "stronger form" is used instead:
             Cache-control: public,max-age=0, must-revalidate

     3= allow proxy caching with revalidation, full caching by private
        caches (private caches include the "browser's cache")
             Cache-control: public,s-maxage=0

I.1) A modification:
   If PROXY_CACHE=n_mmmmm, where n=0,1,2, or 3, and mmmmm is an integer
   number of seconds, then the following modifications occur:
     1: Cache-control: public,max-age=mmmmm
     2: Cache-control: public,max-age=mmmmm, or
        Cache-control: public,max-age=mmmmm, must-revalidate
     3: Cache-control: public,s-maxage=mmmmm

II) PUBLIC_URL
   If this selector is a "PUBLIC_URL" (i.e., belongs to a PUBLIC realm, as
   specified in ATTRIBS.CFG), then rule I is ignored.
   Instead:
        Cache-control: public,s-maxage=mmmmm
   where mmmmm is either 0, or the mmmmm value from the PROXY_CACHE
   variable.

II.1) A modification:
   If the PUBLIC_URL is a "literal" public_url, then use the following
   "re-validate in 1 day" header:
        Cache-control: public, max-age=86400

III) NOCACHE, CACHE and CACHE* permissions
   If a NOCACHE, CACHE or CACHE* permission is used, then rules I and II
   are ignored. Instead, use the following:
     NOCACHE: use  Cache-control: no-cache
                   Pragma: nocache
     CACHE:   use  Cache-control: public,s-maxage=mmmmm
                   (mmmmm is from PROXY_CACHE)
     CACHE*:  use  Cache-control: public

   Note:
     * Only one of these should be specified. Should a mistake occur, with
       more than one specified, then NOCACHE overrides CACHE*, and CACHE*
       overrides CACHE.

IV) FIX_EXPIRE
   If FIX_EXPIRE is specified, and this is a "dynamic" response, then
        Expires: current_time+fix_expire
   is also added.

   If FIX_EXPIRE is not specified, and either
      i) PROXY_CACHE=0, 2 or 3
     ii) a NOCACHE permission is specified
   then
        Expires: current_time
   is also added. Note that using "Expires: current_time" implies
   "immediate expiration".

V) Advanced options
   If you specify a header as a "selector specific" advanced option, then
   the matching header will be suppressed, and the header you specify will
   be used instead. This allows you to fine tune your "cache relevant"
   response headers.
   For example:
        Header add Cache-control: public,max-age=100000
   means: "ignore rules I, II, and III; and use"
        Cache-control: public,max-age=100000

        Header add Expires: Mon, 20 June 1998 10:11:12 GMT
   means: "ignore rule IV, and use"
        Expires: Mon, 20 June 1998 10:11:12 GMT
   (a date in the past means "immediate expiration")

Notes:

  * Summarizing the more important Cache-control directives:
      NO-CACHE: a cache must not reuse this response without first
                re-validating it with the origin server (in effect, do
                not answer from the cache)
      PUBLIC:   this can be cached in a public place (a proxy server cache)
      PRIVATE:  this can be cached by user-agents, or other "client side"
                caches
      MAX-AGE:  after this response is this many seconds old, all caches
                must re-validate
      S-MAXAGE: after this response is this many seconds old, proxy (non-
                private) caches must re-validate this response. Private
                caches can ignore this directive.
      MUST-REVALIDATE: a strong re-validation (discourages caches from
                tolerating stale responses)

    Revalidation typically means sending an If-Modified-Since request to
    the origin server, which SRE-http can quickly answer if there has been
    no change.

    For further details, please see the http/1.1 specification.

  * http/1.1 proxies will always use a Cache-control: response header
    instead of an Expires: response header. http/1.0 proxies will
    typically ignore a Cache-control: response header.

  * Resources subject to content negotiation will often add a Vary:
    header. The Vary header lists the request headers that MUST match (in
    addition to the URI), such as the Accept and Accept-Language request
    headers.
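
To make the precedence of rules I, II, and III easier to follow, here is a
minimal Python sketch that picks a Cache-control value the way the appendix
describes. It is an illustration only and is not taken from SRE-http: the
function names, argument names, and the use of a Python set for permissions
are all assumptions, and rules IV (Expires) and V (advanced options) are
left out.

```python
from typing import Iterable, Optional, Tuple


def parse_proxy_cache(value: str) -> Tuple[int, Optional[int]]:
    """Split a PROXY_CACHE setting such as "2" or "2_3600" into (n, seconds)."""
    mode, _, seconds = value.partition("_")
    n = int(mode)
    if n not in (0, 1, 2, 3):
        raise ValueError("unsupported PROXY_CACHE value: %r" % value)
    return n, (int(seconds) if seconds else None)


def cache_control(proxy_cache: str,
                  dynamic: bool = False,
                  public_url: bool = False,
                  literal_public_url: bool = False,
                  permissions: Iterable[str] = ()) -> str:
    """Return the Cache-control value implied by rules I-III (hypothetical helper)."""
    n, seconds = parse_proxy_cache(proxy_cache)
    age = 0 if seconds is None else seconds
    perms = set(permissions)

    # Rule III: permissions win; NOCACHE beats CACHE*, which beats CACHE.
    if "NOCACHE" in perms:
        return "no-cache"
    if "CACHE*" in perms:
        return "public"
    if "CACHE" in perms:
        return "public,s-maxage=%d" % age

    # Rule II (and II.1): PUBLIC_URLs ignore rule I.
    if literal_public_url:
        return "public,max-age=86400"
    if public_url:
        return "public,s-maxage=%d" % age

    # Rule I (and I.1): defaults driven by PROXY_CACHE.
    if n == 0:
        return "no-cache" if dynamic else "private"
    if n == 1:
        return "public" if seconds is None else "public,max-age=%d" % seconds
    if n == 2:
        base = "public,max-age=%d" % age
        return base + ", must-revalidate" if dynamic else base
    return "public,s-maxage=%d" % age
```

For instance, under these assumptions cache_control("2", dynamic=True)
yields "public,max-age=0, must-revalidate", while adding a CACHE*
permission to any setting yields plain "public".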