27 June 1999.

Hit-Metering and Proxy Caches under SRE-http

I) Introduction

One of the more significant improvements offered by the http/1.1 protocol is enhanced support for caching by proxy servers. Caching is defined as the resolution of requests by "caches" on "proxy servers" lying between the requesting client and your site. By reducing network traffic, and by reducing the load on your server, caching can provide significant improvements in throughput. This benefit extends beyond one's own site, since a reduction in network traffic improves throughput for everyone else using the internet.

Caching is not without drawbacks. First and foremost, some resources should not be cached. These include highly dynamic pages, and requests for access-controlled resources. SRE-http recognizes these cases, and will by default suppress caching for them.

Another drawback, which is not as serious but is more difficult to solve, is the conflict between caching and accurate auditing. Simply put, a request that a cache handles is a request that your site never finds out about. If it's important to have an accurate count of how many times a resource has been distributed, caching is a problem. Recognizing this, SRE-http allows you to disable proxy caching, either by default, or on a selector-specific basis.

But what if you don't want to lose the benefits of caching, but do want some measure, even if it is not perfect, of how many hits are being handled by caches? This document describes one such method, based on the use of an http extension proposed in RFC 2227 -- the use of hit-meter trees.

This solution requires that proxy servers be willing to report hit information back to your site. Thus, the success of this solution (hence the accuracy of your hit counts) depends on how widely it is implemented. At this writing, most proxy servers do not support hit-meter trees. However, as http/1.1 becomes more widely adopted, this should improve.
Therefore: SRE-http's support of hit-meter trees is partly experimental, and partly a way to encourage their broader adoption.

The rest of this document describes how SRE-http implements hit-meter trees.

Notes:

 * For further description of SRE-http and caching, see CACHING.DOC.

 * For details on how to control the default level of caching, see the description of the PROXY_CACHE variable in INITFILT.DOC.

 * For details on how to control caching on a selector-specific basis, see IN_FILES.DOC for a description of the CACHE and NOCACHE selector-specific permissions (that can be set in ACCESS.IN or in ATTRIBS.CFG); and see the description of PUBLIC_URLS in PUBURLS.DOC.

II) Enabling the hit-meter tree

Due to its experimental nature, use of a hit-meter tree is enabled on a selector-specific basis. To do this, you must specify a Meter= "selector-specific advanced option" -- you can use the intermediate configurator to do this.

Meter= can take a list of arguments. The simplest case, which is basically all that SRE-http supports, is to use no arguments. Otherwise, the arguments can be used for a variety of purposes, such as instructing proxies to limit the duration of cache entries. If you are interested in such fine control, please see RFC 2227.

For example, assuming you use ATTRIBS.CFG to set selector-specific attributes:

   Realm: distrib
     rule: distrib/myfile1.zip
     permissions: cache
     Option: meter=

instructs SRE-http to (whenever possible) enable a hit-meter tree for requests for /distrib/myfile1.zip.

Note that the Permissions: line may not be necessary; its inclusion guarantees that SRE-http will allow proxy caching -- say, even when access controls are required of other resources (SRE-http is conservative, and by default will suppress caching when access controls are enabled).

Notes:

 * ADV_OPTS.DOC describes the various ways of specifying the Meter= selector-specific advanced option.

 * You can use a wildcarded rule to assign a Meter= advanced option to a set of requests.
 * If you specify optional arguments in the Meter= advanced option, they will be included verbatim in the Meter: response header.

 * Technical note: SRE-http will include a Meter: and a Connection: Meter response header whenever
      a) a Meter= selector-specific option has been specified, and
      b) the client (typically, a proxy server) includes a Meter: request header.
   Given the above, the proxy server will retain a count of how many requests for this resource have been received. From time to time, the proxy will send a special HEAD (or GET) request containing a Meter: header that includes a count of hits it resolved from its cached version of the resource. Actually, two counts are returned -- the number of times the resource was sent (200 responses), and the number of resource-did-not-change reports (304 responses).

III) Output Information

Due to the experimental nature of hit-meter trees, SRE-http does not incorporate hit-meter tree information at a low level -- the various auditing databases will not be influenced by this information. Instead, SRE-http maintains a separate count file, HITMETER.CNT (in the DATA/ subdirectory of the GoServe working directory).

This file contains, on a selector-specific basis:

 * the number of "200" responses from this site
 * the number of "304" responses from this site
 * the number of "200" responses by proxy servers
 * the number of "304" responses by proxy servers
 * the name of the selector

For example:

   121 44 42 12 /distrib/archive/ver13.zip

Thus, the total number of requests is the sum of these four numbers (here, 121 + 44 + 42 + 12 = 219). Furthermore, the first two numbers should match values recorded in RECRDALL.CNT and COUNTER.CNT (possibly with small discrepancies, due to incomplete requests).

Notes:

 * Each selector will have its own entry; wildcarded entries are not used.

 * Only selectors that have a Meter= advanced option appear in HITMETER.CNT.
 * Technical note: HITMETER.CNT is modified by SRE-http's post-filter daemon (the same daemon that maintains the other LOG files). Thus, if you suppress "post-filtering", hit-metering will not be supported. A simple caching scheme is used to minimize disk writes, so at any given instant the values in HITMETER.CNT may not be fully up to date. This is a trivial concern, given the intermittent reporting (perhaps once a day) of proxy servers along a hit-meter tree.
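
As an illustration of the HITMETER.CNT layout described in section III, the following Python sketch reads one entry and totals its counts. It is not part of SRE-http; the names HitMeterEntry and parse_hitmeter_line are hypothetical. It also assumes that each entry is a single line of four whitespace-separated integers followed by the selector, as in the example above; the actual file may differ in details not described here.

```python
from typing import NamedTuple


class HitMeterEntry(NamedTuple):
    """One HITMETER.CNT entry (hypothetical helper, not part of SRE-http)."""
    site_200: int    # "200" responses sent by this site
    site_304: int    # "304" responses sent by this site
    proxy_200: int   # "200" responses reported by proxy servers
    proxy_304: int   # "304" responses reported by proxy servers
    selector: str    # the selector the counts belong to

    @property
    def proxy_total(self) -> int:
        # Requests resolved by proxy caches, as reported back to the site.
        return self.proxy_200 + self.proxy_304

    @property
    def total(self) -> int:
        # All known requests: those answered here plus those answered by proxies.
        return self.site_200 + self.site_304 + self.proxy_total


def parse_hitmeter_line(line: str) -> HitMeterEntry:
    # Assumed layout: four integer counts, then the selector.
    fields = line.split(None, 4)
    if len(fields) != 5:
        raise ValueError("expected four counts and a selector: %r" % line)
    s200, s304, p200, p304 = (int(f) for f in fields[:4])
    return HitMeterEntry(s200, s304, p200, p304, fields[4].strip())


if __name__ == "__main__":
    entry = parse_hitmeter_line("121 44 42 12 /distrib/archive/ver13.zip")
    print(entry.selector, entry.total, entry.proxy_total)
```

For the example entry, the site answered 165 requests itself and proxies reported another 54, for 219 in all. Since proxies report only occasionally, such totals are best read as a lower bound on the true number of hits.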