27 June 1999.

Identifying publicly accessible resources.

The basic idea of SRE-http's "public URLs" is that before logon or
access controls are checked, SRE-http checks whether the request
selector matches one of the PUBLIC_URLS. If it does, no logon (or other
access control) checking is attempted. In a sense, the PUBLIC_URLS are
purposely placed outside the "protection" of SRE-http's various access
controls: these are resources that are always "open to the public".

PUBLIC_URLS are specified either:
   a) via an entry in the "PUBURLS_FILE" file (PUBURLS.IN), or
   b) via a PUBLIC "realm" entry in the "selector-attributes" file
      (ATTRIBS.CFG).

Although this document focuses on the syntax of PUBURLS.IN, the general
notions also apply to entries in ATTRIBS.CFG.

   * In general, we now recommend that PUBLIC_URLS be specified using
     the ATTRIBS.CFG "realm definition file". For details on how to
     specify these "PUBLIC realms", see IN_FILES.DOC.

The basic structure of an entry in PUBURLS.IN is:

      candidate_sel anoption filename

where:
   candidate_sel : is compared against the request selector
   anoption      : can be LITERAL, LITERAL_NORECORD, or NORECORD
   filename      : is a fully qualified file name (which may contain
                   * "wildcard characters")
   (anoption and filename are both optional)

Entries in ATTRIBS.CFG are just like other "realm" definitions, with two
exceptions:
   a) The realm must be PUBLIC. Since subrealms are allowed, both of the
      following realms define PUBLIC_URLS:
          Realm: public
          Realm: public.set1
   b) Requires: entries are ignored (since PUBLIC_URLS are never access
      controlled!).
Furthermore:
   c) instead of NORECORD, specify a NO_POSTFILTER permission;
   d) instead of LITERAL, use a Redirect: literal= entry.

When the candidate_sel (or the Rule: in a realm definition) matches the
request selector (the candidate_sel may include the * wildcard
character), SRE-http treats the request as "a request for a Public
Resource".
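The entry syntax and selector matching described above can be sketched in Python. This is a minimal, hypothetical sketch, not SRE-http's actual implementation: the names `PublicUrl`, `parse_entry`, and `matches` are invented for illustration. It assumes that entries are whitespace-separated tokens, that an optional HOST_NICKNAME// prefix may come first (as in the CANDYSTORE// and ZOO// examples in this document), that a filename is only given after an anoption, that lines starting with ; are comments, that a candidate_sel holds at most one *, and that matching is case-insensitive (as the INDEX.HTM / index.htm equivalence in the ATTRIBS.CFG examples suggests).

```python
from dataclasses import dataclass
from typing import Optional

# The three anoption values named in this document.
OPTIONS = {"LITERAL", "LITERAL_NORECORD", "NORECORD"}


@dataclass
class PublicUrl:
    candidate_sel: str
    anoption: Optional[str] = None
    filename: Optional[str] = None
    host: Optional[str] = None  # from an optional "HOST_NICKNAME//" prefix


def parse_entry(line: str) -> Optional[PublicUrl]:
    """Parse '[HOST//] candidate_sel [anoption [filename]]' (hypothetical helper).

    Returns None for blank lines and for lines starting with ';'
    (treating ';' as a comment marker is an assumption)."""
    tokens = line.split()
    if not tokens or tokens[0].startswith(";"):
        return None
    host = None
    if tokens[0].endswith("//"):
        host = tokens[0][:-2].upper()
        tokens = tokens[1:]
        if not tokens:
            raise ValueError("host prefix without a candidate_sel")
    sel = tokens[0]
    opt = tokens[1].upper() if len(tokens) > 1 else None
    if opt is not None and opt not in OPTIONS:
        raise ValueError(f"unknown anoption: {tokens[1]!r}")
    filename = tokens[2] if len(tokens) > 2 else None
    return PublicUrl(sel, opt, filename, host)


def matches(candidate_sel: str, selector: str) -> bool:
    """Case-insensitive comparison; a single '*' matches any run of characters."""
    cand, sel = candidate_sel.upper(), selector.upper()
    if "*" not in cand:
        return cand == sel
    head, tail = cand.split("*", 1)
    return (len(sel) >= len(head) + len(tail)
            and sel.startswith(head) and sel.endswith(tail))
```

For example, `parse_entry("ZOO// HOURS.HTM LITERAL")` yields an entry for host ZOO, and `matches("STORE/*.GIF", "store/logo.gif")` is true while `matches("STORE/*.GIF", "store/ad1.htm")` is false.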
Specifically, when the matching entry specifies neither anoption nor
filename (or, in a realm definition, when no Redirect: literal= entry
appears), treating the request as a request for a Public Resource means
that:
   * logon checking does NOT occur;
   * selector-specific access control information, selector-specific
     permissions, and selector-specific advanced options are NOT
     examined;
   * local redirections (such as home_name substitution, alias lookup,
     and virtual directory lookup) WILL occur;
   * server side includes may be attempted on HTML documents.

The anoption and filename options are included to provide further
flexibility.

   * If anoption=NORECORD or LITERAL_NORECORD (or if a NO_POSTFILTER
     permission is specified in a realm definition), then SRE-http will
     not record the request or perform any other "post filter" action
     (for example, it will not record this request in the common-log
     audit file).

   * If anoption=LITERAL or LITERAL_NORECORD (or if a
     Redirect: literal=filename is used in a realm definition), then the
     resource is used "as is": no further processing of the "name" or
     contents of the resource is attempted (for example, SSI processing
     will not be attempted).
       > If no filename is specified, then the request selector is
         assumed to point to a file under the GoServe data directory
         (virtual directories are not checked).
       > If a filename is specified, then it is used "as is".
         Actually, if filename contains a * wildcard, the usual
         "wildcard substitution" is attempted (see SREhttp.FAQ for a
         discussion of wildcard matching and substitution).
     The filename argument thus allows a limited form of "aliasing", and
     the transfer of files that are not under the data directory.

   * Setting anoption=LITERAL_NORECORD (or using a NO_POSTFILTER
     permission and a Redirect: literal=filename in a realm definition)
     implies "let GoServe cache this" (assuming the GoServe cache is
     enabled).
     That is, GoServe caching is permitted for PUBLIC_URLS only when the
     matching public URL is "literal" and will not be recorded (the idea
     is that the GoServe cache should be used in "do not call filter"
     mode).

Examples of entries in PUBURLS.IN:

      INDEX.HTM
      MAPS/*
      STORE/AD1.HTM      LITERAL
      STORE/*.GIF        NORECORD
      PICTURE/HELLO.GIF  LITERAL_NORECORD  D:\PICTS\HELLO.GIF
      FAMILY/*           LITERAL_NORECORD  D:\PERSONAL\*
      CANDYSTORE// PRICES/CHOCO.HTM

Examples of ATTRIBS.CFG entries:

      ; this is equivalent to INDEX.HTM
      Realm: public
      rule: index.htm

      ; this is equivalent to STORE/AD1.HTM LITERAL
      Realm: public.a
      rule: store/ad1.htm
      redirect: literal=

      ; this is equivalent to STORE/*.GIF NORECORD
      Realm: public.b
      rule: store/*.gif
      permissions: no_postfilter

      ; this is equivalent to CANDYSTORE// PRICES/CHOCO.HTM
      Realm: public.c
      Host: candystore
      rule: prices/choco.htm

Notes:
   i)   INDEX.HTM is publicly accessible.
   ii)  Everything in the MAPS/ directory is publicly accessible.
   iii) STORE/AD1.HTM is to be transferred "as is", where STORE/ is a
        subdirectory of the data directory. For example, if the GoServe
        data directory is D:\WWW, then D:\WWW\STORE\AD1.HTM would be
        transferred (without any SSI processing).
   iv)  All .GIF files in STORE/ are publicly accessible, and their
        transfers are not recorded.
   v)   A request for PICTURE/HELLO.GIF causes D:\PICTS\HELLO.GIF to be
        transferred "as is", without recording. Furthermore,
        PICTURE/HELLO.GIF may be cached by the GoServe cache.
   vi)  All requests beginning with FAMILY/ are directly mapped to
        D:\PERSONAL\ and transferred "as is", without recording the
        request. Furthermore, the GoServe cache (if enabled) is allowed
        to handle future requests for these resources. For example, a
        request for FAMILY/JUNE.JPG would result in D:\PERSONAL\JUNE.JPG
        being transferred.
   vii) Requests for PRICES/CHOCO.HTM on the host with a host nickname of
        CANDYSTORE are publicly accessible.

Notes on the examples:

   * As of SRE-http version 1.3d, you can specify host- and port-specific
     PUBLIC_URLS files (in addition to using the host_nickname// modifier
     on entries in this default PUBURLS.IN file).

   * To reiterate, the request selector (sent by the client to your
     server) is examined for matches to one of the PUBLIC_URLS. If
     multiple wildcard matches (and no exact match) occur, the "best"
     match is used. The "best match" is defined as the match with the
     most characters before the * character and, in the event of ties,
     the most characters after it.

   * HTACCESS file lookup is suppressed whenever the request matches a
     PUBLIC_URL (HTACCESS access controls, redirection, etc. will not be
     attempted).

   * In general, all files are assumed to be relative to the data
     directory or to a virtual directory. Note that for "literal"
     PUBLIC_URLS, virtual directories are NOT checked, so all files are
     assumed to be relative to the data directory.
     Reminder: You can skip the "data directory" lookup by explicitly
     naming the file that this PUBLIC_URL maps to. To do this, add the
     (fully qualified) file name after the LITERAL option. For example:
          STORE/AD1.HTM LITERAL D:\SHOP\ADS\AD_1.HTM

   * If you have no PUBLIC_URLS, you can speed up throughput a bit by
     setting NEVER_PUBLICURLS=1 (in INIT_STA.80).

   * To illustrate the use of PUBLIC_URLS, SRE-http is shipped with a
     version of PUBURLS.IN (and an ALIASES.IN) set up to respond to a
     PUBLIC request by listing all files in the PUBFILES/ subdirectory
     of the data directory (assuming you've created such a subdirectory).

   * If a request matches a PUBLIC_URL, then: if PRE_FILTER=FIRST, the
     pre-filter will be called. Otherwise, it will NOT be called (that
     is, PRE_FILTER="YES" is treated as a NO).

   * You can specify PUBLIC_URLS on a host-specific basis.
       ************** CAUTION **************
       In fact, non-host-specific values of PUBLIC_URLS will NOT be used
       as "default" values!
       ************** CAUTION **************
     To do this, prefix the entry with HOST_NICKNAME//. For example:
          ZOO// HOURS.HTM LITERAL
     indicates that this entry applies to requests to the "host" with a
     host nickname of ZOO.

CAUTION: When using "literal PUBLIC_URLS with fully qualified file names"
(and other types of local redirection), URL resolution by the client's
browser may have unexpected consequences: relative links in the returned
document are resolved against the request selector, not against the
location of the file that was actually sent.
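The lookup rules scattered through this document (an exact match wins, otherwise the "best" wildcard match; host-specific entries apply only to their host, and non-host entries are not used as defaults; a * in filename receives the text matched by the candidate_sel's *; GoServe caching only for LITERAL_NORECORD) can be combined into one lookup sketch. This is hypothetical Python, not SRE-http's code: `PublicUrl`, `star_match`, `best_match`, `resolve_file`, and `flags` are invented names. The substitution shown is a simplification of the wildcard substitution described in SREhttp.FAQ, and the sketch assumes at most one * per pattern, ASCII selectors, and case-insensitive comparison of selectors and host nicknames.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PublicUrl:
    candidate_sel: str
    anoption: Optional[str] = None   # LITERAL, LITERAL_NORECORD, NORECORD or None
    filename: Optional[str] = None   # fully qualified; may contain one *
    host: Optional[str] = None       # HOST_NICKNAME from a "ZOO//" prefix


def star_match(pattern: str, selector: str) -> Optional[str]:
    """Return the text matched by '*' ('' for an exact match), or None.

    Assumes at most one '*', ASCII selectors, and case-insensitive matching."""
    p, s = pattern.upper(), selector.upper()
    if "*" not in p:
        return "" if p == s else None
    head, tail = p.split("*", 1)
    if len(s) >= len(head) + len(tail) and s.startswith(head) and s.endswith(tail):
        return selector[len(head):len(selector) - len(tail)]
    return None


def best_match(entries: List[PublicUrl], selector: str,
               host: Optional[str] = None) -> Optional[PublicUrl]:
    """Pick the entry that governs `selector` for the given host nickname."""
    best, best_key = None, None
    for e in entries:
        # Only entries for this host apply; un-prefixed entries are NOT
        # used as defaults for a request to a named host.
        if (e.host or "").upper() != (host or "").upper():
            continue
        if star_match(e.candidate_sel, selector) is None:
            continue
        if "*" not in e.candidate_sel:
            return e  # an exact match always wins
        head, tail = e.candidate_sel.split("*", 1)
        key = (len(head), len(tail))  # most chars before *, then most after
        if best_key is None or key > best_key:
            best, best_key = e, key
    return best


def resolve_file(entry: PublicUrl, selector: str) -> Optional[str]:
    """File to send, or None meaning 'look under the data directory'."""
    if entry.filename is None:
        return None
    matched = star_match(entry.candidate_sel, selector) or ""
    return entry.filename.replace("*", matched, 1)


def flags(entry: PublicUrl) -> Dict[str, bool]:
    """What the entry's anoption implies for the request."""
    opt = entry.anoption or ""
    return {
        "literal": opt in ("LITERAL", "LITERAL_NORECORD"),      # used "as is", no SSI
        "record": opt not in ("NORECORD", "LITERAL_NORECORD"),  # post-filter runs
        "cacheable": opt == "LITERAL_NORECORD",                 # GoServe cache allowed
    }
```

For example, with the entries STORE/* and STORE/*.GIF, a request for store/logo.gif is governed by STORE/*.GIF: both have six characters before the *, and the tie goes to the entry with more characters after it.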