13 Jan 1998:  BALANCER: A Dynamic Load Balancer for the GoServe Web Server

Abstract: BALANCER is used to redirect http (WWW) requests received by a 
          "main site" to a set of subsidiary sites.  BALANCER is 
          configurable: you can specify, on a request-specific basis, which 
          sites to redirect to. BALANCER is dynamic: it gathers "load" 
          information from subsidiary sites, and uses it to adjust how 
          requests are distributed among them. 

                =====================================================


Table of Contents:

I.       Introduction

II.      List of features

III.     Installation

IV.      BALANCER
IV.a.      Detailed Description of BALANCER parameters

V.       Using CPUloadC

VI.      Technical Notes
VI.a.      Outline
VI.a.i.      Balancer
VI.a.ii.     ReWeighter
VI.a.iii.    HeadQuery
VI.a.iv.     PortListen
VI.a.v.      RoundRobin
VI.a.vi.     CPUloadC
VI.b.      Selecting a WEIGHT_TYPE
VI.c.      Using <BASE href="URL"> elements.
VI.d.      Using Forwarding


VII.     Glossary

VIII.    Disclaimer


                =====================================================


I. Introduction

Often the best way to operate a heavily visited WWW site is to redirect
requests (originally directed to a "main site") to a set of 
"subsidiary sites". By distributing requests across multiple sites, 
this "load balancing" takes advantage of multiple processors, and can spread 
traffic over multiple lines.

BALANCER is designed to provide this "load balancing" in a dynamic fashion.
BALANCER uses continually updated, real-time information on server loads 
to improve the "balance" of redirections.  That is, a "subsidiary site"
that is currently busy (say, due to several complicated database
lookups) will get less additional traffic -- but once it is no longer busy, 
the amount of traffic it receives will be readjusted back up.  

BALANCER requires the GoServe Internet Server -- BALANCER is a "filter" for
GoServe.  Although BALANCER can handle simple document requests, its primary
use is to redirect requests. That is, on many sites, BALANCER will redirect
all requests to other sites.  These other sites are often, but not
necessarily, GoServe sites (in fact, they need not be OS/2 servers).
If they are GoServe sites, you can run special "client" software that 
uses socket calls to pass load information back to BALANCER.

Although BALANCER may be best used to redirect all (or nearly all) requests,
BALANCER can be used as a Pre-Filter for a small list of selectors (URIs).
If a request is not in this set, BALANCER can either attempt to resolve it
(using simple rules), or it can call one of the regular GoServe filters
(such as SRE-http or GoHTTP).  In other words, with some attention to setup
details, the use of BALANCER will NEVER cause a loss of functionality.

BALANCER consists of two components: the "main site" software, and 
"subsidiary site" (the client) software.  Although you need NOT use the 
"subsidiary site" software, BALANCER works best when used in an "all 
GoServe" environment, with the "subsidiary site" software running on machines 
that are also running GoServe web servers.

This documentation describes the installation, configuration, and
algorithm of BALANCER.  While we tried to keep it fairly simple, the reader
is assumed to have some understanding of http and the WWW.  Furthermore, 
several possibly obscure (possibly non-standard) terms are used throughout 
the documentation. It's a good idea to peruse the glossary before tackling the
rest of the document!

Lastly, BALANCER is free -- we do ask that you read the "it's not our 
liability" standard disclaimer at the bottom of this document.
If you have questions, please contact Daniel Hellerstein (danielh@econ.ag.gov).
Feel free to complain, suggest, and praise!



                =====================================================

II. BALANCER Features

  *  Redirection lists can be specified on a selector (URI) specific basis
  *  Optional forwarding of selected requests, with results re-sent to client
  *  Instead of balancing, a regular GoServe filter can be called on a
     selector specific basis
  *  Multi-home aware 
  *  Bayesian redirection methodology, or modified round-robin: in both cases,
     prior weights and current-load information can be used.
  *  Several methods of determining current-load information; some of 
     which do not require any modifications to subsidiary sites
  *  Multi-threaded load-tracking
  *  Automatic detection of off-line subsidiary sites
  *  Subsidiary sites can come on and off line without need for 
     formal notification 
  *  Simple documents (and graphic files) can be delivered by BALANCER.

                =====================================================
  
III. Installation

The following assumes you have GoServe installed, and know its basics. 
For more info on GoServe, please see http://www2.hursley.ibm.com/goserve.
BALANCER has been tested under Warp 3.0 and Warp 4.0 -- it may not work
properly with earlier versions of OS/2.

BALANCER requires several "DLLs".  These include REXXUTIL, RXSOCK, and 
REXXLIB. REXXUTIL and RXSOCK are packaged with OS/2; REXXLIB is 
commercial software. If you do NOT own a copy of REXXLIB (it's a good 
value at $25, from http://www.quercus-sys.com/rexxlib.htm), please contact 
danielh@econ.ag.gov for alternatives.

Installation instructions:

  1) UNZIP BALANCER.ZIP to an empty, temporary directory.
     The following files will be created:
           balancer.80  : The "main site" portion of BALANCER
           balance2.rxx : The "track site load and save results" thread
           balance3.rxx : The "use HEAD requests to obtain information from
                          subsidiary sites" thread
           balance4.rxx : The "receive information, via socket calls,
                          from subsidiary sites" thread
           balance5.rxx : The "round-robin" accounting thread -- used
                          to track where requests have been redirected to.
           cpuloadc.cmd : The "send load information, via socket calls,
                          to main site" program (runs on subsidiary 
                          sites).

   2) Copy balancer.80, balance2.rxx, balance3.rxx, balance4.rxx, and
      balance5.rxx to your "GoServe working directory" (for example,
      D:\GOSERVE.)
  
   3) Set your GoServe Filter to be balancer.80.  For example, use the 
      Options-Filter tab on the GoServe desktop object.
      NOTES:
         * If you wish to run BALANCER on a port other than 80, just
           change the extension. For example, to run on port 8081:
           rename balancer.80 to balancer.8081 (you'll also have to
           tell GoServe to use port 8081).

   4) Using your favorite text editor, edit BALANCER.80 (or whatever you may
      have renamed BALANCER.80 to).  For BALANCER to work ...

        ****    You MUST set several parameters in BALANCER.80  *****

      The next section describes these parameters in detail.

   5) To use several of the "load information" gathering options, you'll need
      to install CPUloadC.CMD on your subsidiary sites (see section V for 
      details on CPUloadC.CMD)
       
You are now ready to try it out. If you want to keep an eye on things, you
should obtain PMPRINTF (from http://www2.hursley.ibm.com/goserve).


IV.  BALANCER parameters.

BALANCER needs to be told where, and how often, to redirect (or forward) 
requests. This is done through the somewhat tedious (and sometimes dangerous)
mechanism of changing parameters in the BALANCER.80 program file (given
sufficient interest, we will add some kind of "configurator" program to
the BALANCER package).

Note: these parameters are NOT meant to be changed on-the-fly. In order
      for them to take effect, you must stop and restart GoServe.

The user-changeable parameters (in alphabetical order) are:
    ALIASES.     = A stem variable containing selector-to-file mapping rules.
    BAYESIAN     = 1: use "bayesian" load balancing; 0: use "round robin"
    DEF_WEIGHT_TYPE= Default mode of load monitoring
    DOIT.        = A stem variable containing selector-matching 
                   information 
    FILTER_NAME  = Name of a "regular" GoServe filter (optional)
    NICKANAME.   = A stem variable containing HOST information 
    NO_URL_FILE  = File to use when request does not match a DOIT.n entry.
    SLEEP_SECONDS= Seconds to wait between "HEAD" (WEIGHT_TYPE=1) load requests
    TYPE_2_PORT  = Port to use in conjunction with CPUloadC    
    VERBOSE      = How much status information to display
    WEIGHT_TYPES.= A stem variable specifying subsidiary-site-specific
                   load-monitoring methods

For most of these variables the default values will work.
However, the DOIT. variable MUST be set.


IV.a.  Detailed Descriptions of BALANCER parameters (in alphabetical order)


ALIASES. :  Stem variable used to match requests to files on the "own" site

   You may want the main-site to handle some requests (such as requests for
   the home page, or other commonly requested small documents). In such cases,
   one must instruct BALANCER to NOT redirect (or forward) the request, and
   to resolve the request just like a normal (albeit rudimentary) web server.
   This is done by specifying an element of $ in the appropriate DOIT.n.!SITES
   field (you can think of $ as being a special form of site).

   When the $ (the "own-site") is chosen by BALANCER, you can either:
    1) call a "regular" GoServe filter (such as GoFilter.80), or
    2) have BALANCER attempt to resolve the request.
   
   Case 1 is accomplished by setting the FILTER_NAME parameter; see the
   discussion of FILTER_NAME for details.

   For Case 2, BALANCER will (by default) append the selector to the
   GoServe data directory.  You can override this simple default
   by using ALIASES.

   BALANCER will check each of the ALIASES. entries to see what directory
   the request maps to -- the GoServe data directory is used only if no 
   matching ALIASES. entry is found.

   The syntax of ALIASES. is:

     ALIASES.0 = # of aliases

     ALIASES.n=' target d:\path '

    where
        n=1.. ALIASES.0
        target : selector is compared to the target
        d:\path : target is replaced with the value of d:\path

    A request selector is (case insensitively) compared to the target. 
    If this target "abbreviation matches" the selector, then the 
    matching portion of the request is replaced by the d:\path.
    After conversion of / to \, the resulting file is transmitted to the
    client.

    When specifying target, be aware that this "replacement" is not 
    particularly intelligent. That is, you must be sure to properly include \ 
    and / characters (in the target and in the d:\path).
    
    BALANCER does do one modification --  a leading / is removed from target
    (and also from the request selector).

   Example:
        ALIASES.0=3
        ALIASES.1='COWS/ e:\animals\cows\ '
        ALIASES.2='DOGS/POODLES/  e:\pets\type3\ '
        ALIASES.3='BIRD  d:\avian '

    Assuming the above ALIASES, and a GoServe Data Directory of D:\WWW
             A selector of  .....     yields 
          /COWS/JERSEY.HTML       E:\ANIMALS\COWS\JERSEY.HTML
          /PIGS/SOW.JPG           D:\WWW\PIGS\SOW.JPG
          /BIRD10/PIX/WINGS.AVI   D:\AVIAN10\PIX\WINGS.AVI

   Note that ALIASES. are NOT host-sensitive (which may cause problems on
   multi-host sites).
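
   The ALIASES. rules above can be sketched in a few lines. This is an
   illustration only, written in Python rather than BALANCER's own REXX, and
   it is not taken from BALANCER.80. The function names are hypothetical, and
   we assume that the first matching alias wins (the text above does not say
   which entry is used when several match).

```python
def strip_leading_slash(text):
    """Drop a single leading / (done for both the target and the selector)."""
    return text[1:] if text.startswith("/") else text


def resolve_own_site_file(selector, aliases, data_dir):
    """Map a request selector to a local file name.

    aliases is a list of (target, path) pairs, one per ALIASES.n entry.
    Assumption: the first alias whose target matches is used.
    """
    request = strip_leading_slash(selector)
    for target, path in aliases:
        target = strip_leading_slash(target)
        # "Abbreviation match": target is a case-insensitive prefix of the request.
        if request.upper().startswith(target.upper()):
            # Replace only the matching portion, then convert / to \.
            return (path + request[len(target):]).replace("/", "\\")
    # No alias matched: append the selector to the GoServe data directory.
    return (data_dir.rstrip("\\") + "\\" + request).replace("/", "\\")


aliases = [
    ("COWS/", "e:\\animals\\cows\\"),
    ("DOGS/POODLES/", "e:\\pets\\type3\\"),
    ("BIRD", "d:\\avian"),
]
print(resolve_own_site_file("/BIRD10/PIX/WINGS.AVI", aliases, "D:\\WWW"))
```

   Note how the BIRD alias has no trailing / in its target, so it also
   captures selectors such as /BIRD10/...; this is the "not particularly
   intelligent" replacement described above.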

                           ------------------------
BAYESIAN: Select "bayesian" or "round robin" balancing methodology

BALANCER can use one of two "balancing" methodologies: 

 BAYESIAN=1 : a "bayesian" method that uses random choices across "bayesian" 
              probabilities

 BAYESIAN=0 : a "round robin" method that attempts to keep the number of
              requests equi-proportional, relative to the various weighting
              factors and the estimated processing load of each request.  

The value of BAYESIAN sets the "default" method -- it can be overridden (on a
selector specific basis) by the value of DOIT.n.!BAYESIAN.

The bayesian method involves fewer computations (it's memoryless in terms of
where requests have been redirected to), but can occasionally yield
odd results (such as sending a run of requests to one of several otherwise
equally busy sites).  The "round-robin" method avoids this by tracking what's
been sent where, at the cost of greater computation/memory manipulation.

Please see the technical appendix for further discussion of the relative merits
of the bayesian and round-robin approaches.
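
To make the "memoryless" nature of the bayesian method concrete, here is a
minimal sketch of a weighted random draw. It is written in Python for
illustration and is not BALANCER's REXX code; the function names are
hypothetical, and the way BALANCER actually combines prior weights with
load information is not shown here.

```python
import random


def weights_to_probabilities(weights):
    """Normalize non-negative weights so they sum to 1."""
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


def bayesian_draw(sites, probabilities, rng=random):
    """Pick one site at random; sites with a higher probability win more often.

    Each call is independent of earlier calls, so runs of identical picks
    are possible even between equally weighted sites.
    """
    return rng.choices(sites, weights=probabilities, k=1)[0]


probabilities = weights_to_probabilities([1, 3])
print(bayesian_draw(["siteA", "siteB"], probabilities))
```

Because no history is kept, the only state needed per request is the
vector of probabilities itself.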

                           ------------------------

DEF_WEIGHT_TYPE : Default mode of load monitoring.

   BALANCER supports three forms of load monitoring -- what is referred
   to as the WEIGHT_TYPE.  The DEF_WEIGHT_TYPE defines the default
   WEIGHT_TYPE.

      DEF_WEIGHT_TYPE=0  : Static. No load information is obtained.
                           The "prior weights" (specified in the DOIT. 
                           stem variables) are used to redirect requests.
      DEF_WEIGHT_TYPE=1  : HEAD requests.  BALANCER will use HTTP "HEAD"
                           requests to obtain load information from 
                           subsidiary sites.
      DEF_WEIGHT_TYPE=2  : CPUloadC generated information -- BALANCER will
                           listen to the TYPE_2_PORT port for load 
                           information, sent by CPUloadC.CMD from subsidiary
                           sites.
                           
  Example:

      DEF_WEIGHT_TYPE=1
 
  Notes:
   *  If DEF_WEIGHT_TYPE>0, then DEF_WEIGHT_TYPE can be overridden on a 
      subsidiary-site-specific basis (by using the WEIGHT_TYPES. variables).
      If DEF_WEIGHT_TYPE=0, then "dynamic" load balancing WILL NOT BE 
      ATTEMPTED --- although the "prior" weights will be used (for non-equal,
      random redirection).

   *  For details on the "weight types", please see the technical notes at
      the end of this document.

                        ------------------------

DOIT. : Stem variable containing "selector-specific" redirection information

  The core action of BALANCER is the matching of a request (sent to the
  main-site) to one of the DOIT.n entries.
  This matching is based on the "selector" and, on multiple-homed sites, on the
  "host".
 
  The basic syntax of DOIT. is:
    DOIT.0        = # of entries 

    DOIT.n.!SEL   = The selector (it may contain * wildcard characters)
    DOIT.n.!HOST  = The host this entry applies to -- it is used to
                    "limit the scope" of the entry.
    DOIT.n.!SITES = A space delimited list of sites (URLS)
    DOIT.n.!WEIGHTS = A space delimited list of "prior" weights (optional)

    DOIT.n.!FORWARD = If specified, and equal to 1, "forwarding" is used instead
                      of redirection
    DOIT.n.!BAYESIAN = 1: Use "bayesian weighted" random draws.
                       0: Use "bayesian weighted" round robin.
    DOIT.n.!ESTIMATE = Estimate of number of seconds required to respond
                       to this SELector. If 0, "infinite" lifespan
                       (which implies a standard round-robin algorithm).
 where n=1... DOIT.0

   When examining a DOIT.n entry, BALANCER will:

     1) See if a DOIT.n.!HOST has been specified. If so:
        DOIT.n.!HOST is compared to the IP address of the main-site (as 
        supplied by GoServe), and to the HOST: request header. If neither 
        matches, the entry is skipped.

        Note that if no DOIT.n.!HOST is specified, this step is automatically 
        satisfied.  That is, entries without a DOIT.n.!HOST field are applied 
        to all requests.

     2) Assuming step 1 is satisfied, the DOIT.n.!SEL is compared to the 
        request selector. This comparison is case-insensitive, and
        allows for wild-cards (one or more *) to appear in the DOIT.n.!SEL 
        field.

     3) If DOIT.n.!SEL "matches" the selector, BALANCER will select
        one of the elements in the DOIT.n.!SITES field.  Each of these
        elements is an IP address that points to a WWW site (and, optionally,
        a directory on this site).

        This decision will be based on whatever information is currently 
        available about the load on these sites, and on the "prior
        weights" specified in the DOIT.n.!WEIGHTS field.  

   Note that the first "matching" entry is used -- there is no attempt to find 
   a "best" match.


  Examples:

    DOIT.0=5 
      
      Five entries.

    DOIT.1.!SEL='/new/*'
    DOIT.1.!SITES='hissite.here.net hersite.there.com  oursite.where.org '
  
     This entry will "match" all requests that start with /NEW/. Since no
     !WEIGHTS field is given, the "prior weight" for each of these sites
     equals 1.0.

    DOIT.2.!SEL='/IMGS/*.GIF'
    DOIT.2.!SITES=' pictures.wow.net/guest $ '
    DOIT.2.!WEIGHTS='1 5 '
 
     This entry will match all requests that start with /IMGS/ and that end 
     with .GIF.

     The $ "site" is a pointer to the "own" site -- BALANCER will NOT
     redirect or forward, it will directly return the appropriate file
     to the client.

     You can include "subdirectory" information on a site -- redirection will 
     be relative to the subdirectory.  Thus, the pictures.wow.net/guest 
     entry means: "redirect to http://pictures.wow.net/guest/imgs/xxx.gif." 
     Note that subdirectory information is NOT used when determining load.


    DOIT.3.!SEL='/ALTVIEW/*'
    DOIT.3.!HOST='altwww.oursite.com'
    DOIT.3.!SITES='hissite.here.net'

        This is a "host" specific entry -- it is only used if a HOST 
        request header of "altwww.oursite.com" accompanies the client's 
        request. Note that only 1 site is specified (hence, no prior weights
        are required).


    DOIT.4.!SEL='/CGI-BIN/BIG_JOB*'
    DOIT.4.!SITES='bigsite.beer.net  smallsite.wine.com '
    DOIT.4.!WEIGHTS='1 1'
    DOIT.4.!FORWARD=1
    DOIT.4.!BAYESIAN=0
    DOIT.4.!ESTIMATE=2
        
        For requests for the /CGI-BIN/BIG_JOB script, use "forwarding" instead of
        redirection.  Forwarding consists of BALANCER using socket calls to 
        forward the request, and all the cookies and other request headers,
        to the subsidiary site.  The response by the subsidiary site will be sent
        back to BALANCER, and BALANCER will "forward" these back to the client.

        Please see the Technical Notes for details on "forwarding."

        Also note the use of !BAYESIAN=0 and !ESTIMATE=2, which state that
        a "round robin" balancing "selection" algorithm is to be used,
        with an "estimated" 2 seconds selector-response time.

    DOIT.5.!SEL='*'
    DOIT.5.!SITES='bigsite.beer.net  smallsite.wine.com '
    DOIT.5.!WEIGHTS='15 2  '

     This is the "default" -- it is used if entries 1 through 4 do not match.

 Assuming the above:     A request for  .....  will match entry # ...
                          /NEW/GOO.1             1
                          /OLD/HELP.HTM          5
                          /IMGS/WOW.GIF          2
                          /IMGS/ZOO/TIGER.JPG    5
                          /IMGS/ZOO/CAMEL.GIF    2
                          /NEW/IMGS/WATER.GIF    1
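
 The three matching steps (host check, wildcard selector comparison, first
 match wins) can be sketched as follows. This is a Python illustration, not
 BALANCER's REXX code; the function names and the dictionary layout are
 hypothetical, and only the !SEL and !HOST fields of the five example
 entries are reproduced.

```python
import re


def wildcard_match(pattern, selector):
    """Case-insensitive comparison where each * matches any run of characters."""
    parts = [re.escape(part) for part in pattern.upper().split("*")]
    return re.fullmatch(".*".join(parts), selector.upper(), flags=re.DOTALL) is not None


def first_matching_entry(entries, selector, host_header="", server_ip=""):
    """Return the 1-based number of the first matching entry, or None."""
    for number, entry in enumerate(entries, start=1):
        host = entry.get("host", "")
        # Step 1: a host-limited entry is skipped unless the host matches
        # the HOST: request header or the main-site IP address.
        if host and host.upper() not in (host_header.upper(), server_ip.upper()):
            continue
        # Step 2: wildcard comparison; the first match is used.
        if wildcard_match(entry["sel"], selector):
            return number
    return None


entries = [
    {"sel": "/new/*"},
    {"sel": "/IMGS/*.GIF"},
    {"sel": "/ALTVIEW/*", "host": "altwww.oursite.com"},
    {"sel": "/CGI-BIN/BIG_JOB*"},
    {"sel": "*"},
]
print(first_matching_entry(entries, "/imgs/zoo/camel.gif"))
```

 Since a * can match / characters, the /IMGS/*.GIF entry also matches
 selectors in subdirectories of /IMGS/.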


  
 DOIT. Notes:

   * If you only have 1 site (i.e.; the $ "own site" code), then weight
     information is ignored (as one would expect!)

   * The number of elements in each DOIT.n.!SITES must equal the number of 
     elements in DOIT.n.!WEIGHTS.  However, if you do NOT specify a 
     DOIT.n.!WEIGHTS, an implicit value of 1 will be used (for each site
     listed in DOIT.n.!SITES).

   * If a DOIT.n.!HOST field is not specified, then the entry applies to all 
     requests.

   * The first match is used. Thus, if a host-specific entry that matches a
     selector FOLLOWS a generic entry that also matches, then the host
     specific entry will NOT be used. Double thus, heavily wildcarded entries 
     (such as DOIT.n.!SEL='*') should appear after more narrowly defined 
     entries.

   * It's a good idea to have a "match all" (a * selector) entry as the last 
     entry.

   * !ESTIMATE is only used when BAYESIAN=0 -- it modifies the round-robin 
     selection criteria.  Note that if !ESTIMATE is not specified, a default 
     value of 1.0 is used.

   * Larger values of !WEIGHTS will INCREASE the probability of redirecting
     to a site.

                        ------------------------
FILTER_NAME: Name of a "regular" GoServe filter.

Instead of redirecting requests, there may be times (e.g., as a default) when
you're willing to let the "main site" respond to the request.  As mentioned
in the discussion of ALIASES., such responses (signified by the use of 
$ in a DOIT.n.!SITES list) can either be resolved by BALANCER directly,
or BALANCER can call a "regular" GoServe filter and let it do the work.
By "regular", we mean one of the stand-alone GoServe filters, such as SRE-http,
GoHTTP, or GoFilter.80.

To implement this latter case (calling a regular filter) you must set the
FILTER_NAME variable.  For example, to use SRE-http:
   FILTER_NAME='SREFILTR.80'
Then, when a $ site is chosen, SREFILTR.80 will be called (with the 
standard GoServe arguments).

The net effect is as if BALANCER were not there; the "regular" filter will
respond as if it had been called directly. The only detriment is the time
lost while BALANCER figures out that it shouldn't do anything! Otherwise,
there is NO loss in functionality.

There is one proviso: before calling FILTER_NAME, BALANCER will check
its ALIASES.  Thus, for very simple requests (e.g., image files or
simple documents with no access controls), the use of ALIASES. can avoid
the time penalty associated with calling the "regular" filter.


Notes:
     * Advanced users note: you can speed things up a bit by loading the 
       filter into macrospace, and setting FILTER_NAME equal to the
       "macrospace" name.


                        ------------------------


NO_URL_FILE: What to send if no matching DOIT.n entry can be found.

  NO_URL_FILE should be the fully qualified name of an HTML response file.
  NO_URL_FILE is used when there is no matching DOIT. entry.
  
  If NO_URL_FILE=' ', a short, generic message is used.

  NOTE: If you specify a DOIT.n.!SEL='*' entry (as a "default"),
        then NO_URL_FILE will never be needed.

  Examples: NO_URL_FILE=''
            NO_URL_FILE='D:\GOSERVE\NOTFOUND.HTM'

                        ------------------------

SLEEP_SECONDS:  seconds to wait between "HEAD" (WEIGHT_TYPE=1) load requests

  SLEEP_SECONDS is used to control how frequently HEAD (WEIGHT_TYPE=1) 
  requests are generated.  Larger values mean less-up-to-date information, but
  less load on subsidiary sites.

  Note that SLEEP_SECONDS does not affect WEIGHT_TYPE=2 (CPUloadC 
  generated) information.  However, CPUloadC contains WAIT_SECONDS, which is 
  an equivalent "intensity of measurement" parameter.

  Example: SLEEP_SECONDS=60

                        ------------------------
TYPE_2_PORT: Port used by mode=2 (CPUloadC) load monitoring

  TYPE_2_PORT should be a valid TCP/IP port number. It is the port
  used by CPUloadC.CMD (running on a subsidiary site) to communicate 
  with the main site. Typically, a 4 digit value is used; such as 8083. 

  If you change TYPE_2_PORT (say, it conflicts with some other TCP/IP aware 
  software), You Must Make Sure That You Also Change The Value of the PORT 
  Variable in CPUloadC.CMD (on all sites that are running CPUloadC.CMD)!


Example: TYPE_2_PORT=8083

                        ------------------------


VERBOSE : The amount of status information to report

  VERBOSE can be equal to 0, 1, 2 or 3
      0=none (errors only) 
      1=some 
      2=lots 
      3=too much 

 Example: VERBOSE=2

 Note that STATUS information is written to the PMPRINTF window.

                        ------------------------

WEIGHT_TYPES. : A stem variable of site-specific load monitoring instructions
   
   Use the WEIGHT_TYPES to specify, on a subsidiary-site-specific basis,
   the method of CPU-load monitoring to use.  

         !  If DEF_WEIGHT_TYPE=0, then WEIGHT_TYPES. are IGNORED !

   Basic syntax:

      WEIGHT_TYPES.0 = # of WEIGHT_TYPES. entries

      WEIGHT_TYPES.n = site_name  weight_type , head_string , adjuster
    
   where (note use of commas as delimiters):

       site_name = one of the sites used in the DOIT.n.!SITES
                   variables. The site name should be IDENTICAL
                   to its specification in the DOIT.n.!SITES field.
     weight_type = a weight_type of 0,1, or 2 (same meaning as in the
                   DEF_WEIGHT_TYPE variable)
    head_string  = only used with weight_type=1   (optional) 
                   The "selector" sent with a HEAD request.
                   If not specified, !PING?RESPONSETIME is sent.
                   For more information on head_string, see the technical 
                   notes.
      adjuster   = A site-specific normalization factor; it's used to adjust 
                   measures that use different weight_types to a standard 
                   metric. If not specified, a default value of 1.0 is used.

                   NOTE: Larger values of adjuster DECREASE the probability
                         of redirecting to the site.
  Examples:
      WEIGHT_TYPES.0=3

      WEIGHT_TYPES.1='hissite.here.net  2  , , 2.5 '
      WEIGHT_TYPES.2='pictures.wow.net 1 , !STATUS '
      WEIGHT_TYPES.3='oursite.where.org 0 '
                
  Note:
      For $ (the "own site name"), a "weight_type" of 0 is always used.
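
  The comma-delimited layout of a WEIGHT_TYPES.n entry, and its defaults,
  can be sketched as follows. This is a Python illustration of the syntax
  described above, not BALANCER's REXX parsing code; the function name and
  the returned dictionary keys are hypothetical.

```python
DEFAULT_HEAD_STRING = "!PING?RESPONSETIME"


def parse_weight_type_entry(entry):
    """Split 'site_name weight_type , head_string , adjuster' into its fields."""
    fields = [field.strip() for field in entry.split(",")]
    fields += [""] * (3 - len(fields))      # head_string and adjuster are optional
    site_name, weight_type = fields[0].split()
    weight_type = int(weight_type)
    if weight_type not in (0, 1, 2):
        raise ValueError("weight_type must be 0, 1, or 2")
    head_string = fields[1]
    # head_string only matters for HEAD-request monitoring (weight_type=1).
    if weight_type == 1 and not head_string:
        head_string = DEFAULT_HEAD_STRING
    adjuster = float(fields[2]) if fields[2] else 1.0
    return {
        "site_name": site_name,
        "weight_type": weight_type,
        "head_string": head_string,
        "adjuster": adjuster,
    }


print(parse_weight_type_entry("hissite.here.net  2  , , 2.5 "))
```

  An empty field between two commas (as in the first example entry) simply
  leaves head_string unset while still allowing an adjuster to be given.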

                =====================================================

V. Using CPUloadC.CMD


CPUloadC.CMD is a REXX program that is designed to communicate with 
BALANCER.80.  CPUloadC.CMD, when running on a (set of) subsidiary sites, will 
use socket calls to transmit current load information to the main site.  Since
CPUloadC.CMD can cheaply gather and transmit this information, in many 
cases it provides a more efficient mechanism than the use of HEAD
method requests.

To use CPUloadC.CMD on a subsidiary site, the site's server machine must be 
running OS/2. CPUloadC works best when run as a daemon in conjunction
with a GoServe server, since it knows how to ask GoServe for load statistics.
However, it can also be used in a "stand alone" mode, in which case GoServe 
need not be in use.

To use CPUloadC.CMD, you need to copy it to the subsidiary server, and change 
a few parameters.  After that, you can either run it in stand alone mode, or 
launch it as a "daemon" under the GoServe process. For example, the 
CUSTOM_INITS parameter of SRE-http can be used to launch daemons. Other 
GoServe servers may be easily modified to achieve the same result.

   Do note that "running under the GoServe process" implies that some GoServe 
   thread launches CPUloadC.CMD (e.g., by a call to the REXXTHREAD function of
   REXXLIB) -- so launched, CPUloadC can issue special GoServe commands that 
   are not available if CPUloadC is running in its own process.

The following parameters are set in CPUloadC.CMD. You MUST correctly
set the MYNAME and MAIN_SERVER parameters!

Shorter, alphabetical, descriptions:

   MAIN_SERVER : IP address of the "main server" 
          MODE : How to compute load -- 1,2,3 or 4 
        MYNAME : Name of this server (as expected by main_server) 
          PORT : Port that main_server is listening to 
        USESAY : If=1, then use SAY (not PMPRINTF) for status reports 
       VERBOSE : Level of status reporting (0=little, 1=some, or 2=lots) 
  WAIT_SECONDS : Frequency of update (in seconds) 


Longer, alphabetical, descriptions:

MAIN_SERVER: the IP address of the "main site".

   You can use either a numeric IP address or a host name.

   Example: MAIN_SERVER='www.oursite.net'

MODE: Method of measuring load.
    Four values of MODE are available
    MODE=1 : Use the extract('responsetime') GoServe statistic (the average 
             response time over the last 100 or so requests)
    MODE=2 : Use the extract('clients') GoServe statistic (the current number 
             of active clients).
    MODE=3 : Use QPROCSTATUS "time slice" monitor (requires RXU)  
    MODE=4 : Real dumb default -- seconds required to count to 4000

  To use MODE=1 or MODE=2 you MUST launch CPUloadC.CMD as a semi-permanent
  thread under GoServe (say, by using REXXLIB's REXXTHREAD procedure). 
  Note: SRE-http users can use the CUSTOM_INITS parameter to "launch"
       this program.
  To use MODE=3, RXU must be in your LIBPATH.
  If the selected MODE cannot be used, then MODE=4 is used as a default.

  Example: MODE=2 
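
  The fallback rule, and the "count to 4000" measure used by MODE=4, can be
  sketched as follows. This is a Python illustration only; CPUloadC.CMD is a
  REXX program and its actual code is not reproduced here. The function
  names and the two boolean arguments are hypothetical.

```python
import time


def effective_mode(requested, under_goserve, rxu_available):
    """Return the MODE that would actually be used, falling back to 4."""
    if requested in (1, 2) and under_goserve:
        return requested        # GoServe statistics need a GoServe thread
    if requested == 3 and rxu_available:
        return 3                # the time slice monitor needs RXU
    return 4                    # the "real dumb" default


def seconds_to_count(limit=4000):
    """MODE=4 style measure: elapsed seconds for a simple counting loop."""
    start = time.perf_counter()
    count = 0
    while count < limit:
        count += 1
    return time.perf_counter() - start


print(effective_mode(2, under_goserve=False, rxu_available=False))
print(seconds_to_count())
```

  On a busy machine the counting loop gets fewer time slices, so the
  elapsed time grows; that is what makes it usable as a crude load measure.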


MYNAME: Name of this subsidiary site.
   This name must exactly match the name used (in the DOIT. and WEIGHT_TYPES.
   variables) in the main site's BALANCER.80 program.
   If you leave this blank, this server's ip address will be used. 

   To reiterate:
      The crucial point is that MYNAME must match the name expected by the 
      MAIN_SERVER.  That is, if this subsidiary site is at FOO.BAR.NET, 
      and a MAIN_SERVER (on your sub-domain) is expecting FOO, then 
      you MUST set myname='FOO', not 'FOO.BAR.NET' (that is, no attempt is 
      made to match/resolve IP addresses; nor match IP name to  IP numbers).  

   Examples: MYNAME='HERSITE.THERE.COM'
             MYNAME=''


PORT: Port to use.
   PORT must match the value of the TYPE_2_PORT variable specified 
   in BALANCER.80 (that is running on the main site).

   Example: PORT=8083

USESAY:  Disable use of PMPRINTF

   If you do NOT have REXXLIB installed on a subsidiary site, set usesay=1

   Example: usesay=1 

VERBOSE: Status reporting.
    VERBOSE=0  : no status info,
    VERBOSE=1  : some,
    VERBOSE=2  : lots 

   Example: VERBOSE=2

WAIT_SECONDS:  Seconds to wait between "updates".  

   Load measurements on this subsidiary-site (with transmittal of results to 
   the MAIN_SERVER) will occur every WAIT_SECONDS seconds. 

   Example:WAIT_SECONDS=60


                =====================================================


VI. Technical Notes

VI.a: Outline of BALANCER

The following outlines BALANCER's logic. We start with the assumption that 
BALANCER.80 is installed in D:\GOSERVE and has been started. Note that the 
following outline is essentially correct, but is not meant to be a programmer's
guide (the actual implementation may chop things up in a different order).

Since BALANCER.80 is multi-threaded, the following outline is organized in 
several pieces. These threads do "talk" to each other -- to make 
sense of things, you might need to flip back and forth between the 
various pieces.

VI.a.i. ** The BALANCER.80 thread.

1) When the first request arrives, BALANCER.80 will initialize.

1.a) Initialization requires launching of BALANCE2.RXX as a "semi-permanent" 
     thread.  This is known as the "ReWeighter" thread.

          NOTE: Unlike GoServe "request-specific" threads, the 
                semi-permanent threads stay alive for as long as GoServe
                is alive. 

1.b) BALANCER.80 will create a queue and a semaphore, and use them 
     to transfer the values of the DOIT. and WEIGHT_TYPES. variables to the 
     ReWeighter thread.  
        
1.c) BALANCER.80 will wait for the ReWeighter thread to signal that "it is 
     safe to proceed".

2) After receiving this signal, and on all subsequent requests, BALANCER 
   will match the "request selector" (and possibly HOST: and IP address) to a 
   DOIT.n entry.

2a) If no match can be found, either the NO_URL_FILE or a short message is 
     sent to the client.  BALANCER then exits.

2b) Using the number of the matching entry, a "vector of 
    bayesian probabilities" is pulled from the OS/2 environment. This "vector"
    is placed there by the ReWeighter thread, and is based on a combination
    of "prior weight" information (as specified in the DOIT.n.!WEIGHTS
    fields) and "posterior weights" determined by measuring the load on
    subsidiary sites. Note that each DOIT.n entry is associated with its
    own "vector of bayesian probabilities" environment variable.

2c) Using this vector of bayesian probabilities, one of the sites listed in 
    DOIT.n.!SITES is selected.  This selection is either purely random,
    or based on a "round robin" methodology.
 
2c.i) If purely random (BAYESIAN=1), the actual site chosen (for redirection) 
    will be random -- with sites  having a higher bayesian probability being 
    chosen more often.

2c.ii) If "round robin" (BAYESIAN=0), BALANCER will send the "bayesian weight"
       (and "which DOIT. entry") information to the "RoundRobin" thread 
       (BALANCE5.RXX). The RoundRobin thread will then examine its history
       list (created from prior instances of this step), and choose a 
       selection so as to maintain an equal quantity of (weighted)
       redirections to each site.

2c.ii.1) If RoundRobin is too slow (about 1/2 second) in responding, BALANCER
         will stop waiting, and use a BAYESIAN methodology. However, BALANCER
         will send the results (which selection from the DOIT.n.!SITES list) 
         to RoundRobin (so as to keep accounts accurate).

2d) After selecting a site, BALANCER either:
     1) sends a "302" redirection response to the client. The client will 
        then redirect the request to the URL that is included in this response 
        (i.e.; it may include a modified directory, as well as a new IP address).
     2) or, if the request was a POST method, or if !FORWARD=1, performs a 
        "proxy server"-like forwarding: BALANCER waits for the results and
        retransmits them to the client.

    BALANCER then exits.
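
    Steps 2c.i and 2d.1 can be sketched as follows. This is a minimal Python
    illustration, not BALANCER's own REXX code: the function names pick_site
    and redirect_response, the example sites, and the example probabilities
    are all hypothetical.

```python
import random

def pick_site(sites, probabilities, rng=random):
    """Pick one site at random; higher-probability sites are chosen more often."""
    draw = rng.random()              # uniform in [0, 1)
    cumulative = 0.0
    for site, prob in zip(sites, probabilities):
        cumulative += prob
        if draw < cumulative:
            return site
    # Rounding can leave the cumulative sum slightly under 1; fall back to
    # the last site that has a non-zero probability.
    for site, prob in zip(reversed(sites), reversed(probabilities)):
        if prob > 0:
            return site
    raise ValueError("all probabilities are zero")

def redirect_response(url):
    """Build a bare-bones HTTP "302" response that points the client at url."""
    return ("HTTP/1.0 302 Moved Temporarily\r\n"
            "Location: " + url + "\r\n"
            "\r\n")

sites = ["a.b.net", "c.d.org", "e.f.com"]    # as listed in DOIT.n.!SITES
probabilities = [0.5, 0.5, 0.0]              # e.f.com has a 0 probability
chosen = pick_site(sites, probabilities, random.Random(1))
response = redirect_response("http://" + chosen + "/dogs.html")
```

    A site with a probability of 0 is never returned by pick_site, which is
    the behavior the outline describes for servers that are not running.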

VI.a.ii. ** The ReWeighter Thread

1) Upon invocation, ReWeighter will read the DOIT. and WEIGHT_TYPES. 
   information passed to it by BALANCER.80.

1a) The DOIT. entries are scanned, and a list of all unique "sites" is 
    constructed (i.e.; "directory" information is not used when creating this 
    list of sites). A default "posterior weight" (of 1.0) is assigned to each 
    site.

1b) This list is then compared to the WEIGHT_TYPES. variable. If no match
    is found in WEIGHT_TYPES.:
       i) the DEF_WEIGHT_TYPE is used as the "method" of load monitoring.
      ii) A "normalization" adjuster of 1 is used.
    Otherwise, the WEIGHT_TYPE and an ADJUSTER are read from the matching 
    WEIGHT_TYPES. entry.

1c) ReWeighter will then launch a set of threads that do the actual 
    monitoring:
      i) BALANCE4.RXX is launched (as the PortListen thread). It will open 
         and monitor a TCP/IP port (using the value of TYPE_2_PORT).
     ii) For each site that is to be monitored using weight_type=1, a separate 
         instance of BALANCE3.RXX is launched (as a HeadQuery thread).
    iii) While not a "monitoring" thread, ReWeighter also launches the
         RoundRobin (BALANCE5.RXX) thread.

1d) ReWeighter then tells BALANCER that initialization is done; and then
    goes into an infinite loop. Every 5 or so seconds it will
    read a special queue for "load balance" information. Both the PortListen 
    thread and the (possibly several) HeadQuery threads will write 
    load information to this queue.  
    
1e) The information read from the queue is used to update the "posterior 
    weights".  Note that the posterior weights are specific to a site. In 
    contrast, prior weights are drawn from the DOIT.n.!WEIGHTS field --
    they are specific to  a selector-site (actually, to a selector-site/dir) 
    pair.

1f) Each site is then quickly checked to see if its server is still accepting
    requests. If a server is not accepting requests (i.e.; it is not running), 
    its posterior weight will be set to 0: and a 0 posterior weight means a 0 
    probability of being selected (see the next step).

1g) Using these "posterior weights", and the "prior weights" (that are
    specified in the DOIT.n.!WEIGHTS variable), a set of "bayesian
    probabilities" are computed: with each site in DOIT.n.!SITES assigned
    its own probability.
  
    For DOIT.n, the formula is:
                  P_j * p_j
      prob_j =   -------------       where i runs from 1 to J,
                 Sum{ P_i*p_i }

    and where:
        prob_j = Probability of redirecting (or forwarding) to the j'th site 
                 listed in DOIT.n.!SITES 
        J   = Number of sites listed in DOIT.n.!SITES
        p_j = "prior weight" for site/dir j   
        P_j = "posterior weight"  for site j
    
    Thus, large values for a "prior weight" (or a posterior weight) increase 
    the probability that a site will be used.

1h) These bayesian weights are then saved to the OS/2 environment, where 
    BALANCER.80 can then quickly read them.

1i) ReWeighter waits for a few seconds, and then checks if there is anything in 
    its queue. If there is, go back to 1d. Note that it is NOT expected that 
    a complete set of new load information is provided to ReWeighter on each 
    loop -- typically, the load information for only a few sites will change
    "every few seconds". That  is; posterior weights are used until new
    information is obtained.
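
    The formula in step 1g can be worked through with a short Python sketch.
    The function name bayesian_probabilities and the example weights are
    hypothetical; how BALANCER treats the case where every weight is 0 is not
    described here, so returning all zeros is an assumption of this sketch.

```python
def bayesian_probabilities(prior_weights, posterior_weights):
    """prob_j = P_j*p_j / Sum_i(P_i*p_i), for the J sites of one DOIT.n entry."""
    products = [P * p for P, p in zip(posterior_weights, prior_weights)]
    total = sum(products)
    if total == 0:
        # Assumption: with no usable site, report a 0 probability everywhere.
        return [0.0] * len(products)
    return [x / total for x in products]

prior = [1.0, 2.0, 1.0]        # p_j: hypothetical DOIT.n.!WEIGHTS values
posterior = [1.0, 0.5, 0.0]    # P_j: the third site is not accepting requests
probs = bayesian_probabilities(prior, posterior)
```

    Here the products P_j*p_j are 1.0, 1.0 and 0.0, so the first two sites
    each get a probability of 0.5 and the third site gets 0.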


1j) ReWeighter Notes:
 
    i) The use of threads is an effective means of isolating 
       "bad" servers.  At worst, a "hung" server will effectively end the flow
       of "load information", but information on all other sites will 
       continue to be updated.

   ii) ReWeighter is sent "response time" information. Since 
       long response times indicate an overloaded site, the inverse of 
       response time is used as the posterior weight.

  iii) When ReWeighter is matching sites to WEIGHT_TYPES. entries,
       it will NOT attempt to "resolve" IP addresses. That is, you should 
       have a separate WEIGHT_TYPES. entry for each "variant" of a 
       site name that you use (i.e.; a full IP name, a full IP number, or a
       sub-domain name).
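
    Note ii, together with step 1i, can be illustrated with a small Python
    sketch. The names posterior_weight and weights are hypothetical, and the
    load values are made up; treating a load of 0 as "unreachable" follows the
    HeadQuery description, but the exact handling in BALANCER is an assumption.

```python
def posterior_weight(load):
    """Inverse of the reported load; a load of 0 marks an unreachable site."""
    if load <= 0:
        return 0.0
    return 1.0 / load

# Every site starts with the default posterior weight of 1.0.
weights = {site: 1.0 for site in ["a.b.net", "c.d.org"]}

# Only c.d.org reported during this pass; a.b.net keeps its previous weight.
for site, load in [("c.d.org", 4.0)]:
    weights[site] = posterior_weight(load)
```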

VI.a.iii. ** The HeadQuery thread(s)

  1) Each HeadQuery thread is responsible for monitoring just one site.  

  1a) HeadQuery issues a HEAD method HTTP request to  the site "it is
      responsible for monitoring". The "selector" for this 
      HEAD request is determined by the head_string element in the 
      WEIGHT_TYPES.  variable that corresponds to this site.  Basically,
      there are three classes of "selectors" that may be used:
       i) The default (used if head_string is not specified, or if a 
          WEIGHT_TYPES. entry for this site was not specified):  
            a selector of 
                /!PING?RESPONSETIME
            is sent.  When the SRE-http web server receives such a request, 
            it will query GoServe for the "average response time",
            and return it.
      ii) The head_string equals 0:
            a selector of 
                /
            is sent.  The total time it takes to respond to this selector
            is used.
     iii) A non-0, non-empty value of head_string is specified:
            this value of head_string is sent, as is.

    Note that types i and iii expect a very specific form of response from the
    server. In particular, the third word on the response line is used as the 
    measure of load (i.e.; the average response time). If this third word is 
    not a number, then method ii is used (total response time).

    Thus, if you are not running SRE-http, you should determine what HEAD 
    request will yield such a response.  If no such HEAD request exists (for
    the server running on a subsidiary site), you should set head_string=0,
    (or use a weight_type of 2).

 1b) The load measurement is then "normalized"; the load measure is 
     multiplied by the value of the adjuster element in the 
     WEIGHT_TYPES. entry that corresponds to this site.

     By default, adjuster=1 -- no normalization occurs. 
     However, if you know that the measures returned from a given site are 
     systematically larger (or smaller) than those of otherwise similar sites 
     (under the same load), you can "correct" this "bias" by suitably setting 
     the adjuster.

     Note that the adjuster complements the "prior weights". In fact, you 
     could use prior weights for all "normalization". However, since prior 
     weights are sometimes used "as is" (such as when BALANCER first 
     starts) ...
         ** it is recommended that the adjuster be used to "normalize" 
            measures to some common grounds; and the "prior weights" be used
            to account for intrinsically faster servers.

  1c) If HeadQuery has problems reaching the site, the "load" is set 
      to 0, which causes a posterior weight of 0.

  1d) The load information is placed on the ReWeighter's queue.

  1e) HeadQuery (this instance) then waits for SLEEP_SECONDS seconds,
      and jumps to step 1a.
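
  Steps 1a and 1b can be sketched in Python as shown below. The function name
  load_from_head_response and the sample response lines are hypothetical; the
  exact response format returned by SRE-http is not reproduced here, only the
  "third word is a number, else use total response time" rule.

```python
def load_from_head_response(response_line, elapsed_seconds, adjuster=1.0):
    """Return the adjusted load measure for one HEAD request."""
    words = response_line.split()
    try:
        load = float(words[2])       # third word, e.g. an average response time
    except (IndexError, ValueError):
        load = elapsed_seconds       # not a number: use the total response time
    return load * adjuster           # step 1b: "normalize" with the adjuster

# Third word is numeric, so it is used as the load.
reported = load_from_head_response("HTTP/1.0 200 0.35", elapsed_seconds=1.2)
# Third word is not a number, so the measured time is used, then adjusted.
measured = load_from_head_response("HTTP/1.0 200 OK", elapsed_seconds=1.2,
                                   adjuster=2.0)
```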


VI.a.iv. ** The PortListen Thread

  1) The PortListen thread is responsible for receiving load information 
     passed "back to the main site" by instances of CPUloadC running on
     subsidiary sites.  All sites running CPUloadC should transmit IP messages 
     to a single port (the TYPE_2_PORT) that is monitored by the PortListen 
     thread.

  1a) As each message is received (on the TYPE_2_PORT), PortListen will 
      determine whether it is from one of the subsidiary sites that have a 
      WEIGHT_TYPE of 2.  If it is not, the message is discarded.

  1b) If it is, it will be "adjusted" (following the same logic as in step 1b 
      of the description of the HeadQuery threads).

  1d) The load information is placed on the ReWeighter's queue.

  1e) PortListen then waits for the next message to appear, and then goes
      to step 1a.
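
  The filtering and adjusting done by PortListen can be sketched in Python.
  This assumes each message is simply "name load" (two fields separated by a
  space); the function name handle_message and the adjusters table are
  hypothetical, and names are compared exactly, with no IP resolution.

```python
def handle_message(message, adjusters):
    """Parse one "name load" message; return (site, adjusted_load) or None.

    adjusters maps each WEIGHT_TYPE=2 site name to its adjuster.
    """
    parts = message.split()
    if len(parts) != 2:
        return None                  # malformed: discard
    site, raw_load = parts
    if site not in adjusters:
        return None                  # not a WEIGHT_TYPE=2 site: discard
    try:
        load = float(raw_load)
    except ValueError:
        return None
    return site, load * adjusters[site]

adjusters = {"HERSITE.THERE.COM": 2.0}
accepted = handle_message("HERSITE.THERE.COM 0.5", adjusters)
discarded = handle_message("UNKNOWN.SITE.ORG 0.5", adjusters)
```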


VI.a.v. ** The RoundRobin thread

  1) When BAYESIAN=0 (or DOIT.n.!BAYESIAN=0), then all requests will cause 
     BALANCER to "ask" RoundRobin to choose a site (from DOIT.n.!SITES).

     Background:
        Each such "ask" involves placing the n (of DOIT.n.!SEL) on a queue,
        along with the "bayesian probabilities" (as set by ReWeighter, and
        read from the environment).  BALANCER then waits for about 1/2
        second; if a response is received from RoundRobin, BALANCER reads its
        own queue for the site (from DOIT.n.!SITES) to use.  If, after
        1/2 second, RoundRobin has not supplied a response, BALANCER will
        cease waiting, and use the "random" method.
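
        The "wait about 1/2 second, then fall back" behavior can be sketched
        in Python with a standard queue. The function name choose_index is
        hypothetical, and Python's queue module stands in for the OS/2 queues
        that BALANCER actually uses.

```python
import queue
import random

def choose_index(reply_queue, probabilities, timeout=0.5, rng=random):
    """Wait briefly for RoundRobin's choice; fall back to a random pick."""
    try:
        return reply_queue.get(timeout=timeout), "round robin"
    except queue.Empty:
        # RoundRobin was too slow: choose at random, weighted by the
        # bayesian probabilities. (BALANCER would also report this choice
        # back to RoundRobin; that step is omitted here.)
        index = rng.choices(range(len(probabilities)),
                            weights=probabilities)[0]
        return index, "random"

replies = queue.Queue()
late = choose_index(replies, [0.0, 1.0, 0.0], timeout=0.01)   # nothing queued
replies.put(2)
on_time = choose_index(replies, [1.0, 0.0, 0.0])
```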

      After some initializations,

  1a) RoundRobin goes into an infinite loop, waiting on "asks" from BALANCER.

  1b) When a request arrives, BALANCER will send the entry number (n of
      the matching DOIT.n.!SEL) and the current bayesian weights (for this
      entry).

  1c)  RoundRobin forms a vector of proportions; with each element
       having a value of:
               f(#_redirections) / current_bayesian_weight
       The element from this vector with the lowest value is used as the
       selection (that is, if element 3 of a 4 element vector is lowest,
       then the "third" site is chosen).

  1d)  If multiple elements tie for lowest value, a bayesian-like
       rule is used (larger current_bayesian_weight selections being more 
       likely to be chosen).

  1e)  Note the use of f(#_redirections) as the numerator. In the simplest case,
       f() is simply a summation of all the times that BALANCER chose this 
       element (from this DOIT.n.!SITES list).  However, since ancient
       choices are probably irrelevant, f() is designed to degrade the
       influence of prior choices. 

       Specifically, the !ESTIMATE value (with a default of 1.0) is used as a
       measure of the "number of seconds of server time it will take to
       respond to this request". When a selection is chosen, this !ESTIMATE 
       is stored, along with the time at which it was stored. 

       Hence, for each element in the DOIT.n.!SITES list, f() equals:
         f() = SUM max[(!ESTIMATE - (current_time - stored_time)), 0]
       with the SUM over all times this site (element) was chosen.

       That is, if element 3 has been chosen 51 times, SUM has 51 elements
       (typically, the vast majority of these will equal 0; a fact which
       allows for large computational shortcuts).

       This function means that ancient redirections are irrelevant.

       Actually, with a value of 1, the decay is quite fast. On not-busy
       sites, this means that the "bayesian-like" mechanism will be 
       frequently used. 

       If you really want a round-robin, with no "decay", set !ESTIMATE=0
       If you really-really want a round-robin, with no "decay" and no
       "weighting", set !ESTIMATE=0, do not specify DOIT.n.!WEIGHTS,
       and set the appropriate WEIGHT_DEF_TYPE=0 (static).

         NOTE: !ESTIMATE=0 does NOT mean immediate decay; it means "no decay".
               That is, stored_time is set to current_time when f() is computed;
               and "ancient" redirections are NOT irrelevant.

  1f) Using the "request specific" BALANCER queue, the chosen element
      is returned to BALANCER.  

  1g) RoundRobin stores the results (the value of !ESTIMATE, the
      number of the chosen element and the current time); and 
      jumps to step 1a.
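
  Steps 1c through 1e can be sketched in Python. This sketch assumes that each
  term of f() is the estimate minus the elapsed time, clipped at 0, which is
  the reading that matches the "decay" described above. The names f and
  round_robin_choice are hypothetical, the !ESTIMATE=0 "no decay" case is not
  modeled, and ties are broken by taking the largest weight rather than by
  the random bayesian-like rule.

```python
def f(history, estimate, now):
    """Decayed count of past choices of one site.

    history holds the times at which the site was chosen; each choice
    contributes the part of its estimate that has not yet elapsed.
    """
    return sum(max(estimate - (now - stored), 0.0) for stored in history)

def round_robin_choice(histories, weights, estimate, now):
    """Index of the site with the lowest f()/weight ratio."""
    ratios = {j: f(histories[j], estimate, now) / w
              for j, w in enumerate(weights) if w > 0}   # skip 0-weight sites
    if not ratios:
        raise ValueError("no site has a positive weight")
    lowest = min(ratios.values())
    tied = [j for j, r in ratios.items() if r == lowest]
    # Simplification: among tied sites, take the one with the largest weight.
    return max(tied, key=lambda j: weights[j])

histories = [[9.5, 9.75], [9.5], []]     # times at which each site was chosen
weights = [0.5, 0.25, 0.25]              # current bayesian weights
recent = round_robin_choice(histories, weights, estimate=1.0, now=10.0)
later = round_robin_choice(histories, weights, estimate=1.0, now=20.0)
```

  At now=10.0 the third site has no recent redirections and is chosen. At
  now=20.0 every stored choice has decayed to 0, all ratios tie, and the
  weights alone decide.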


VI.a.vi. ** The CPUloadC "daemon"

 1) CPUloadC.CMD, a REXX program, is a "daemon" that is meant to be run on 
   your subsidiary servers. Its purpose is to ascertain load information
   WITHOUT hitting the server with a HEAD method request. It then transfers 
   this information, using TCP/IP socket calls, to the PortListen thread
   running on the main site.

 1a) Depending on the value of MODE, CPUloadC will determine the 
     GoServe average response time, the GoServe current number of clients,
     a measure of CPU utilization, or the time required to count to 4000.

 1b) If the PORT on MAIN_SERVER (that is, the TYPE_2_PORT on your main site) 
      is accessible, CPUloadC will transmit two pieces of information
      (separated by a space):
        the MYNAME variable, and the load measurement.
     
 1d) CPUloadC waits for WAIT_SECONDS seconds, and then goes to step 1a.

 1e) Note that if the MAIN_SERVER is not accessible, CPUloadC does not
     crash -- it just keeps trying. Thus, you can start CPUloadC at anytime, 
     and when you finally fire up your main site, CPUloadC will then send 
     its (latest) load information.  Similarly, if you shut down a 
     subsidiary-site (or just kill CPUloadC), the PortListen thread will not 
     crash, it just won't update the "posterior weight" for this subsidiary 
     site.
 
 1f) There is nothing sacrosanct about CPUloadC -- if you have a better
     load-monitor, and it can issue the appropriate TCP/IP socket calls, 
     there is no strong reason not to use it -- you can check the CPUloadC code
     for the proper "message" syntax.
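
     As a rough illustration of what such a replacement monitor has to do,
     the Python sketch below takes a MODE=4-style measurement and formats it.
     The names seconds_to_count and build_message are hypothetical, and the
     "name, space, load" layout is an assumption based on step 1b; the
     authoritative message syntax is the one in the CPUloadC code.

```python
import time

def seconds_to_count(limit=4000):
    """MODE=4-style measure: wall-clock seconds needed to count to limit."""
    start = time.perf_counter()
    count = 0
    while count < limit:
        count += 1
    return time.perf_counter() - start

def build_message(myname, load):
    """Two fields separated by a space: the site name, then the load."""
    return "%s %s" % (myname, load)

# The message that would be sent to the TYPE_2_PORT on the main site.
message = build_message("HERSITE.THERE.COM", seconds_to_count())
```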
     
     
                        ------------------------

VI.b. Selecting a WEIGHT_TYPE (the method of load measurement).

The following outlines the relative advantages of each method.

In general, we urge experimentation; you should play with the "prior weights" 
and the "adjusters". Please be aware that you can use WEIGHT_TYPES. to specify 
different methods for different sites!

WEIGHT_TYPE=1 (using HEAD HTTP requests)
 General Advantages: Requires minor, or no, modifications to subsidiary 
                     site servers
 General Disadvantage: Imposes more work on subsidiary site servers.
     
   Using head_string=0
         ADVANTAGES:   Should work with practically any server, whether it's
                       a GoServe server, or an OS/2 server.
                       Returns a value that is probably a reasonable measure 
                       of current load.
       DISADVANTAGES:  Creates more work for the server (and may clutter up
                       audit files).
                       Is relatively slow to gather.
                       Response time is a function of distance (in web space)
                       from main-site to subsidiary site; this may not be an 
                       accurate measure of the distance from the 
                       subsidiary-site to the client.
     
   Using the default (/!PING?RESPONSETIME)
         ADVANTAGES:   Is a good measure of recent load history.
                       Imposes minimal work on the server.
        DISADVANTAGES: Requires SRE-http, or a server that can appropriately
                       respond to this HEAD method request string.
                       Requires some, but not much, tinkering with parameters 
                       on SRE-http servers. 
                       Average response time may be a poor measure of current                      
                       load.

   Using a custom head_string
        ADVANTAGES:    Highly flexible.
      DISADVANTAGES:   You have to know how to tell the subsidiary server 
                       to return a particular kind of response to a
                       HEAD method request that uses the specified head_string.


WEIGHT_TYPE=2 (using CPUloadC)
  General Advantage: Imposes very mild load on subsidiary site servers
  General Disadvantage: Requires OS/2 be running on the subsidiary site
                        server.
                        Requires that you can properly install and maintain 
                        CPUloadC.CMD.

    MODE=1 : GoServe   RESPONSETIME
        ADVANTAGES:    Fast, low-work, generally accurate measure.
        DISADVANTAGES: Requires a GoServe server, such as SRE-http, that can 
                       launch CPUloadC as a daemon (under the GoServe process).
                       Average response time may not be relevant to current
                       load.

    MODE=2 : GoServe  number of clients
        ADVANTAGES:    Fast, low-work, reasonably accurate measure.
        DISADVANTAGES: Requires a GoServe server that can launch CPUloadC as 
                       a daemon (under the GoServe process).
                       Number of clients may only be loosely correlated with 
                       load.

    MODE=3 : Time slice  utilization (ranging from 0 to about 3)
        ADVANTAGES:    A direct measure of CPU load.
        DISADVANTAGES: Requires the (free) RXU.DLL
                       Time-slice measurement (using QprocQuery) is a tepid 
                       measure of true performance.
                       Should probably be combined with an "experimentally"
                       derived adjuster.

    MODE=4 : Seconds required to count to 4000
        ADVANTAGES:    A simple, indirect measure of CPU load.
        DISADVANTAGES: Very primitive, may have nothing to do with 
                       "server" tasks.
                       Should probably be combined with an "experimentally"
                       derived adjuster.

                        ------------------------
VI.c.  Using <BASE href="URL"> elements.

The "redirection" algorithm adopted by BALANCER has one drawback -- 
"relative" links on documents residing on these subsidiary sites will be 
"relative" to the subsidiary site.  That is, once a client has been redirected 
to a subsidiary site, subsequent requests for documents may NOT be subject to 
load balancing.

There are two ways to make sure that requests go to the main-site first:
use "forwarding", or make sure the subsidiary sites use links that "point back"
to the main-site. This section discusses how to use the <BASE href="URL">
element to easily specify "links that point back"; the next section 
discusses forwarding.

HTML Definition: BASE. 
  Syntax: <BASE href="URL">
  Description:  name of the file relative to which partially qualified 
                pathnames in URLs should be interpreted. If not otherwise 
                specified the URL containing the document being displayed is 
                used as the base.

Discussion:

  Suppose the main-site is a.b.net, with a subsidiary-site of
  wow.far.gov.  Suppose a request for dogs.html is recieved at a.b.net, and
  redirected to wow.far.gov.  Now suppose that wow.far.gov/dogs.html contains
  the following link:

      Need <a href="info/moredogs.html"> more information</a>?

  When the client selects this "relative" link, her browser will assume that 
  "info/moredogs.html" is relative to the current url -- and will issue 
  a request for  "wow.far.gov/info/moredogs.html". 

  This will subvert your desire to maintain main-site control over all requests
  (that is, you'd like these subsidiary-sites to ONLY be reachable via a 
  BALANCER supplied redirection).  What you need to do is inform the client's 
  browser that the "base URL" should be the main-site (a.b.net), and not the 
  current URL (at wow.far.gov).  

  This can be done by including a 
     <BASE href="http://a.b.net/">
  element in the <HEAD> section of DOGS.HTML. With this element in place, most
  (but not all) browsers will treat relative URLs (such as "info/moredogs.html")
  as pointing to a.b.net, and not to wow.far.gov.

  The only drawback is that you need to include this <BASE href="http://a.b.net/">
  in all HTML documents on wow.far.gov.  Although some servers (such as SRE-http) can
  automatically do this (for selected documents); in most cases you'll have to do 
  it "by hand".
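
  The effect of the BASE element on a relative link can be checked with
  Python's standard URL resolver, and the "by hand" edit can be scripted. The
  helper name add_base is hypothetical, and its regular expression is only a
  sketch: it handles a plain <HEAD> tag and leaves documents without one
  unchanged.

```python
import re
from urllib.parse import urljoin

link = "info/moredogs.html"

# Without a BASE element, the document's own URL is the base.
without_base = urljoin("http://wow.far.gov/dogs.html", link)

# With <BASE href="http://a.b.net/">, the main-site is the base.
with_base = urljoin("http://a.b.net/", link)

def add_base(html, base_url):
    """Insert a BASE element right after the opening <HEAD> tag."""
    tag = '<BASE href="%s">' % base_url
    return re.sub(r"(<head(?:\s[^>]*)?>)",
                  lambda match: match.group(1) + "\n" + tag,
                  html, count=1, flags=re.IGNORECASE)

page = add_base("<html><HEAD><title>Dogs</title></HEAD></html>",
                "http://a.b.net/")
```

  Here without_base resolves to http://wow.far.gov/info/moredogs.html, while
  with_base resolves to http://a.b.net/info/moredogs.html.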


                        ------------------------

VI.d.  Using  Forwarding

Instead of redirecting, BALANCER can "forward" a request to a subsidiary
site, wait for results, and then re-transmit the results to the original 
client. Therefore, the client is completely unaware that his request has
been answered by a server on some other site (see the description of the
DOIT. variable for details on how to specify what to forward).

    Basically, BALANCER "emulates" a browser, and then sends the results
    back to the client. 

Forwarding has two primary advantages: the "relative URL" problems (discussed
above) are avoided, and requests that do not "redirect" very well can still
be handled. In particular, most browsers do NOT correctly redirect POST method
requests.

    Therefore BALANCER will "forward" all POST method requests (you
    can disable this by setting the DO_POST "advanced users parameter").

The primary disadvantage of forwarding is obvious: compared to using a
"regular server", you double the traffic over your lines -- since each piece 
of information must be obtained from the subsidiary server, and then 
retransmitted to the client. In cases where your main-site's computer is
dedicated to BALANCER, the effect on total traffic is even more severe,
since BALANCER's redirection responses are quite short.
  
Another disadvantage is that the "emulation", although fairly thorough 
(i.e.; all the cookies and other request headers are sent), is not complete. 
In particular, the subsidiary-site will not see the client's IP address (it 
sees the address of the main-site). This may cause occasional problems, 
such as when IP addresses are used for access control.

Considering these drawbacks...

    Forwarding is NOT generally recommended. However, in cases where the 
    "processing to transmitted bytes" ratio is large (i.e.; when obtaining the
    output of a CPU intensive script) forwarding might be useful. 
    Alternatively, when you want to be sure that "relative URLS" are properly 
    interpreted, forwarding might be safest.
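
    The choice between the two strategies can be summarized in a few lines of
    Python. The function name choose_action is hypothetical, and the
    forward_posts argument merely stands in for the DO_POST parameter, whose
    actual values are not described in this section.

```python
def choose_action(method, forward_flag, forward_posts=True):
    """Return "forward" or "redirect" for one request."""
    if forward_flag:                     # !FORWARD=1 for the matching entry
        return "forward"
    if method.upper() == "POST" and forward_posts:
        return "forward"                 # POST requests redirect poorly
    return "redirect"                    # the usual "302" response

ordinary = choose_action("GET", forward_flag=False)
posted = choose_action("POST", forward_flag=False)
```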

        


  
                =====================================================

VII. Glossary

Forwarding:
   As an alternative to "redirection", BALANCER can forward a request to a
   subsidiary site, wait for and obtain the results, and then re-transmit
   these results to the client. Forwarding avoids some problems with POST
   method requests and with resolution of relative URLS; but at the cost
   of greatly increasing traffic.

Host: 
   On a server that supports multiple IP addresses, the HOST             
   request header (which is part of the http request sent by a 
   browser) is used to identify which site the request is meant for.
   Note that some older browsers do NOT include a HOST request
   header, a fact that complicates the use and configuration of  
   multiple-host (also known as multiple-homed) web servers.

Load monitoring: 
  Obtaining information on how busy the server on a (set of)
  subsidiary sites is. Load monitoring requires some means of measuring the 
  performance of a server -- either in terms of CPU utilization, or 
  http-request response time.

Redirect:
   As part of the http (WWW) protocol, web servers can instruct a client's web
   browser to "redirect" a request to some new URL; the browser automatically 
   issues a new request (as specified in the redirection instructions) to this 
   new URL.  Redirection is the principal strategy adopted by BALANCER (that is, 
   BALANCER usually requires action by the client's browser; it does not work
   at the "router" level).

Request Selector:   Also called the "selector".
    The request that a client's browser sends to the "main-site". This is
    the portion of a URL after the domain. 
    For example, clicking a link of:
         http://foobar.ag.gov/srehttp/balancer.zip
    will cause your browser to send a selector of 
        /srehttp/balancer.zip 
    to the 
        foobar.ag.gov
    web server running at the ip address of foobar.ag.gov.
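
    The split between site and selector can be reproduced with Python's
    standard URL parser. Appending the query string to the path is this
    sketch's reading of "the portion of a URL after the domain".

```python
from urllib.parse import urlsplit

parts = urlsplit("http://foobar.ag.gov/srehttp/balancer.zip")
host = parts.netloc                  # the site the request is sent to
selector = parts.path                # the portion after the domain
if parts.query:
    selector += "?" + parts.query    # keep any "?..." part of the request
```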

Sites:
    Sites are WWW sites, as identified by an IP address.  BALANCER 
    distinguishes two types of sites, the "main-site" and "subsidiary-sites".
    Typically, the main-site is the publicly advertised site -- it's the 
    "front end" of your web presence.  It's also the site that BALANCER.80 is
    the "server software" for.  Subsidiary sites are sites that back up the
    main-site. In many cases, this backup accounts for all the actual work done
    by the server -- the  main-site's sole job is to redirect requests to
    subsidiary sites.

    Note that a subsidiary site may be handled by the same physical machine as 
    the main-site -- either as a different IP address, or as a different port.

Stem Variable: 
   Stem variables are REXX's way of implementing a data structure. Stem 
   variables consist of a "stem", and several "tails"; with the
   tails separated by periods. For example, in:
      DOIT.4.!SEL
   the "DOIT" is the stem, and 4 and !SEL are tails.

   In this documentation, several stem variables have the following structure:
       Name.entry.!field='element [element2] [.. elementK] '
   For example:
       DOIT.4.!SITES='A.B.NET  C.D.ORG '
   In the above example, the name is "DOIT", the entry_number is "4", and the
   field is "!SITES".  The value of this contains two elements; "A.B.NET"
   and "C.D.ORG".
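
   For readers more familiar with other languages, a rough Python analogy is
   a dictionary keyed by the tails, with the value split on blanks to obtain
   the elements. This is only an analogy; the name doit is hypothetical and
   REXX stem variables behave differently in other respects.

```python
# DOIT.4.!SITES='A.B.NET  C.D.ORG '
doit = {}
doit[(4, "!SITES")] = "A.B.NET  C.D.ORG "

# Elements are separated by one or more blanks.
elements = doit[(4, "!SITES")].split()
```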

Weights:
   Weights are used to construct a "bayesian probability" of choosing a site.
   There are two kinds of weights: prior weights, which are specified in a 
   DOIT.n.!WEIGHTS field; and posterior weights, which are determined by 
   measuring the load on subsidiary sites.

                =====================================================

VIII. Disclaimer

  BALANCER was created by Daniel Hellerstein, with help from 
  Tim Stephens.  It's "use at your own risk" freeware -- we take
  no responsibility for untoward effects of this program. That said,
  in our limited testing it has worked properly. Should you discover
  any problems, or have suggestions, please contact Daniel Hellerstein
  at danielh@econ.ag.gov

 Formal disclaimer:

  Copyright 1998 by Daniel Hellerstein. Permission to use this program
  for any purpose is hereby granted without fee, provided that
  the author's name not be used in advertising or publicity
  pertaining to distribution of the software without specific written
  prior permission.

 With some provisos, this includes the right to subset and reuse the code,
  with proper attribution. The provisos are several fold:
   1)  Portions of the code are adapted from other authors' work
       (these are noted where appropriate); you'll need to contact these other
       authors for appropriate permissions.
   2)  We, the authors of BALANCER (and related software), and any potentially
       affiliated institutions, disclaim any and all liability for damages due 
       to the use, misuse, or failure of the product or subsets of the product.

  Furthermore you may also charge a reasonable re-distribution fee for
  BALANCER; with the understanding that this does not remove the
  work from the public domain and that the above provisos remain in effect.

    THIS SOFTWARE PACKAGE IS PROVIDED "AS IS" WITHOUT EXPRESS
    OR IMPLIED WARRANTY.
    THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE PACKAGE,
    INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
    IN NO  EVENT SHALL THE AUTHOR (Daniel Hellerstein) OR ANY PERSON OR
    INSTITUTION ASSOCIATED WITH THIS PRODUCT BE LIABLE FOR ANY
    SPECIAL,INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
    RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION
    OF CONTRACT,NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR
    IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE PACKAGE.


   BALANCER was developed on the personal time of Daniel Hellerstein,
   and is not supported, approved, or in any way an official product
   of my employer (USDA/ERS).



-- End of document.