13 Jan 1998: BALANCER: A Dynamic Load Balancer for the GoServe Web Server

Abstract:
BALANCER is used to redirect http (WWW) requests received by a "main site" to a set of subsidiary sites. BALANCER is configurable: you can specify, on a request-specific basis, which sites to redirect to. BALANCER is dynamic: it gathers "load" information from subsidiary sites, and uses it to distribute load in an optimal fashion.

=====================================================
Table of Contents:
  I.     Introduction
  II.    List of features
  III.   Installation
  IV.    BALANCER
  IV.a.    Detailed Description of BALANCER parameters
  V.     Using CPUloadC
  VI.    Technical Notes
  VI.a.    Outline
  VI.a.i.    Balancer
  VI.a.ii.   ReWeighter
  VI.a.iii.  HeadQuery
  VI.a.iv.   PortListen
  VI.a.v.    RoundRobin
  VI.a.vi.   CPUloadC
  VI.b.    Selecting a WEIGHT_TYPE
  VI.c.    Using elements
  VI.d.    Using Forwarding
  VII.   Glossary
  VIII.  Disclaimer
=====================================================
I. Introduction

Oftentimes the best way to operate a heavily visited WWW site is to redirect requests (originally directed to a "main site") to a set of "subsidiary sites". By distributing requests across multiple sites, this "load balancing" takes advantage of multiple processors, and can spread traffic over multiple lines.

BALANCER is designed to provide this "load balancing" in a dynamic fashion. BALANCER uses continually updated, real-time information on server loads to improve the "balance" of redirections. That is, a "subsidiary site" that is currently busy (say, due to several complicated database lookups) will get less additional traffic -- but once this busyness ceases, the amount of traffic it receives will be readjusted back up.

BALANCER requires the GoServe Internet Server -- BALANCER is a "filter" for GoServe. Although BALANCER can handle simple document requests, its primary use is to redirect requests. That is, on many sites, BALANCER will redirect all requests to other sites.
These other sites are often, but not necessarily, GoServe sites (in fact, they need not be OS/2 servers). If they are GoServe sites, you can run special "client" software that uses socket calls to pass load information back to BALANCER.

Although BALANCER may be best used to redirect all (or nearly all) requests, BALANCER can also be used as a pre-filter for a small list of selectors (URIs). If a request is not in this set, BALANCER can either attempt to resolve it (using simple rules), or it can call one of the regular GoServe filters (such as SRE-http or GoHTTP). In other words, with some attention to setup details, the use of BALANCER will NEVER cause a loss of functionality.

BALANCER consists of two components: the "main site" software, and the "subsidiary site" (client) software. Although you need NOT use the "subsidiary site" software, BALANCER works best in an "all GoServe" environment, with the "subsidiary site" software running on machines that are also running GoServe web servers.

This documentation describes the installation, configuration, and algorithm of BALANCER. While we tried to keep it fairly simple, the reader is assumed to have some understanding of http and the WWW. Furthermore, several possibly obscure (possibly non-standard) terms are used throughout the documentation. It's a good idea to peruse the glossary before tackling the rest of the document!

Lastly, BALANCER is free -- but we do ask that you read the standard "it's not our liability" disclaimer at the bottom of this document.

If you have questions, please contact Daniel Hellerstein (danielh@econ.ag.gov). Feel free to complain, suggest, and praise!

=====================================================
II. BALANCER Features

 * Redirection lists can be specified on a selector (URI) specific basis
 * Optional forwarding of selected requests, with results re-sent to the client
 * Instead of balancing, a regular GoServe filter can be called on a selector-specific basis
 * Multi-home aware
 * Bayesian redirection methodology, or modified round-robin: in both cases, prior weights and current-load information can be used
 * Several methods of determining current-load information, some of which do not require any modifications to subsidiary sites
 * Multi-threaded load-tracking
 * Automatic detection of off-line subsidiary sites
 * Subsidiary sites can come on and off line without need for formal notification
 * Simple documents (and graphic files) can be delivered by BALANCER

=====================================================
III. Installation

The following assumes you have GoServe installed, and know its basics. For more info on GoServe, please see http://www2.hursley.ibm.com/goserve.

BALANCER has been tested under Warp 3.0 and Warp 4.0 -- it may not work properly with earlier versions of OS/2.

BALANCER requires several "DLLs": REXXUTIL, RXSOCK, and REXXLIB. REXXUTIL and RXSOCK are packaged with OS/2; REXXLIB is commercial software. If you do NOT own a copy of REXXLIB (it's a good value at $25, from http://www.quercus-sys.com/rexxlib.htm), please contact danielh@econ.ag.gov for alternatives.

Installation instructions:

 1) UNZIP BALANCER.ZIP to an empty, temporary directory. The following files will be created:
      balancer.80  : The "main site" portion of BALANCER
      balance2.rxx : The "track site load and save results" thread
      balance3.rxx : The "use HEAD requests to obtain information from subsidiary sites" thread
      balance4.rxx : The "receive information, via socket calls, from subsidiary sites" thread
      balance5.rxx : The "round-robin" accounting thread -- used to track where requests have been redirected to.
      cpuloadc.cmd : The "send load information, via socket calls, to main site" program (runs on subsidiary sites).

 2) Copy balancer.80, balance2.rxx, balance3.rxx, balance4.rxx, and balance5.rxx to your "GoServe working directory" (for example, D:\GOSERVE).

 3) Set your GoServe filter to be balancer.80. For example, use the Options-Filter tab on the GoServe desktop object.
    NOTES:
     * If you wish to run BALANCER on a port other than 80, just change the extension. For example, to run on port 8081, rename balancer.80 to balancer.8081 (you'll also have to tell GoServe to use port 8081).

 4) Using your favorite text editor, edit BALANCER.80 (or whatever you may have renamed BALANCER.80 to). For BALANCER to work ...
       **** You MUST set several parameters in BALANCER.80 ****
    The next section describes these parameters in detail.

 5) To use several of the "load information" gathering options, you'll need to install CPUloadC.CMD on your subsidiary sites (see section V for details on CPUloadC.CMD).

You are now ready to try it out. If you want to keep an eye on things, you should obtain PMPRINTF (from http://www2.hursley.ibm.com/goserve).

=====================================================
IV. BALANCER parameters

BALANCER needs to be told where, and how often, to redirect (or forward) requests. This is done through the somewhat tedious (and sometimes dangerous) mechanism of changing parameters in the BALANCER.80 program file (given sufficient interest, we will add some kind of "configurator" program to the BALANCER package).

Note: these parameters are NOT meant to be changed on-the-fly. For changes to take effect, you should stop and restart GoServe.

The user-changeable parameters (in alphabetical order) are:
   ALIASES.        = A stem variable containing selector-to-file mapping rules
   BAYESIAN        = 1: use "bayesian" load balancing; 0: use "round robin"
   DEF_WEIGHT_TYPE = Default mode of load monitoring
   DOIT.           = A stem variable containing selector-matching information
   FILTER_NAME     = Name of a "regular" GoServe filter (optional)
   NICKANAME.      = A stem variable containing HOST information
   NO_URL_FILE     = File to use when a request does not match a DOIT.n entry
   SLEEP_SECONDS   = Seconds to wait between "HEAD" (WEIGHT_TYPE=1) load requests
   TYPE_2_PORT     = Port to use in conjunction with CPUloadC
   VERBOSE         = How much status information to display
   WEIGHT_TYPES.   = A stem variable specifying subsidiary-site-specific load-monitoring methods

For most of these variables the default values will work. However, the DOIT. variable MUST be set.

IV.a. Detailed Descriptions of BALANCER parameters (in alphabetical order)

ALIASES. : Stem variable used to match requests to files on the "own" site

You may want the main site to handle some requests (such as requests for the home page, or other commonly requested small documents). In such cases, you must instruct BALANCER to NOT redirect (or forward) the request, and to resolve it just like a normal (albeit rudimentary) web server. This is done by specifying an element of $ in the appropriate DOIT.n.!SITES field (you can think of $ as a special form of site).

When the $ (the "own site") is chosen by BALANCER, you can either:
  1) call a "regular" GoServe filter (such as GoFilter.80), or
  2) have BALANCER attempt to resolve the request.

Case 1 is accomplished by setting the FILTER_NAME parameter; see the discussion of FILTER_NAME for details.

For Case 2, BALANCER will (by default) append the selector to the GoServe data directory. You can override this simple default by using ALIASES. BALANCER will check each of the ALIASES. entries to see what directory the request maps to -- the GoServe data directory is used only if no matching ALIASES. entry is found.

The syntax of ALIASES. is:
   ALIASES.0 = # of aliases
   ALIASES.n = ' target d:\path '     where n=1..ALIASES.0
 where
   target  : the request selector is compared to the target
   d:\path : the target is replaced with the value of d:\path

A request selector is (case-insensitively) compared to the target. If this target "abbreviation matches" the selector, then the matching portion of the request is replaced by the d:\path.
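The replacement rule can be pictured with a short Python sketch (BALANCER itself is written in REXX; Python is used here only for illustration). It assumes that "abbreviation matches" means a case-insensitive prefix match, that the first matching ALIASES. entry wins, and that / is converted to \ after the replacement. parse_alias and map_selector are hypothetical names, not part of BALANCER.

```python
from __future__ import annotations


def parse_alias(entry: str) -> tuple[str, str]:
    """Split an ALIASES.n value into (target, path); a leading / is dropped
    from the target, as BALANCER does."""
    target, path = entry.split()
    return target.lstrip("/"), path


def map_selector(selector: str, aliases: list[str], data_dir: str) -> str:
    """Map a request selector to a file name (hypothetical re-implementation).

    Assumption: the first alias whose target is a case-insensitive prefix of
    the selector wins; if none matches, the selector is appended to data_dir.
    """
    sel = selector.lstrip("/")  # the leading / is removed from the selector too
    for entry in aliases:
        target, path = parse_alias(entry)
        if sel.upper().startswith(target.upper()):
            # Purely textual prefix replacement; it is not path-aware, so a
            # target of BIRD also matches BIRD10/...
            return (path + sel[len(target):]).replace("/", "\\")
    # No alias matched: fall back to the GoServe data directory.
    return (data_dir.rstrip("\\") + "\\" + sel).replace("/", "\\")


ALIASES = [r"COWS/ e:\animals\cows\ ", r"DOGS/POODLES/ e:\pets\type3\ ", r"BIRD d:\avian "]
print(map_selector("/BIRD10/PIX/WINGS.AVI", ALIASES, r"D:\WWW"))  # d:\avian10\PIX\WINGS.AVI
```

Compared case-insensitively, the sketch reproduces the three rows of the ALIASES. example in this section, including the "unintelligent" BIRD to AVIAN10 mapping.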
After / is converted to \, the resulting file is transmitted to the client.

When specifying target, be aware that this "replacement" is not particularly intelligent: you must be sure to properly include \ and / characters (in the target and in the d:\path). BALANCER does make one modification -- a leading / is removed from the target (and also from the request selector).

Example:
   ALIASES.0=3
   ALIASES.1='COWS/ e:\animals\cows\ '
   ALIASES.2='DOGS/POODLES/ e:\pets\type3\ '
   ALIASES.3='BIRD d:\avian '

Assuming the above ALIASES., and a GoServe data directory of D:\WWW:
   A selector of ...         yields
   /COWS/JERSEY.HTML         E:\ANIMALS\COWS\JERSEY.HTML
   /PIGS/SOW.JPG             D:\WWW\PIGS\SOW.JPG
   /BIRD10/PIX/WINGS.AVI     D:\AVIAN10\PIX\WINGS.AVI

Note that ALIASES. are NOT host-sensitive (which may cause problems on multi-host sites).

------------------------
BAYESIAN: Select "bayesian" or "round robin" balancing methodology

BALANCER can use one of two "balancing" methodologies:
   BAYESIAN=1 : a "bayesian" method that makes random choices using "bayesian" probabilities
   BAYESIAN=0 : a "round robin" method that attempts to keep the number of requests equi-proportional, relative to the various weighting factors and the estimated processing load of each request.

The value of BAYESIAN sets the default method; it can be overridden (on a selector-specific basis) by the value of DOIT.n.!BAYESIAN.

The bayesian method involves fewer computations (it's memoryless in terms of where requests have been redirected to), but can occasionally yield odd results (such as sending runs of requests to one of several otherwise equally busy sites). The "round robin" method avoids this by tracking what's been sent where, at the cost of greater computation and memory manipulation.

Please see the technical notes for further discussion of the relative merits of the bayesian and round-robin approaches.

------------------------
DEF_WEIGHT_TYPE : Default mode of load monitoring.
BALANCER supports three forms of load monitoring -- what is referred to as the WEIGHT_TYPE. DEF_WEIGHT_TYPE defines the default WEIGHT_TYPE.
   DEF_WEIGHT_TYPE=0 : Static. No load information is obtained; the "prior weights" (specified in the DOIT. stem variables) are used to redirect requests.
   DEF_WEIGHT_TYPE=1 : HEAD requests. BALANCER will use HTTP "HEAD" requests to obtain load information from subsidiary sites.
   DEF_WEIGHT_TYPE=2 : CPUloadC-generated information. BALANCER will listen on the TYPE_2_PORT port for load information sent by CPUloadC.CMD from subsidiary sites.

Example: DEF_WEIGHT_TYPE=1

Notes:
 * If DEF_WEIGHT_TYPE>0, then DEF_WEIGHT_TYPE can be overridden on a subsidiary-site-specific basis (by using the WEIGHT_TYPES. variables). If DEF_WEIGHT_TYPE=0, then "dynamic" load balancing WILL NOT BE ATTEMPTED -- although the "prior" weights will still be used (for non-equal, random redirection).
 * For details on the "weight types", please see the technical notes at the end of this document.

------------------------
DOIT. : Stem variable containing "selector-specific" redirection information

The core action of BALANCER is matching each request (sent to the main site) to one of the DOIT.n entries. This matching is based on the "selector" and, on multi-homed sites, on the "host".

The basic syntax of DOIT. is:
   DOIT.0           = # of entries
   DOIT.n.!SEL      = The selector (it may contain * wildcard characters)
   DOIT.n.!HOST     = The host this entry applies to -- it is used to "limit the scope" of the entry
   DOIT.n.!SITES    = A space-delimited list of sites (URLs)
   DOIT.n.!WEIGHTS  = A space-delimited list of "prior" weights (optional)
   DOIT.n.!FORWARD  = If specified, and equal to 1, "forwarding" is used instead of redirection
   DOIT.n.!BAYESIAN = 1: use "bayesian weighted" random draws; 0: use "bayesian weighted" round robin
   DOIT.n.!ESTIMATE = Estimate of the number of seconds required to respond to this selector.
                      If 0, an "infinite" lifespan is assumed (which implies a standard round-robin algorithm).
 where n=1..DOIT.0

When examining a DOIT.n entry, BALANCER will:
  1) See if a DOIT.n.!HOST has been specified. If so, DOIT.n.!HOST is compared to the IP address of the main site (as supplied by GoServe), and to the HOST: request header. If neither matches, the entry is skipped.
     Note that if no DOIT.n.!HOST is specified, this step is automatically satisfied. That is, entries without a DOIT.n.!HOST field are applied to all requests.
  2) Assuming step 1 is satisfied, compare DOIT.n.!SEL to the request selector. This comparison is case-insensitive, and allows wildcards (one or more *) to appear in the DOIT.n.!SEL field.
  3) If DOIT.n.!SEL "matches" the selector, select one of the elements in the DOIT.n.!SITES field. Each of these elements is an IP address that points to a WWW site (and, optionally, a directory on this site). This decision will be based on whatever information is currently available about the load on these sites, and on the "prior weights" specified in the DOIT.n.!WEIGHTS field.

Note that the first "matching" entry is used -- there is no attempt to find a "best" match.

Examples:
   DOIT.0=5      (five entries)

   DOIT.1.!SEL='/new/*'
   DOIT.1.!SITES='hissite.here.net hersite.there.com oursite.where.org '
      This entry will "match" all requests that start with /NEW/, with the "prior weight" for each of these sites equal to 1.0.

   DOIT.2.!SEL='/IMGS/*.GIF'
   DOIT.2.!SITES=' pictures.wow.net/guest $ '
   DOIT.2.!WEIGHTS='1 5 '
      This entry will match all requests that start with /IMGS/ and that end with .GIF.
      The $ "site" is a pointer to the "own" site -- BALANCER will NOT redirect or forward; it will directly return the appropriate file to the client.
      You can include "subdirectory" information on a site -- redirection will be relative to the subdirectory. Thus, the pictures.wow.net/guest entry means: "redirect to http://pictures.wow.net/guest/imgs/xxx.gif".
      Note that subdirectory information is NOT used when determining load.

   DOIT.3.!SEL='/ALTVIEW/*'
   DOIT.3.!HOST='altwww.oursite.com'
   DOIT.3.!SITES='hissite.here.net'
      This is a "host-specific" entry -- it is only used if a HOST request header of "altwww.oursite.com" accompanies the client's request. Note that only 1 site is specified (hence, no prior weights are required).

   DOIT.4.!SEL='/CGI-BIN/BIG_JOB*'
   DOIT.4.!SITES='bigsite.beer.net smallsite.wine.com '
   DOIT.4.!WEIGHTS='1 1'
   DOIT.4.!FORWARD=1
   DOIT.4.!BAYESIAN=0
   DOIT.4.!ESTIMATE=2
      For requests for the /CGI-BIN/BIG_JOB script, use "forwarding" instead of redirection. Forwarding consists of BALANCER using socket calls to forward the request, along with all cookies and other request headers, to the subsidiary site. The response from the subsidiary site is sent back to BALANCER, and BALANCER "forwards" it back to the client. Please see the Technical Notes for details on "forwarding".
      Also note the use of !BAYESIAN=0 and !ESTIMATE=2, which state that a "round robin" selection algorithm is to be used, with an "estimated" selector-response time of 2 seconds.

   DOIT.5.!SEL='*'
   DOIT.5.!SITES='bigsite.beer.net smallsite.wine.com '
   DOIT.5.!WEIGHTS='15 2 '
      This is the "default" -- it is used if none of entries 1 through 4 match.

Assuming the above:
   A request for ...       will match entry #
   /NEW/GOO.1              1
   /OLD/HELP.HTM           5
   /IMGS/WOW.GIF           2
   /IMGS/ZOO/TIGER.JPG     5
   /IMGS/ZOO/CAMEL.GIF     2
   /NEW/IMGS/WATER.GIF     1

DOIT. Notes:
 * If you only have 1 site (e.g., only the $ "own site" code), then weight information is ignored (as one would expect!).
 * The number of elements in each DOIT.n.!SITES must equal the number of elements in DOIT.n.!WEIGHTS. However, if you do NOT specify a DOIT.n.!WEIGHTS, an implicit value of 1 will be used (for each site listed in DOIT.n.!SITES).
 * If a DOIT.n.!HOST field is not specified, then the entry applies to all requests.
 * The first match is used.
   Thus, if a host-specific entry that matches a selector FOLLOWS a generic entry that also matches, then the host-specific entry will NOT be used. Likewise, heavily wildcarded entries (such as DOIT.n.!SEL='*') should appear after more narrowly defined entries.
 * It's a good idea to have a "match all" (a * selector) entry as the last entry.
 * !ESTIMATE is only used when BAYESIAN=0 -- it modifies the round-robin selection criteria. If !ESTIMATE is not specified, a default value of 1.0 is used.
 * Larger values of !WEIGHTS will INCREASE the probability of redirecting to a site.

------------------------
FILTER_NAME: Name of a "regular" GoServe filter

Instead of redirecting requests, there may be times (e.g., as a default) when you're willing to let the "main site" respond to the request. As mentioned in the discussion of ALIASES., such responses (signified by the use of $ in a DOIT.n.!SITES list) can either be resolved by BALANCER directly, or BALANCER can call a "regular" GoServe filter and let it do the work. By "regular", we mean one of the stand-alone GoServe filters, such as SRE-http, GoHTTP, or GoFilter.80.

To implement this latter case (calling a regular filter) you must set the FILTER_NAME variable. For example, to use SRE-http:
   FILTER_NAME='SREFILTR.80'

Then, when a $ site is chosen, SREFILTR.80 will be called (with the standard GoServe arguments). The net effect is as if BALANCER were not there: the "regular" filter will respond as if it had been called directly. The only detriment is the time lost while BALANCER figures out that it shouldn't do anything! Otherwise, there is NO loss in functionality.

There is one proviso: before calling FILTER_NAME, BALANCER will check its ALIASES. Thus, for very simple requests (e.g., image files or simple documents with no access controls), the use of ALIASES. can avoid the time penalty associated with calling the "regular" filter.
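The DOIT. matching rules described earlier in this section (an optional host check against the HOST: header or the main-site IP address, a case-insensitive * wildcard comparison, and first match wins) can be sketched in Python as follows. This is an illustration, not BALANCER's REXX code: DoitEntry, sel_matches, find_entry and redirect_url are hypothetical names, and the case-insensitive host comparison is an assumption. redirect_url shows how a site's subdirectory (for example pictures.wow.net/guest) is prefixed to the selector when building the redirection URL.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoitEntry:
    sel: str                    # DOIT.n.!SEL (may contain * wildcards)
    sites: str                  # DOIT.n.!SITES (space-delimited)
    host: Optional[str] = None  # DOIT.n.!HOST (optional)


def sel_matches(pattern: str, selector: str) -> bool:
    """Case-insensitive match in which each * stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, selector, re.IGNORECASE) is not None


def find_entry(entries: list[DoitEntry], selector: str,
               host_header: str = "", server_ip: str = "") -> int:
    """Return the 1-based number of the first matching entry, or 0 if none."""
    for n, entry in enumerate(entries, start=1):
        if entry.host:
            # Step 1: compare to the HOST: header and the main-site IP address
            # (case-insensitive comparison is an assumption of this sketch).
            wanted = entry.host.lower()
            if wanted not in (host_header.lower(), server_ip.lower()):
                continue
        if sel_matches(entry.sel, selector):  # step 2
            return n  # first match wins; no search for a "best" match
    return 0  # the caller would fall back to NO_URL_FILE


def redirect_url(site: str, selector: str) -> str:
    """302 target for a chosen site; any subdirectory on the site is prefixed."""
    return "http://" + site.rstrip("/") + "/" + selector.lstrip("/")


DOIT = [
    DoitEntry("/new/*", "hissite.here.net hersite.there.com oursite.where.org"),
    DoitEntry("/IMGS/*.GIF", "pictures.wow.net/guest $"),
    DoitEntry("/ALTVIEW/*", "hissite.here.net", host="altwww.oursite.com"),
    DoitEntry("/CGI-BIN/BIG_JOB*", "bigsite.beer.net smallsite.wine.com"),
    DoitEntry("*", "bigsite.beer.net smallsite.wine.com"),
]

for sel in ["/NEW/GOO.1", "/OLD/HELP.HTM", "/IMGS/ZOO/CAMEL.GIF"]:
    print(sel, find_entry(DOIT, sel))
```

With the five example entries, /OLD/HELP.HTM falls through to the '*' entry, because entry 3 is host-specific and only covers /ALTVIEW/*. Since the loop stops at the first match, the order of the list matters exactly as the DOIT. notes describe.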
Notes:
 * Advanced users note: you can speed things up a bit by loading the filter into macrospace, and setting FILTER_NAME equal to the "macrospace" name.

------------------------
NO_URL_FILE: What to send if no matching DOIT.n entry can be found

NO_URL_FILE should be the fully qualified name of an HTML response file. NO_URL_FILE is used when there is no matching DOIT. entry. If NO_URL_FILE=' ', a short, generic message is used.

NOTE: If you specify a DOIT.n.!SEL='*' entry (as a "default"), then NO_URL_FILE will never be needed.

Examples:
   NO_URL_FILE=''
   NO_URL_FILE='D:\GOSERVE\NOTFOUND.HTM'

------------------------
SLEEP_SECONDS: Seconds to wait between "HEAD" (WEIGHT_TYPE=1) load requests

SLEEP_SECONDS controls how frequently HEAD (WEIGHT_TYPE=1) requests are generated. Larger values mean less up-to-date information, but less load on subsidiary sites.

Note that SLEEP_SECONDS does not affect WEIGHT_TYPE=2 (CPUloadC-generated) information. However, CPUloadC contains WAIT_SECONDS, which is an equivalent "intensity of measurement" parameter.

Example: SLEEP_SECONDS=60

------------------------
TYPE_2_PORT: Port used by mode=2 (CPUloadC) load monitoring

TYPE_2_PORT should be a valid TCP/IP port number. It is the port used by CPUloadC.CMD (running on a subsidiary site) to communicate with the main site. Typically, a 4-digit value is used, such as 8083.

If you change TYPE_2_PORT (say, because it conflicts with some other TCP/IP-aware software), you MUST make sure that you also change the value of the PORT variable in CPUloadC.CMD (on all sites that are running CPUloadC.CMD)!

Example: TYPE_2_PORT=8083

------------------------
VERBOSE : The amount of status information to report

VERBOSE can be equal to 0, 1, 2 or 3:
   0 = none (errors only)
   1 = some
   2 = lots
   3 = too much

Example: VERBOSE=2

Note that status information is written to the PMPRINTF window.

------------------------
WEIGHT_TYPES. : A stem variable of site-specific load-monitoring instructions

Use WEIGHT_TYPES. to specify, on a subsidiary-site-specific basis, the method of CPU-load monitoring to use.

   ! If DEF_WEIGHT_TYPE=0, then WEIGHT_TYPES. are IGNORED !

Basic syntax:
   WEIGHT_TYPES.0 = # of WEIGHT_TYPES. entries
   WEIGHT_TYPES.n = site_name weight_type , head_string , adjuster
 where (note the use of commas as delimiters):
   site_name   = one of the sites used in the DOIT.n.!SITES variables. The site name should be IDENTICAL to its specification in the DOIT.n.!SITES field.
   weight_type = a weight_type of 0, 1, or 2 (same meaning as in the DEF_WEIGHT_TYPE variable)
   head_string = only used with weight_type=1 (optional). The "selector" sent with a HEAD request. If not specified, !PING?RESPONSETIME is sent. For more information on head_string, see the technical notes.
   adjuster    = a site-specific normalization factor, used to adjust measures that use different weight_types to a standard metric. If not specified, a default value of 1.0 is used.
                 NOTE: Larger values of adjuster DECREASE the probability of redirecting to the site.

Examples:
   WEIGHT_TYPES.0=3
   WEIGHT_TYPES.1='hissite.here.net 2 , , 2.5 '
   WEIGHT_TYPES.2='pictures.wow.net 1 , !STATUS '
   WEIGHT_TYPES.3='oursite.where.org 0 '

Note: For $ (the "own site" name), a weight_type of 0 is always used.

=====================================================
V. Using CPUloadC.CMD

CPUloadC.CMD is a REXX program that is designed to communicate with BALANCER.80. CPUloadC.CMD, when running on a (set of) subsidiary sites, will use socket calls to transmit current load information to the main site. Since CPUloadC.CMD can cheaply gather and transmit this information, in many cases it provides a more efficient mechanism than the use of HEAD method requests.

To use CPUloadC.CMD on a subsidiary site, the site's server machine must be running OS/2.
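To make the data flow concrete, the following Python sketch mimics what CPUloadC does on a subsidiary site: measure the load, open a TCP connection to the main site, send one report, and repeat every WAIT_SECONDS seconds. It is an illustration only. The load measure is the MODE=4 "count to 4000" fallback, and the one-line message layout (MYNAME followed by the load value) is an assumption, because the actual format exchanged between CPUloadC.CMD and BALANCE4.RXX is not documented here. The function names are hypothetical; the parameter names mirror the CPUloadC.CMD settings MAIN_SERVER, PORT, MYNAME and WAIT_SECONDS.

```python
from __future__ import annotations

import socket
import time


def mode4_load(count_to: int = 4000) -> float:
    """MODE=4-style measure: seconds needed to count to 4000.
    A busier CPU takes longer, so larger values mean more load."""
    start = time.perf_counter()
    n = 0
    while n < count_to:
        n += 1
    return time.perf_counter() - start


def format_report(myname: str, load: float) -> bytes:
    # HYPOTHETICAL message layout: "<MYNAME> <load>" followed by CRLF.
    # The real CPUloadC.CMD / BALANCE4.RXX format is not documented here.
    return f"{myname} {load:.6f}\r\n".encode("ascii")


def send_report(main_server: str, port: int, myname: str, load: float) -> None:
    """Open a TCP connection to MAIN_SERVER:PORT and send one load report."""
    with socket.create_connection((main_server, port), timeout=5) as conn:
        conn.sendall(format_report(myname, load))


def run(main_server: str, port: int, myname: str,
        wait_seconds: float = 60, rounds: int | None = None) -> None:
    """Measure and report every WAIT_SECONDS seconds (forever if rounds is None)."""
    done = 0
    while rounds is None or done < rounds:
        try:
            send_report(main_server, port, myname, mode4_load())
        except OSError as err:
            # An unreachable main site is reported, not fatal; try again later.
            print(f"cannot reach {main_server}:{port}: {err}")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(wait_seconds)
```

A call such as run('www.oursite.net', 8083, 'HERSITE.THERE.COM', wait_seconds=60) would report forever. As with the real CPUloadC.CMD, the name sent must match the name BALANCER.80 uses for this site, and the port must match TYPE_2_PORT.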
CPUloadC works best when run as a daemon in conjunction with a GoServe server, since it knows how to ask GoServe for load statistics. However, it can also be used in a "stand alone" mode, in which case GoServe need not be in use.

To use CPUloadC.CMD, you need to copy it to the subsidiary server, and change a few parameters. After that, you can either run it in stand-alone mode, or launch it as a "daemon" under the GoServe process. For example, the CUSTOM_INITS parameter of SRE-http can be used to launch daemons. Other GoServe servers may be easily modified to achieve the same result.

Do note that "running under the GoServe process" implies that some GoServe thread launches CPUloadC.CMD (e.g., by a call to the REXXTHREAD function of REXXLIB). So launched, CPUloadC can issue special GoServe commands that are not available if CPUloadC is running in its own process.

The following parameters are set in CPUloadC.CMD. You MUST correctly set the MYNAME and MAIN_SERVER parameters!

Shorter, alphabetical, descriptions:
   MAIN_SERVER  : IP address of the "main server"
   MODE         : How to compute load -- 1, 2, 3 or 4
   MYNAME       : Name of this server (as expected by MAIN_SERVER)
   PORT         : Port that MAIN_SERVER is listening to
   USESAY       : If 1, use SAY (not PMPRINTF) for status reports
   VERBOSE      : Level of status reporting (0=little, 1=some, or 2=lots)
   WAIT_SECONDS : Frequency of update (in seconds)

Longer, alphabetical, descriptions:

MAIN_SERVER: The IP address of the "main site". You can use either a numeric, or a name, IP address.
   Example: MAIN_SERVER='www.oursite.net'

MODE: Method of measuring load. Four values of MODE are available:
   MODE=1 : Use the extract('responsetime') GoServe statistic (the average response time over the last 100 or so requests)
   MODE=2 : Use the extract('clients') GoServe statistic (the current number of active clients)
   MODE=3 : Use the QPROCSTATUS "time slice" monitor (requires RXU)
   MODE=4 : Real dumb default -- seconds required to count to 4000

   To use MODE=1 or MODE=2 you MUST launch CPUloadC.CMD as a semi-permanent thread under GoServe (say, by using REXXLIB's REXXTHREAD procedure). Note: SRE-http users can use the CUSTOM_INITS parameter to "launch" this program.
   To use MODE=3, RXU must be in your LIBPATH.
   If the selected MODE cannot be used, then MODE=4 is used as a default.
   Example: MODE=2

MYNAME: Name of this subsidiary site. This name must exactly match the name used (in the DOIT. and WEIGHT_TYPES. variables) in the main site's BALANCER.80 program. If you leave this blank, this server's IP address will be used.
   To reiterate: the crucial point is that MYNAME must match the name expected by the MAIN_SERVER. That is, if this subsidiary site is at FOO.BAR.NET, and a MAIN_SERVER (on your sub-domain) is expecting FOO, then you MUST set MYNAME='FOO', not 'FOO.BAR.NET' (no attempt is made to match or resolve IP addresses, nor to match IP names to IP numbers).
   Examples: MYNAME='HERSITE.THERE.COM'
             MYNAME=''

PORT: Port to use. PORT must match the value of the TYPE_2_PORT variable specified in BALANCER.80 (running on the main site).
   Example: PORT=8802

USESAY: Disable use of PMPRINTF. If you do NOT have REXXLIB installed on a subsidiary site, set USESAY=1.
   Example: USESAY=1

VERBOSE: Status reporting. VERBOSE=0 : no status info, VERBOSE=1 : some, VERBOSE=2 : lots
   Example: VERBOSE=2

WAIT_SECONDS: Seconds to wait between "updates". Load measurements on this subsidiary site (with transmittal of results to the MAIN_SERVER) will occur every WAIT_SECONDS seconds.
   Example: WAIT_SECONDS=60

=====================================================
VI. Technical Notes

VI.a: Outline of BALANCER

The following outlines BALANCER's logic. We start with the assumption that BALANCER.80 is installed in D:\GOSERVE and has been started.
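Before the thread-by-thread outline, the arithmetic the threads cooperate on can be condensed into a short Python sketch. It is an illustration under stated assumptions, not the REXX code in BALANCE2.RXX, BALANCE3.RXX or BALANCE5.RXX, and every function and class name is hypothetical. It follows the rules given in the outline: a HeadQuery load is the third word of the response line if numeric, otherwise the total response time; the load is multiplied by the site's adjuster; the posterior weight is its inverse (0 for an unreachable site, 1.0 before any measurement); probabilities follow prob_j = P_j * p_j / sum_i(P_i * p_i); and BAYESIAN=1 draws at random with those probabilities. The RoundRobin class is only one plausible reading of BAYESIAN=0, since the bookkeeping in BALANCE5.RXX is not documented here: each redirection is assumed to count against a site for !ESTIMATE seconds (0 meaning forever), and the next request goes to the site with the fewest outstanding redirections relative to its probability.

```python
from __future__ import annotations

import random


def head_load(response_line: str, elapsed: float) -> float:
    """HeadQuery rule: third word of the response line if numeric,
    otherwise the total time the HEAD request took."""
    words = response_line.split()
    try:
        return float(words[2])
    except (IndexError, ValueError):
        return elapsed


def posterior_weight(load: float | None, adjuster: float = 1.0) -> float:
    """Inverse of the adjusted load; unreachable sites (load 0/None) get 0."""
    if not load:
        return 0.0
    return 1.0 / (load * adjuster)


def load_key(site: str) -> str:
    """Posterior weights are per site, so any /subdirectory is ignored."""
    return site.split("/", 1)[0]


def probabilities(sites: list[str], priors: list[float],
                  posteriors: dict[str, float]) -> list[float]:
    """prob_j = P_j * p_j / sum_i(P_i * p_i); unmeasured sites default to P=1.0."""
    products = [posteriors.get(load_key(s), 1.0) * p for s, p in zip(sites, priors)]
    total = sum(products)
    return [x / total for x in products] if total > 0 else [0.0] * len(sites)


def bayesian_pick(sites: list[str], probs: list[float], rng=random):
    """BAYESIAN=1: random draw in which each site's chance equals its probability."""
    if sum(probs) <= 0:
        return None  # every site is off-line
    return rng.choices(sites, weights=probs, k=1)[0]


class RoundRobin:
    """BAYESIAN=0, one plausible reading (an assumption, see the lead-in)."""

    def __init__(self) -> None:
        self.history: list[tuple[str, float | None]] = []  # (site, expiry)

    def pick(self, sites: list[str], probs: list[float],
             estimate: float = 1.0, now: float = 0.0):
        # Forget redirections whose estimated lifespan has ended.
        self.history = [(s, t) for s, t in self.history if t is None or t > now]
        best, best_score = None, None
        for site, prob in zip(sites, probs):
            if prob <= 0:
                continue  # a 0 probability means the site is never chosen
            active = sum(1 for s, _ in self.history if s == site)
            score = (active + 1) / prob  # outstanding load relative to its share
            if best_score is None or score < best_score:
                best, best_score = site, score
        if best is not None:
            expiry = None if estimate == 0 else now + estimate
            self.history.append((best, expiry))
        return best


sites = ["bigsite.beer.net", "smallsite.wine.com"]  # the DOIT.5 example sites
P = {"bigsite.beer.net": posterior_weight(head_load("HTTP/1.0 200 2.0", 3.1)),
     "smallsite.wine.com": posterior_weight(0.5)}
probs = probabilities(sites, [15, 2], P)  # DOIT.5.!WEIGHTS='15 2 '
print([round(p, 3) for p in probs], bayesian_pick(sites, probs))
```

With a 2.0-second response time for bigsite and 0.5 seconds for smallsite, the prior weights of 15 and 2 give probabilities of about 0.652 and 0.348: the smaller site's speed partly offsets its low prior weight.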
Note that the following outline is essentially correct, but is not meant to be a programmer's guide (the actual implementation may chop things up in a different order).

Since BALANCER.80 is multi-threaded, the following outline is organized in several pieces. These threads do "talk" to each other -- to make sense of things, you might need to flip back and forth between the various pieces.

VI.a.i. ** The BALANCER.80 thread

 1) When the first request arrives, BALANCER.80 will initialize.
   1a) Initialization requires launching BALANCE2.RXX as a "semi-permanent" thread. This is known as the "ReWeighter" thread.
       NOTE: Unlike GoServe "request-specific" threads, semi-permanent threads stay alive for as long as GoServe is alive.
   1b) BALANCER.80 will create a queue and a semaphore, and use them to transfer the values of the DOIT. and WEIGHT_TYPES. variables to the ReWeighter thread.
   1c) BALANCER.80 will wait for the ReWeighter thread to signal that "it is safe to proceed".
 2) After receiving this signal, and on all subsequent requests, BALANCER will match the "request selector" (and possibly the HOST: header and IP address) to a DOIT.n entry.
   2a) If no match can be found, either the NO_URL_FILE, or a short message, is sent to the client. BALANCER then exits.
   2b) Using the number of the matching entry, a "vector of bayesian probabilities" is pulled from the OS/2 environment. This "vector" is placed there by the ReWeighter thread, and is based on a combination of "prior weight" information (as specified in the DOIT.n.!WEIGHTS fields) and "posterior weights" determined by measuring the load on subsidiary sites. Note that each DOIT.n entry is associated with its own "vector of bayesian probabilities" environment variable.
   2c) Using this vector of bayesian probabilities, one of the sites listed in DOIT.n.!SITES is selected. This selection is either purely random, or based on a "round robin" methodology.
       2c.i) If purely random (BAYESIAN=1), the site chosen (for redirection) will be random -- with sites having a higher bayesian probability being chosen more often.
       2c.ii) If "round robin" (BAYESIAN=0), BALANCER will send the "bayesian weight" (and "which DOIT. entry") information to the "RoundRobin" thread (BALANCE5.RXX). The RoundRobin thread will then examine its history list (created from prior instances of this step), and choose a site so as to maintain an equal quantity of (weighted) redirections to each site.
           2c.ii.1) If RoundRobin is too slow (about 1/2 second) in responding, BALANCER will stop waiting, and use the bayesian methodology. However, BALANCER will still send the result (which selection from the DOIT.n.!SITES list) to RoundRobin, so as to keep its accounts accurate.
   2d) After selecting a site, BALANCER either:
       1) sends a "302" redirection response to the client. The client will then redirect the request to the URL that is included in this response (which may include a modified directory, as well as a new IP address); or
       2) if the request was a POST method, or if !FORWARD=1, performs "proxy server"-like forwarding, with BALANCER waiting on the results and retransmitting them to the client.
   BALANCER then exits.

VI.a.ii. ** The ReWeighter Thread

 1) Upon invocation, ReWeighter will read the DOIT. and WEIGHT_TYPES. information passed to it by BALANCER.80.
   1a) The DOIT. entries are scanned, and a list of all unique "sites" is constructed ("directory" information is not used when creating this list of sites). A default "posterior weight" (of 1.0) is assigned to each site.
   1b) This list is then compared to the WEIGHT_TYPES. variable. If no match is found in WEIGHT_TYPES.:
         i) the DEF_WEIGHT_TYPE is used as the "method" of load monitoring, and
        ii) a "normalization" adjuster of 1 is used.
       Otherwise, the WEIGHT_TYPE and an ADJUSTER are read from the matching WEIGHT_TYPES. entry.
   1c) ReWeighter will then launch a set of threads that do the actual monitoring:
         i) BALANCE4.RXX is launched (as the PortListen thread). It will open and monitor a TCP/IP port (using the value of TYPE_2_PORT).
        ii) For each site that is to be monitored using weight_type=1, a separate instance of BALANCE3.RXX is launched (as a HeadQuery thread).
       iii) While not a "monitoring" thread, the RoundRobin (BALANCE5.RXX) thread is also launched by ReWeighter.
   1d) ReWeighter then tells BALANCER that initialization is done, and goes into an infinite loop. Every 5 or so seconds it will read a special queue for "load balance" information. Both the PortListen thread and the (possibly several) HeadQuery threads write load information to this queue.
   1e) The information read from the queue is used to update the "posterior weights". Note that the posterior weights are specific to a site. In contrast, prior weights are drawn from the DOIT.n.!WEIGHTS field -- they are specific to a selector-site (actually, to a selector-site/dir) pair.
   1f) Each site is then quickly checked to see if it is still accepting requests. If a server is not accepting requests (e.g., it is not running), its posterior weight will be set to 0; and a 0 posterior weight means a 0 probability of being selected (see the next step).
   1g) Using these "posterior weights", and the "prior weights" (specified in the DOIT.n.!WEIGHTS variable), a set of "bayesian probabilities" is computed, with each site in DOIT.n.!SITES assigned its own probability. For DOIT.n, the formula is:

           $$ \mathrm{prob}_j = \frac{P_j \, p_j}{\sum_{i=1}^{J} P_i \, p_i} $$

       where:
         prob_j = probability of redirecting (or forwarding) to the j'th site listed in DOIT.n.!SITES
         J      = number of sites listed in DOIT.n.!SITES
         p_j    = "prior weight" for site/dir j
         P_j    = "posterior weight" for site j
       Thus, large values for a "prior weight" (or a posterior weight) increase the probability that a site will be used.
1h) These bayesian weights are then saved to the OS/2 environment, where BALANCER.80 can quickly read them.
1i) ReWeighter waits for a few seconds, and then checks if there is anything in its queue. If there is, it goes back to step 1d. Note that ReWeighter does NOT expect a complete set of new load information on each loop; typically, the load information for only a few sites will change "every few seconds". That is, posterior weights are reused until new information is obtained.
1j) ReWeighter Notes:
   i) The use of threads is an effective means of isolating "bad" servers. At worst, a "hung" server will end the flow of "load information" for that one site, but information on all other sites will continue to be updated.
   ii) ReWeighter is sent "response time" information. Since long response times indicate an overloaded site, the inverse of response time is used as the posterior weight.
   iii) When ReWeighter is matching sites to WEIGHT_TYPES. entries, it will NOT attempt to "resolve" IP addresses. That is, you should have a separate WEIGHT_TYPES. entry for each "variant" of a site name that you use (i.e.; a full IP name, a full IP number, or a sub-domain name).

VI.a.iii. ** The HeadQuery thread(s)

1) Each HeadQuery thread is responsible for monitoring just one site.
1a) HeadQuery issues a HEAD method HTTP request to the site it is responsible for monitoring. The "selector" for this HEAD request is determined by the head_string element in the WEIGHT_TYPES. entry that corresponds to this site. Basically, there are three classes of "selectors" that may be used:
   i) The default (used if head_string is not specified, or if no WEIGHT_TYPES. entry was specified for this site): a selector of /!PING?RESPONSETIME is sent. When the SRE-http web server receives such a request, it will query GoServe for the "average response time", and return it.
   ii) head_string equals 0: a selector of / is sent. The total time it takes to respond to this selector is used.
   iii) A non-0, non-empty value of head_string is specified: this value of head_string is sent as is.
   Note that types i and iii expect a very specific form of response from the server. In particular, the third word on the response line is used as the measure of load (i.e.; the average response time). If this third word is not a number, then method ii is used (total response time). Thus, if you are not running SRE-http, you should determine what HEAD request will yield such a response. If no such HEAD request exists (for the server running on a subsidiary site), you should set head_string=0 (or use a weight_type of 2).
1b) The load measurement is then "normalized": the load measure is multiplied by the value of the adjuster element in the WEIGHT_TYPES. entry that corresponds to this site. By default, adjuster=1, so no normalization occurs. However, if you know that the measures returned from a given site are systematically larger (or smaller) than those of otherwise similar sites (under the same load), you can "correct" this "bias" by suitably setting the adjuster.
   Note that the adjuster complements the "prior weights". In fact, you could use prior weights for all "normalization". However, since prior weights are sometimes used "as is" (such as when BALANCER first starts), it is recommended that the adjuster be used to "normalize" measures to some common ground, and that the "prior weights" be used to account for intrinsically faster servers.
1c) If HeadQuery has problems reaching the site, the "load" is set to 0, which causes a posterior weight of 0.
1d) The load information is placed on the ReWeighter's queue.
1e) HeadQuery (this instance) then waits for SLEEP_SECONDS seconds, and jumps to step 1a.

VI.a.iv. ** The PortListen Thread

1) The PortListen thread is responsible for receiving load information passed "back to the main site" by instances of CPUloadC running on subsidiary sites.
All sites running CPUloadC should transmit IP messages to a single port (the TYPE_2_PORT) that is monitored by the PortListen thread.
1a) As each message is received (on the TYPE_2_PORT), PortListen will determine whether it is from one of the subsidiary sites that have a WEIGHT_TYPE of 2. If it is not, the message is discarded.
1b) If it is, it will be "adjusted" (following the same logic as in step 1b of the description of the HeadQuery threads).
1c) The load information is placed on the ReWeighter's queue.
1d) PortListen then waits for the next message to appear, and then goes to step 1a.

VI.a.v. ** The RoundRobin thread

1) When BAYESIAN=0 (or DOIT.n.!BAYESIAN=0), all requests will cause BALANCER to "ask" RoundRobin to choose a site (from DOIT.n.!SITES).
   Background: Each such "ask" involves placing the n (of DOIT.n.!SEL) on a queue, along with the "bayesian probabilities" (as set by ReWeighter, and read from the environment). BALANCER then waits for about 1/2 second; if a response is received from RoundRobin, BALANCER reads its own queue for the site (from DOIT.n.!SITES) to use. If, after 1/2 second, RoundRobin has not supplied a response, BALANCER will cease waiting, and use the "random" method.
After some initializations:
1a) RoundRobin goes into an infinite loop waiting on "asks" from BALANCER.
1b) When a request arrives, BALANCER will send the entry number (n of the matching DOIT.n.!SEL) and the current bayesian weights (for this entry).
1c) RoundRobin forms a vector of proportions, with each element having a value of:
       f(#_redirections) / current_bayesian_weight
   The element from this vector with the lowest value is used as the selection (that is, if element 3 of a 4 element vector is lowest, then the "third" site is chosen).
1d) If multiple elements tie for the lowest value, a bayesian-like rule is used (selections with a larger current_bayesian_weight being more likely to be chosen).
1e) Note the use of f(#_redirections) as the numerator.
   In the simplest case, f() is simply a count of all the times that BALANCER chose this element (from this DOIT.n.!SITES list). However, since ancient choices are probably irrelevant, f() is designed to degrade the influence of prior choices. Specifically, the !ESTIMATE value (with a default of 1.0) is used as a measure of the "number of seconds of server time it will take to respond to this request". When a selection is chosen, this !ESTIMATE is stored, along with the time it was stored. Hence, for each element in the DOIT.n.!SITES list, f() equals:
       f() = SUM max[ (!ESTIMATE - (current_time - stored_time)), 0 ]
   with the SUM over all times this site (element) was chosen. That is, if element 3 has been chosen 51 times, SUM has 51 terms (typically, the vast majority of these will equal 0, a fact which allows for large computational shortcuts).
   This function means that ancient redirections are irrelevant. Actually, with a value of 1, the decay is quite fast: a redirection stops counting about one second after it was made. On not-busy sites, most elements will therefore have f()=0 and tie, which means that the "bayesian-like" tie-breaking rule of step 1d will be frequently used.
   If you really want a round-robin with no "decay", set !ESTIMATE=0.
   If you really-really want a round-robin with no "decay" and no "weighting", set !ESTIMATE=0, do not specify DOIT.n.!WEIGHTS, and set the appropriate WEIGHT_DEF_TYPE=0 (static).
   NOTE: !ESTIMATE=0 does NOT mean immediate decay; it means "no decay". That is, stored_time is set to current_time when f() is computed, and "ancient" redirections are NOT irrelevant.
1f) Using the "request specific" BALANCER queue, the chosen element is returned to BALANCER.
1g) RoundRobin stores the results (the value of !ESTIMATE, the number of the chosen element, and the current time), and jumps to step 1a.

VI.a.vi. ** The CPUloadC "daemon"

1) CPUloadC.CMD, a REXX program, is a "daemon" that is meant to be run on your subsidiary servers. Its purpose is to ascertain load information WITHOUT hitting the server with a HEAD method request.
It then transfers this information, using TCP/IP socket calls, to the PortListen thread running on the main site.
1a) Depending on the value of MODE, CPUloadC will determine the GoServe average response time, the GoServe current number of clients, a measure of CPU utilization, or the time required to count to 4000.
1b) If the PORT on MAIN_SERVER (that is, the TYPE_2_PORT on your main site) is accessible, CPUloadC will transmit two pieces of information (separated by a space): the MYNAME variable, and the load measurement.
1c) CPUloadC waits for WAIT_SECONDS seconds, and then goes to step 1a.
1d) Note that if the MAIN_SERVER is not accessible, CPUloadC does not crash; it just keeps trying. Thus, you can start CPUloadC at any time, and when you finally fire up your main site, CPUloadC will then send its (latest) load information. Similarly, if you shut down a subsidiary site (or just kill CPUloadC), the PortListen thread will not crash; it just won't update the "posterior weight" for this subsidiary site.
1e) There is nothing sacrosanct about CPUloadC. If you have a better load monitor, and it can issue the appropriate TCP/IP socket calls, there is no strong reason not to use it; you can check the CPUloadC code for the proper "message" syntax.

------------------------
VI.b. Selecting a WEIGHT_TYPE (the method of load measurement).

The following outlines the relative advantages of each method. In general, we urge experimentation; you should play with the "prior weights" and the "adjusters". Please be aware that you can use WEIGHT_TYPES. to specify different methods for different sites!

WEIGHT_TYPE=1 (using HEAD HTTP requests)
  General Advantage: Requires minor, or no, modifications to subsidiary site servers.
  General Disadvantage: Imposes more work on subsidiary site servers.

  Using head_string=0
    ADVANTAGES: Should work with practically any server, whether or not it is a GoServe server (or even an OS/2 server). Returns a value that is probably a reasonable measure of current load.
    DISADVANTAGES: Creates more work for the server (and may clutter up audit files). Is relatively slow to gather. Response time is a function of distance (in web space) from the main site to the subsidiary site; this may not be an accurate measure of the distance from the subsidiary site to the client.

  Using the default (/!PING?RESPONSETIME)
    ADVANTAGES: Is a good measure of recent load history. Imposes minimal work on the server.
    DISADVANTAGES: Requires SRE-http, or a server that can appropriately respond to this HEAD method request string. Requires some, but not much, tinkering with parameters on SRE-http servers. Average response time may be a poor measure of current load.

  Using a custom head_string
    ADVANTAGES: Highly flexible.
    DISADVANTAGES: You have to know how to tell the subsidiary server to return a particular kind of response to a HEAD method request that uses the specified head_string.

WEIGHT_TYPE=2 (using CPUloadC)
  General Advantage: Imposes a very mild load on subsidiary site servers.
  General Disadvantages: Requires that OS/2 be running on the subsidiary site server. Requires that you can properly install and maintain CPUloadC.CMD.

  MODE=1 : GoServe RESPONSETIME
    ADVANTAGES: Fast, low-work, generally accurate measure.
    DISADVANTAGES: Requires a GoServe server, such as SRE-http, that can launch CPUloadC as a daemon (under the GoServe process). Average response time may not be relevant to current load.

  MODE=2 : GoServe number of clients
    ADVANTAGES: Fast, low-work, reasonably accurate measure.
    DISADVANTAGES: Requires a GoServe server that can launch CPUloadC as a daemon (under the GoServe process). Number of clients may be only loosely correlated with load.

  MODE=3 : Time slice utilization (ranging from 0 to about 3)
    ADVANTAGES: A direct measure of CPU load.
    DISADVANTAGES: Requires the (free) RXU.DLL library. Time-slice measurement (using QprocQuery) is a tepid measure of true performance. Should probably be combined with an "experimentally" derived adjuster.
  MODE=4 : Seconds required to count to 4000
    ADVANTAGES: A simple, indirect measure of CPU load.
    DISADVANTAGES: Very primitive; may have nothing to do with "server" tasks. Should probably be combined with an "experimentally" derived adjuster.

------------------------
VI.c. Using <BASE> elements.

The "redirection" algorithm adopted by BALANCER has one drawback: "relative" links in documents residing on the subsidiary sites will be "relative" to the subsidiary site. That is, once a client has been redirected to a subsidiary site, subsequent requests for documents may NOT be subject to load balancing.

There are two ways of keeping every request routed through the main-site first: use "forwarding", or make sure the subsidiary sites use links that "point back" to the main-site. This section discusses how to use the <BASE> element to easily specify "links that point back"; the next section discusses forwarding.

HTML Definition: BASE.
   Syntax: <BASE HREF="URL">
   Description: name of the file relative to which partially qualified pathnames in URLs should be interpreted. If not otherwise specified, the URL containing the document being displayed is used as the base.

Discussion: Suppose the main-site is a.b.net, with a subsidiary-site of wow.far.gov. Suppose a request for dogs.html is received at a.b.net, and redirected to wow.far.gov. Now suppose that wow.far.gov/dogs.html contains the following link:
      <A HREF="info/moredogs.html">Need more information?</A>
When the client selects this "relative" link, her browser will assume that "info/moredogs.html" is relative to the current URL, and will issue a request for "wow.far.gov/info/moredogs.html". This will subvert your desire to maintain main-site control over all requests (that is, you'd like these subsidiary sites to ONLY be reachable via a BALANCER-supplied redirection).

What you need to do is inform the client's browser that the "base URL" should be the main-site (a.b.net), and not the current URL (at wow.far.gov). This can be done by including a <BASE HREF="http://a.b.net/dogs.html"> element in the <HEAD> section of DOGS.HTML.
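The effect of a <BASE> element on the dogs.html example can be checked with Python's urllib.parse.urljoin, which applies the standard relative-URL resolution rules that browsers follow. This is only a model of browser behavior, not part of BALANCER; resolve_link is a hypothetical helper, and http://a.b.net/dogs.html is the base URL assumed for the example.

```python
from urllib.parse import urljoin


def resolve_link(page_url, href, base_href=None):
    """Approximate how a browser resolves a relative link.

    If the page carries a <BASE HREF=...> element, relative links are
    resolved against that URL instead of the page's own URL.
    """
    return urljoin(base_href or page_url, href)


page = "http://wow.far.gov/dogs.html"   # where BALANCER redirected the client
link = "info/moredogs.html"             # the relative link inside dogs.html

print(resolve_link(page, link))
# http://wow.far.gov/info/moredogs.html  (bypasses BALANCER)
print(resolve_link(page, link, base_href="http://a.b.net/dogs.html"))
# http://a.b.net/info/moredogs.html      (goes back through the main-site)
```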
By including a <BASE> element, most (but not all) browsers will know to treat relative URLs (such as "info/moredogs.html") as pointing to a.b.net, and not to wow.far.gov. The only drawback is that you need to include this element in all HTML documents on wow.far.gov. Although some servers (such as SRE-http) can automatically do this (for selected documents), in most cases you'll have to do it "by hand".

------------------------
VI.d. Using Forwarding

Instead of redirecting, BALANCER can "forward" a request to a subsidiary site, wait for the results, and then retransmit the results to the original client. The client is therefore unaware that the request has been answered by a server on some other site (see the description of the DOIT. variable for details on how to specify what to forward). Basically, BALANCER "emulates" a browser, and then sends the results back to the client.

Forwarding has two primary advantages: the "relative URL" problems (discussed above) are avoided, and certain requests do not "redirect" very well. In particular, most browsers do NOT correctly redirect POST method requests. Therefore BALANCER will "forward" all POST method requests (you can disable this by setting the DO_POST "advanced users parameter").

The primary disadvantage of forwarding is obvious: compared to using a "regular server", you double the traffic over your lines, since each piece of information must be obtained from the subsidiary server and then retransmitted to the client. In cases where your main-site's computer is dedicated to BALANCER, the relative increase in traffic is even more severe, since BALANCER's redirection responses are quite short.

Another disadvantage is that the "emulation", although fairly thorough (i.e.; all the cookies and other request headers are sent), is not complete. In particular, the subsidiary-site will not see the client's IP address (it sees the address of the main-site).
This may cause occasional problems, such as when IP addresses are used for access control.

Considering these drawbacks... forwarding is NOT generally recommended. However, in cases where the "processing to transmitted bytes" ratio is large (i.e.; when obtaining the output of a CPU intensive script) forwarding might be useful. Alternatively, when you want to be sure that "relative URLs" are properly interpreted, forwarding might be safest.

=====================================================
VII. Glossary

Forwarding: As an alternative to "redirection", BALANCER can forward a request to a subsidiary site, wait for and obtain the results, and then retransmit these results to the client. Forwarding avoids some problems with POST method requests and with resolution of relative URLs, but at the cost of greatly increasing traffic.

Host: On a server that hosts multiple site names, the HOST request header (which is part of the http request sent by a browser) is used to identify which site the request is meant for. Note that some older browsers do NOT include a HOST request header, a fact that complicates the use and configuration of multiple-host (also known as multi-homed) web servers.

Load monitoring: Obtaining information on how busy the servers on a (set of) subsidiary sites are. Load monitoring requires some means of measuring the performance of a server, either in terms of CPU utilization or http-request response time.

Redirect: As part of the http (WWW) protocol, web servers can instruct a client's web browser to "redirect" a request to some new URL; the browser automatically issues a new request (as specified in the redirection instructions) to this new URL. Redirection is the principal strategy adopted by BALANCER (that is, BALANCER usually requires action by the client's browser; it does not work at the "router" level).

Request Selector: Also called the "selector". The request that a client's browser sends to the "main-site".
This is the portion of a URL after the domain. For example, clicking a link of:
      http://foobar.ag.gov/srehttp/balancer.zip
will cause your browser to send a selector of /srehttp/balancer.zip to the web server running at the IP address of foobar.ag.gov.

Sites: Sites are WWW sites, as identified by an IP address. BALANCER distinguishes two types of sites, the "main-site" and "subsidiary-sites". Typically, the main-site is the publicly advertised site; it's the "front end" of your web presence. It's also the site that BALANCER.80 is the "server software" for. Subsidiary sites are sites that back up the main-site. In many cases, this backup accounts for all the actual work done by the server; the main-site's sole job is to redirect requests to subsidiary sites. Note that a subsidiary site may be handled by the same physical machine as the main-site, either as a different IP address or as a different port.

Stem Variable: Stem variables are REXX's way of implementing a data structure. Stem variables consist of a "stem" and several "tails", with the tails separated by periods. For example, in:
      DOIT.4.!SEL
the "DOIT" is the stem, and 4 and !SEL are tails. In this documentation, several stem variables have the following structure:
      Name.entry.!field='element [element2] [.. elementK] '
For example:
      DOIT.4.!SITES='A.B.NET C.D.ORG '
In the above example, the name is "DOIT", the entry number is "4", and the field is "!SITES". The value of this field contains two elements: "A.B.NET" and "C.D.ORG".

Weights: Weights are used to construct a "bayesian probability" of choosing a site. There are two kinds of weights: prior weights, which are specified in a DOIT.n.!WEIGHTS field; and posterior weights, which are determined by measuring the load on subsidiary sites.

=====================================================
VIII. Disclaimer

BALANCER was created by Daniel Hellerstein, with help from Tim Stephens.
It's "use at your own risk" freeware; we take no responsibility for untoward effects of this program. That said, in our limited testing it has worked properly. Should you discover any problems, or have suggestions, please contact Daniel Hellerstein at danielh@econ.ag.gov

Formal disclaimer:
Copyright 1998 by Daniel Hellerstein. Permission to use this program for any purpose is hereby granted without fee, provided that the author's name not be used in advertising or publicity pertaining to distribution of the software without specific written prior permission.
With some provisos, this includes the right to subset and reuse the code, with proper attribution. The provisos are several fold:
  1) Portions of the code are adapted from other authors' work (these are noted where appropriate); you'll need to contact these other authors for appropriate permissions.
  2) We, the authors of BALANCER (and related software), and any potentially affiliated institutions, disclaim any and all liability for damages due to the use, misuse, or failure of the product or subsets of the product.
Furthermore you may also charge a reasonable re-distribution fee for BALANCER; with the understanding that this does not remove the work from the public domain and that the above provisos remain in effect.

THIS SOFTWARE PACKAGE IS PROVIDED "AS IS" WITHOUT EXPRESS OR IMPLIED WARRANTY. THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE PACKAGE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR (Daniel Hellerstein) OR ANY PERSON OR INSTITUTION ASSOCIATED WITH THIS PRODUCT BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE PACKAGE.
BALANCER was developed on the personal time of Daniel Hellerstein, and is not supported, approved, or in any way an official product of my employer (USDA/ERS). -- End of document.