SRE-DATA: A Database Addon for SRE-http

SRE-DATA is a simple, multi-threaded, web-accessible database addon for SRE-http. With SRE-DATA, you can take pre-existing flat databases and use your own CGI-BIN scripts (or SRE-http addons) to quickly look up information, with only minimal setup required.


Installation Notes

The easiest way to install SRE-Data is to:
  1. UNZIP SREDATA.ZIP to an empty, temporary directory.
  2. Run the INSTALL.CMD program (from an OS/2 command prompt).
Or, if you like to do things yourself, after UNZIPping:
  1. Copy SREDATAX.CMD, SREDATA.CMD, SREDATA.RXX, QSORT.EXE, and SREDATA.SRF to your GoServe working directory (e.g., D:\GOSERVE).
  2. Copy SREDATA.HTM (this file!) and DATADEMO.HTM to one of your WWW directories.
  3. If you want to play with the demo files:

Launching SRE-Data

New user? You might want to first read the Outline of SRE-Data.
To use SRE-DATA, you have to run SREDATA.CMD, the SRE-Data daemon launcher. Although you can use the SRE-Data launcher at the end of this document, serious users will probably want to use one of the following more formal mechanisms:
  1. CUSTOM_INITS: You can use CUSTOM_INITS, a parameter in the SREFILTR.80 file, to launch an SRE-Data daemon when GoServe/SRE-http starts up (you'll need to edit SREFILTR.80 to change the value of CUSTOM_INITS).
     Advantages: Automated.
     Disadvantages: Somewhat more difficult to manage; and to start new SRE-Data daemons with CUSTOM_INITS, you have to restart GoServe.
     Examples:
       CUSTOM_INITS='SREDATA NAME=FIP PROG=SREDATA\FIPNAME.IN ;  SREDATA NAME=FIP key=DAT1\FIP.Key'
       CUSTOM_INITS=' PRELOAD ; SREDATA Name=INIT1 STEM=SMP\INIT1.stm ; '
  2. Separate session (recommended): Run SREDATA.CMD from a separate OS/2 session.
     Advantages: It's easy to start and stop SRE-Data daemons on an as-needed basis; you can also establish a very simple daemon monitor.
     Disadvantages: Each daemon requires a separate process, which must be kept open (closing the process kills the daemon).
     Example: D:\GOSERVE>sredata name=fip prog=sredata\fipname.in monitor=1
  3. External procedure: You can launch an SRE-Data daemon by calling SREDATA as an external procedure (the DATADEMO.CMD addon, which is used by the launcher facility, contains an example of how to do this).
     Advantages: Fairly easy to start and stop SRE-Data daemons.
     Disadvantages: Monitoring cannot be done. Although you can use DATADEMO, you might want to write your own "launcher".
     Example: status=sredata('FIP','PROG','SREDATA\FIPNAME.IN')
     When called as an external procedure, SREDATA will return a 1 if the daemon was launched, and a 0 if an error was detected (you can use PMPRINTF to see what errors occurred). Note that a status of 1 does not guarantee a successful launch, since other errors may have occurred in the daemon code (in SREDATA.RXX). Again, keep an eye on PMPRINTF.
Regardless of how you do it, you can launch many daemons, one for each of several "indicator variables" in several databases.
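
For the external procedure method, a calling program will usually want to act on the returned status. The following is a minimal sketch, reusing the FIP name and PROG file from the example above; the messages are illustrative only, and SREDATA.CMD is assumed to be reachable from the current directory.

```rexx
/* Sketch: launch a daemon as an external procedure and check the result. */
/* The name FIP and the PROG file are taken from the example above.       */
status = sredata('FIP', 'PROG', 'SREDATA\FIPNAME.IN')
if status = 1 then
    say 'Launcher reported success; watch PMPRINTF for errors from SREDATA.RXX.'
else
    say 'Launch failed (status='status'); see PMPRINTF for details.'
```

Since a status of 1 only means the launcher itself succeeded, the success branch still points at PMPRINTF.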

An Implementation Note

If you expect to be starting and stopping SRE-Data daemons fairly frequently, we recommend use of the "separate session" method -- REXX seems to be a bit flaky about shutting down semaphores, so having the option of "killing the process" is the best guarantee of stability.

Description of SREDATA options

However called, SREDATA expects a few arguments: a Name, a Type, and a file or string. As indicated in the above examples:
When called with CUSTOM_INITS, the syntax is:
     SREDATA  NAME=some_name  type=filename
When called  from the command line, the syntax is:
     SREDATA  NAME=some_name  type=filename MONITOR=0/1
When called as a function, the syntax is:
     status=SREDATA('some_name',type,filename)
                or
     status=SREDATA('some_name','STRING',astring)
where:
    some_name : An arbitrary name (chosen by you)
          type: One of PROG,DATA, STEM, KEY, LOADKEY, STRING
      filename: A file name containing data or instructions
       astring: A string containing instructions 
The following details the above options...

Technical Note

When called, SREDATA.CMD will do several things
  1. If SREF_FIND_INDEX does not exist in macrospace, it will load it (using the SREDATA.SRF file). Note that SRE-http versions 1.2L and above come with SREF_FIND_INDEX preloaded into macrospace; for earlier versions, SRE-Data will load a local copy (contained in SREDATA.SRF).
  2. It will then create a queue and a semaphore using the NAME argument (or using a value of INDEX, if no NAME was specified).
  3. It will launch SREDATA.RXX as a thread, and give it several arguments.
  4. If MONITOR=1 is not specified, it will quit. Otherwise, a simple "monitoring" routine will be run. In either case, SREDATA.RXX will continue (as a daemon).
Therefore:
The real work is done by SREDATA.RXX -- the job of SREDATA.CMD is to launch SREDATA.RXX as a daemon (in its own thread), and to make sure that SREF_FIND_INDEX is in REXX macrospace.


Outline of SRE-DATA

The basic purpose of SRE-Data is to provide a simple database engine that can be accessed from the web -- either by using SRE-http addons, or CGI-BIN scripts. The following outlines how SRE-Data provides this service.

SRE-Data Reads Flat Databases

SRE-Data can read flat databases -- data contained in text files (with records defined on a per-line basis), or data in files built with fixed-length records. That is, SRE-Data is not designed to read .DBF or other such proprietary databases.

Each record (in a flat database) is expected to contain two types of information:

  1. An identifier variable
  2. Other information variables
Typically, the identifier variable is the key against which you will search. Examples of identifier variables include customer numbers, county identifiers, or usernames. SRE-Data allows you to use arbitrarily long identifiers, that may or may not be numeric ... in fact, they may contain binary (non-text) characters. Whatever you use, they must be unique to a record.

The optional information variables contain any information; text, numbers, or binary representations. Your database can contain any number of information variables, though in practice SRE-Data is designed to work with a relatively small number of information variables.

In some cases you may not have any information variables. For example, you may have a large set of data with a single index file, where the position of an identifier in the index is used to access these other data. In such a case, it can be much more efficient to use SRE-Data to quickly find this position, and then let some other set of procedures extract and otherwise manipulate the data associated with this position.

SRE-Data has parsing ability

When using flat databases, it is necessary to specify where each of the various (identifier and information) variables may be found. SRE-Data offers a fairly flexible mechanism for doing this. Variables can be extracted by absolute position in a record, or by "word" position (where words can be defined by commas and spaces, or by any other user-defined delimiters). You can even mix these types, with some variables defined by character position and some by word position. In addition to within-record flexibility, SRE-Data has some cross-record flexibility. In particular, text-file records can span multiple lines, and may contain comment lines.
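
To make the two extraction styles concrete, here is a small stand-alone REXX illustration. It is not SRE-Data's own parsing code; the sample record and the column positions are invented for the example, and it only shows what "nth word" and "start length" mean in practice.

```rexx
/* Illustration only: two ways to pull a field out of one record. */
record = '09001,FAIRFIELD  CT'

/* "nth word" style: turn the comma delimiter into a space, then use WORD */
words = translate(record, ' ', ',')
id    = word(words, 1)            /* first word  */
name  = word(words, 2)            /* second word */

/* "start length" style: 2 characters, starting at character 18 */
state = substr(record, 18, 2)

say id name state
```

Mixing both styles on one record, as done here, corresponds to defining some variables by word position and others by character position.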

In order to implement this wide range of parsing options, SRE-Data uses PROGram files. These files, which are specified when you launch SRE-Data, are simple text files that contain special instruction codes. Therefore, effective use of SRE-Data requires that you learn how to create these PROGram files.

SRE-Data uses daemons for fast access

The real strength of SRE-Data is its responsiveness. This is achieved by running special daemons whose sole job is to load data into RAM, and wait on requests that access this data. This strategy yields quick throughput, with the time required to find a record measured in hundredths of a second, even in databases containing thousands of records.

This strategy has two drawbacks:

  1. You must launch a specific daemon for each database you wish to access. As described above, to do this you'll need to create an SRE-DATA PROG file, and run SREDATA.CMD.
  2. For large databases, memory requirements can become daunting. Similarly, if you have dozens of separate databases, each one will require its own daemon.

    This cost, in terms of RAM and system threads, is one reason why you might want to use SRE-Data as an "index searcher", and let some other set of procedures retrieve more substantive data based on the position (in the index) of the desired identifier. In recognition of this style of operation, SRE-Data can be used with special KEY files -- quickly searched files that are not loaded into RAM.

SRE-Data is easy to access

Once you've created your PROGram file, and launched one or more SRE-Data daemons, you'll want to access your data. Typically, this requires creation of a REXX procedure, either a CGI-BIN script or an SRE-http addon. This procedure should, given data obtained from a web request (e.g., a FORM), call a special SRE-Data provided function: SREF_FIND_INDEX. SREF_FIND_INDEX does all the dirty work -- all you need to do is specify the "name" of the database, the "target" to search for, and some optional parameters.

SRE-Data is self refreshing

The SRE-Data daemons monitor the filestamp of the data (or key, or stem) file they were started with. When a change is detected, the SRE-Data daemon will reload the data. Therefore, your datasets can be somewhat dynamic. This is an especially useful trick when combined with STEM (or KEY) files -- since these can be re-created (perhaps on a different machine) without absorbing the daemon's time. A caution: for rapidly changing datasets launched with a PROGram file, this update strategy may lead to a thrashing daemon.
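
The refresh idea can be pictured as a simple polling loop. The sketch below is not taken from SREDATA.RXX; it assumes OS/2 REXX with the RexxUtil library available, uses the demo data file name that appears elsewhere in this document, and picks a 60 second interval and a limit of five checks arbitrarily.

```rexx
/* Sketch only: notice when a data file's timestamp changes. */
call RxFuncAdd 'SysSleep', 'RexxUtil', 'SysSleep'

datafile = 'SREDATA\FIPSNAME.DAT'          /* demo file name from this document */
last = stream(datafile, 'c', 'query datetime')

do checks = 1 to 5                         /* a real daemon would loop indefinitely */
    call SysSleep 60                       /* arbitrary polling interval, in seconds */
    now = stream(datafile, 'c', 'query datetime')
    if now \== last then do
        say 'Timestamp changed:' now '-- a daemon would reload here'
        last = now
    end
end
```

A loop like this also shows why rapidly changing files are a problem: every detected change triggers a full reload.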

An alternative

For more dynamic, but smaller, sets of data -- take a look at the SRE-http "customization" addon.



Structure of a PROGram file

PROGram files detail the structure of data files. They are used by SRE-DATA (when a PROG= option is used), and by SREDATAX (the key & stem file generator).
In some cases, you may be able to run SRE-Data without specifying a PROG file. However, for any kind of special formatting, PROG files are required!
The options should appear on separate lines. To include a comment line, start it with a semi-colon.
DATA=filename.ext (required)
Name of the file that contains your data.
The file name is relative to the GoServe working directory (e.g., D:\GOSERVE).

DELIMS=char_list (optional)
A list of "word" delimiters. For space, use %20 (i.e., use URL encoding for odd characters).

Default value: DELIMS=%20, (space, and comma)

IN_OFFSET=nlines (or nrecords)
# lines (if FIXED=0) or # of records (if FIXED>0) to offset when reading the input file. Lines (or records) before this offset will be ignored.

Default value: IN_OFFSET=0

LINES=nlines
# lines per record (default=1)
If you specify FIXED (see below), this is ignored.

Default value: LINES=1

FIXED=nchars
# characters per record (fixed length records).
If non-zero, LINES is ignored (FIXED overrides LINES)

Default value: FIXED=0 -- which means "use one record per line"

INDEX_FORMAT=arg1 [arg2]
Format for reading the identifier variable. Should be specified using either:
  • INDEX_FORMAT=nth
  • INDEX_FORMAT=start length
    where "nth" means "nth word" (using DELIMS to define words)
    and "start length" means "length characters, starting at start character"

    Default value: INDEX_FORMAT=1 -- use first word.

    TYPE =atype
    The type of matching to be done. atype can consist of any combination of the following. Default value: TYPE= -- verbatim search (i.e., using a==b)

    COMMENT_CHARS=char_list
    Comment line signifiers. If a line starts with any one of the COMMENT_CHARS, it is ignored (this also applies to comments embedded in multi-line records). Leave this blank to have no comment characters.

    Default value: COMMENT_CHARS=;

    HEADER=header comment
    A header comment that describes the data. Can be retrieved with the !COMMENT special request.

    VARIABLES=varname1 arg1 [arg2] ,
    A comma delimited list of variables and locations. Used to partition the record into "information" fields, with a varname assigned to each of these partitions. arg1 and arg2 are as defined for the INDEX_FORMAT option: arg1 by itself is "nth-word", arg1 with arg2 is "start - length". A special code, * by itself, means "use the entire record".

    Note that you can mix formats (i.e.; use both nth-word and start-length formats), and you can use the same characters (or words) in multiple variables.
    Example: VARIABLES=* , COUNTY_NAME 1 , FIPS_CODE 20 5

    VARIABLES can be searched over (slowly, since the search uses a one-by-one comparison)

    Default value: VARIABLES= * -- create a variable, with the name of "*", that contains the entire record (including the value of the identifier variable)

    Sample PROGram file

    ; descriptive header
    header=This is the FIPSNAME demo database for SRE-Data
    ;
    data=sredata\fipsname.dat
    ; 20 characters per record
    fixed=20
    ; first "word" is the identifier
    index_format=1
    ; matches can be case insensitive, and will ignore leading & trailing spaces
    ; * means "allow wildcard searches on variables"
    type=IS*
    ; define a name variable (and a "whole record" variable)
    variable= * , name 2
    ; for other items (such as comment characters), use defaults
    
    For working examples, please see the .IN files packaged with SRE-DATA.

    Not Using PROGram files

    For simple cases, where all the above options can be used at their default values, you can skip the use of a PROGram file. In particular:
    for text files, with spaces and commas used as delimiters, and where the first word is the identifier
    you can use a DATA= option when you launch SRE-Data (rather than use a PROG= option).

    Using the STRING type

    The STRING type is designed as a convenience to programmers. Basically, the contents of the astring variable should be indistinguishable from a PROGram file. That is, the string should have one option per line, with each line delimited by a CRLF (where CRLF='0d0a'x), and lines beginning with a ; are treated as comments.
    Example:
        crlf='0d0a'x
        astring="header=This is the FIPSNAME demo database for SRE-Data"crlf
        astring=astring||'data=sredata\fipsname.dat'crlf
        astring=astring||'fixed=20'crlf
        astring=astring||'delim=,'crlf
        astring=astring||'index_format=1'crlf
        astring=astring||'type=IS*'crlf
        astring=astring||'variable= * , name 2'
        status=sredata('FIP','STRING',astring)
        Say "Status of SRE-Data: " status
    



    Creating and Using STEM and KEY files

    In addition to the PROG (and DATA) options, SRE-Data supports two special launch options: STEM and KEY.
    STEM
    STEM is used to quickly load data, but is otherwise the same as using PROG
    KEY
    KEY (and LOADKEY) are used to instruct SRE-Data to use special KEY files -- index files whose use requires minimal RAM. Although KEY files are somewhat slower to search, their size can be unlimited. Their primary disadvantage is that they contain no additional information -- they only contain the identifier variable.
    Experience has shown that for moderately large datasets (say, 30,000 records with 5 byte identifiers), the standard SRE-Data strategy (of loading the database into RAM) can bog down the machine. In these cases, use of the KEY option is highly recommended.

    For moderately large datasets, some speed improvement can be obtained by using the LOADKEY option -- it loads the key file into RAM for slightly faster searches. Note: for the same amount of information (i.e., just an identifier), KEY files are much smaller than STEM files, hence the memory penalty is less.

    Both of these options require special files (.STM and .KEY files, respectively). These files are created with the stand-alone SREDATAX.CMD program. SREDATAX.CMD is a simple program that asks for the name of a PROGram file. After reading the PROGram file, SREDATAX will then create either a KEY or STEM file.

    There is one feature of SREDATAX that may require some caution: the use of the QSORT.EXE external program. Since sorting large datasets (say, 70,000 records) can be excruciatingly slow under REXX, SREDATAX can optionally use a fast sort program: QSORT.EXE. Although QSORT.EXE is designed to work with SREDATAX, there are some cases where its use is inappropriate. In particular:

  • If you have multi-line records
  • If your records have non-default word delimiters (i.e., you use characters other than spaces, tabs and commas as word delimiters)
    then you should not use QSORT!

    Hint: Running SREDATAX as a scheduled program

    If your databases are dynamic, with changes occurring on a regular basis, you can use SREDATAX to update the KEY or STEM files on a regular basis. SRE-Data will detect changes to its data file, and reload (this is actually a good reason to use STEM files -- the time savings from the use of STEM files become more significant if you are frequently updating your data).

    To run SREDATAX as a scheduled program, you should specify the following parameters:

         SREDATAX P=program_file t=output_file 
    where t is K for KEY file, or S for STEM file.
    Running SREDATAX ? will give you more details.

    You might also want to examine the SREFLOGS.DOC file for hints on using the SRE-http "scheduling" option in conjunction with SREDATAX.



    Using SREF_FIND_INDEX

    When SREDATA starts, it first checks for the existence of the SREF_FIND_INDEX procedure; and if not found, it uses SREDATA.SRF to load it into REXX macrospace. Regardless of how SREF_FIND_INDEX is loaded into macrospace, it can be called from any REXX program, not just from an SRE-http addon.

    SREF_FIND_INDEX makes it easy to search for matching records. The syntax is

         results=sref_find_index(target,qname,[varname],[nthmatch],[location_only],[timeout])
    Where:
    results
    a 2 part response: recnum,value (note the use of the comma as a separator).
    Example: 315,9001,FAIRFIELD
    The first part is the "record # of the matching record", or a 0 if no match, or less than 0 if an error. The second part is the value of what was matched. If a * variable was specified, this will be the value of the record; otherwise, it will be the value of the identifier (which should equal the value of the target).

    When no value can be found, or when some type of error occurs while looking for a value, this second part will be empty.

    Note: When KEY (or LOADKEY) is being used, SRE-Data will not return the "value of the record". However, you can look up a record number in the original dataset to find the "value" of the record.
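
Because only the first comma separates the record number from the value, the response can be split with a single PARSE instruction. The following stand-alone sketch uses the example response shown above as a literal string, so it runs without a daemon; the variable names recnum and matched are arbitrary.

```rexx
/* Sketch: split a SREF_FIND_INDEX style response into its two parts. */
results = '315,9001,FAIRFIELD'             /* example response from the text */

parse var results recnum ',' matched       /* split at the first comma only */

select
    when recnum > 0 then say 'Record' recnum 'holds:' matched
    when recnum = 0 then say 'No match'
    otherwise            say 'Error code:' recnum
end
```

Here matched ends up as 9001,FAIRFIELD, since any further commas belong to the value itself.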

    target
    the string to search for

    qname
    the "name" used when sredata was invoked (by default, INDEX)

    varname
    [optional] if not specified, look at the identifier variable. If specified, a "one-by-one" search will be conducted. This can be very slow (in contrast, binary search is used when looking for identifier variables). However, you can include a wildcard character (the *) in the target.

    You can also use varname to obtain internal information on the dataset, and to get information on a particular record

    nthmatch
    [optional] if the identifier variable does not have unique values, or if you are searching against a VARIABLE, multiple matches may occur (an especially likely occurrence when wildcards are used in VARIABLE searches). In such cases, you can tell SRE-Data to return the "nth" match. For example, setting nthmatch=3 means return the third match.
    Notes on nthmatch:

    location_only
    [optional] do not return "value" (i.e.; just return the record #)

    timeout
    [optional] seconds to wait for answer from daemon. The default value is 30.
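
As an illustration of how varname and nthmatch might be combined, the sketch below steps through successive matches of a wildcard search. Several things here are assumptions: a daemon named FIP with a NAME variable (as in the sample PROGram file) is running, a request for a match beyond the last one yields a record number of 0, and the cap of 50 is arbitrary. The argument order follows the syntax line above; the worked examples later in this document pass the daemon name first, so check the order against your installed version.

```rexx
/* Sketch only: list successive matches of a wildcard VARIABLE search.  */
/* Assumes a running daemon named FIP with a NAME variable defined.     */
qname  = 'FIP'
target = 'FAIR*'

do nth = 1 to 50                            /* arbitrary upper limit */
    results = sref_find_index(target, qname, 'NAME', nth)
    parse var results recnum ',' matched
    if recnum <= 0 then leave               /* 0 = no (more) matches, <0 = error */
    say nth':' 'record' recnum matched
end
```

Since VARIABLE searches are one-by-one comparisons, a loop like this can be slow on large datasets.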

    Obtaining information about the dataset

    As a convenience, you can use varname to ask the daemon for information about itself. The following special requests are available:
            ?: Variable info -- number of variables, list of names
         !OBS: Number of observations
        !TYPE: Matching type
     !COMMENT: Header comment (empty string if none was specified)
        !DATE: Creation date (useful if you are using a STEM or KEY file)
    !DATAFILE: Original file (value of DATA)
    !PROGFILE: Original PROGram file
     !STARTED: Invoked as; will return PROG, DATA, KEY or STEM
    
    There is one other useful feature: using a varname of #. When varname=#, then the target should contain an absolute record number. If you've specified * as one of your variables, the value of this record # (in the original data file) will be returned.

    Extracting information from an explicit record

    As a convenience, you can use SRE-Data to extract a record from the data set. This would typically be used after a call to SREF_FIND_INDEX with a location_only=1 -- depending on whether you found a match, you might want to get the value of the matching record.

    To do this, simply set target equal to the desired record number, and set varname equal to '#varname'. For example, varname='#*' will return the value of the entire record (assuming that you included * in the VARIABLEs option of the PROGram file).

    Explicit record lookup is especially useful if you have occasion to search against several variables in your data. In such cases, you could use VARNAME searches, but that could be very slow. Instead, creation of a primary, and several secondary daemons (one for each of the variables you might search against) is recommended. With this strategy, it is best to specify variables only for the "primary" identifier. When a match is found, you could then query this primary identifier's daemon for the desired variable (using the record number returned by the appropriate secondary daemon).

    Example:
       Suppose you have a 3 variable delimited dataset; each record contains
       a name, an id, and a balance.
       Suppose you've launched two daemons:
            1) VAR1: The primary daemon, using ID as the identifier, and
                     specifying VARIABLES= * , ID 1 , NAME 2 , BALANCE 3
            2) VAR2: A secondary daemon, with NAME as the identifier, and NO
                     VARIABLES specified
       You could then (the following is not an exhaustive list!):
            i) Search VAR1 for a matching ID, and return the value of the record.
                 results=sref_find_index('VAR1',an_id)
           ii) Search VAR2 for a matching NAME, and then return the
               matching record:
                 irec=sref_find_index('VAR2',a_name,,,1)
                 if irec>0 then
                     avalue=sref_find_index('VAR1',irec,'#*',,2)
          iii) Search VAR2 for a matching NAME, and return her balance:
                 irec=sref_find_index('VAR2',a_name,,,1)
                 if irec>0 then
                     avalue=sref_find_index('VAR1',irec,'#BALANCE',,2)
    

    Explicit Records and KEY files

    When using KEY files, SRE-Data can extract the original record (using the VARNAME=# variant of SREF_FIND_INDEX). However, since KEY files are indices, a few extra steps are required. In particular, SRE-Data will find the requested record in the original data file -- given that the original data file is still available (and in the same location it was when you ran SREDATAX).
    Caution: When a "delimited" (one line per record) file is to be searched, SRE-Data will not create multi-line records, nor will it skip lines that begin with comment characters.

    In general, be careful when using KEY files generated from data files that contain comment lines or that contain multi-line records.

    Demos

    SRE-Data comes with several demonstration programs. These include two stand-alone programs, DEM1SRED.CMD and DEM2SRED.CMD; and a small addon, DATADEMO.CMD, and its "invoking document", DATADEMO.HTM. Although simple, these all demonstrate the uses of the SREF_FIND_INDEX procedure.

    DEM1SRED.CMD (examine your INITFILT.80 file) and DEM2SRED.CMD (examine U.S. county names) are designed to be run in stand-alone mode -- they do not work off of the web. For details on their usage, run either from an OS/2 command prompt and read the on-line help.

    DATADEMO.HTM, which calls DATADEMO.CMD, provides an example of a web-interface to an SRE-data daemon. By default, it uses the U.S. county name database used by DEM2SRED.CMD.



    Launch an SRE-Data daemon

    You can use the following to launch or kill an SRE-Data daemon (it calls the DATADEMO.CMD addon).

    Kill a Daemon

    Enter the NAME of the daemon to kill:

    Launch a Daemon


    Name to use:
    Type of input
    • Key file
    • KEY file, loaded into RAM
    • Stem file
    • PROGram file
    • DATA file (read using defaults)
    • STRING (of a PROGram file)
    Name of file
    or a PROGram string (if STRING type selected)
    Get basic stats from daemon (if your database is large, this may take a while)?
    After launching, you can query your database....
