SRE-DATA: A Database Addon for SRE-http
SRE-DATA is a simple, multi-threaded,
web-accessible database
addon for SRE-http. With SRE-DATA, you can take
pre-existing flat databases and use your own CGI-BIN
scripts (or SRE-http addons) to quickly look up
information, with only minimal setup required.
Features of SRE-DATA
- Runs as a thread, or as several separate threads for
several separate views of several different databases.
- Data retrieval
on moderate (say, 100k) databases takes a fraction of a
second.
- Wildcard searches can be run
against selected variables (that you define on your data).
- Works with ASCII text, or fixed-record binary, data.
- Text data can have:
- Searches can be numeric, text based (with or without
case sensitivity), or exact (binary match).
- Can be automatically started when you start GoServe/SRE-http.
- A straightforward "macrospace" procedure is provided to facilitate
programmers' access to SRE-Data (you can even access SRE-Data independently
of GoServe/SRE-http).
Installation Notes
The easiest way to install SRE-Data is to:
- UNZIP SREDATA.ZIP to an empty, temporary directory
- Run the INSTALL.CMD program (from an OS/2 command prompt).
Or, if you like to do things yourself, after UNZIPping ...
- Copy SREDATAX.CMD,
SREDATA.CMD, SREDATA.RXX, QSORT.EXE, and SREDATA.SRF to your GoServe
working directory (e.g., D:\GOSERVE)
- Copy SREDATA.HTM (this file!) and DATADEMO.HTM to one of
your WWW directories.
- If you want to play with the demo files:
- Copy DATADEMO.CMD to your SRE-http ADDON directory.
- Create a SREDATA directory under your GoServe working directory,
- Copy INITFILT.IN, FIPSNAME.IN, and FIPSNAME.DAT to this SREDATA
directory.
- Copy DEM1SRED.CMD and DEM2SRED.CMD to your GoServe working directory.
Launching SRE-Data
New user? You might want to
first read the Outline of SRE-Data.
To use SRE-DATA, you have to run SREDATA.CMD, the SRE-Data daemon
launcher. Although you can use
the SRE-Data launcher at the end of this document, serious users
will probably want to use one of the following more formal mechanisms:
- CUSTOM_INITS: You can use CUSTOM_INITS, a parameter in the SREFILTR.80 file,
to launch an SRE-Data daemon when GoServe/SRE-http starts up (you'll need to edit SREFILTR.80 to change the
value of CUSTOM_INITS).
- Separate session: (recommended) Run SREDATA.CMD from a separate OS/2 session.
- External procedure: You can launch an SRE-Data daemon by calling SREDATA
as an external procedure (the DATADEMO.CMD addon, which is used by the
launcher facility, contains an example of how to do this).
When called as an external procedure, SREDATA will return a 1 if the
daemon was launched, and a 0 if an error was detected (you can use
PMPRINTF to see what errors occurred). Note that a status of 1 does
not guarantee a successful launch, since other errors may have
occurred in the daemon code (in SREDATA.RXX). Again, keep an eye on PMPRINTF.
Regardless of how you do it, you can launch many daemons, one for
each of several "indicator variables" in several databases.
An Implementation Note
If you expect to be starting and stopping
SRE-Data daemons fairly frequently, we recommend the "separate
session" method: REXX seems to be a bit flaky about shutting down semaphores,
so having the option of "killing the process" is the best guarantee of stability.
Description of SREDATA options
However called, SREDATA expects a few arguments: a Name, a Type,
and a file or string. As indicated in the above examples:
When called with CUSTOM_INITS, the syntax is:
SREDATA NAME=some_name type=filename
When called from the command line, the syntax is:
SREDATA NAME=some_name type=filename MONITOR=0/1
When called as a function, the syntax is:
status=SREDATA('some_name',type,filename)
or
status=SREDATA('some_name','STRING',astring)
where:
some_name : An arbitrary name (chosen by you)
type: One of PROG,DATA, STEM, KEY, LOADKEY, STRING
filename: A file name containing data or instructions
astring: A string containing instructions
The following details the above options...
- type: type is one of: PROG, DATA, STEM, KEY, LOADKEY, or STRING.
PROG | A "program" for reading a data file. PROG files contain various options, and must contain a DATA statement (to identify the data file). |
DATA | A data file (the file is read using defaults that a PROG can override). |
STEM | A "stem" file. |
KEY | A "key" file. |
LOADKEY | A "key" file, to be loaded into memory. |
STRING | A character string that contains a PROGram. |
Notes:
STRING can only be used when SREDATA is called as a procedure.
The STEM, KEY, and LOADKEY types require files
that are produced by the SREDATAX program.
- some_name: some_name is a "handle" you'll use to query the daemon.
- filename: filename is the name of a data file (if type is
KEY, STEM, or DATA) or a PROGram file (if type is PROG).
Note that relative file names are assumed to be relative to the GoServe working
directory (e.g., d:\goserve).
- astring: astring should be a string that contains a
list of PROGram instructions. That is, it should have exactly the same
format as a PROGram file, with CRLFs separating each option. Note that the
STRING type can only be used when SREDATA is called as a procedure.
- monitor: monitor, when set to 1,
tells SREDATA to start a simple daemon monitor. When set to 0 (or when
not specified), SREDATA will die after launching the daemon.
The major purpose of the monitor is to provide visual evidence that things are happening
(since closing the process that launched the daemon will also close the daemon).
monitor=1 should NEVER BE USED WITH CUSTOM_INITS
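As a rough sketch of the external-procedure method, the fragment below launches a daemon from a PROGram file and checks the returned status. The handle FIP and the path sredata\fipsname.in are assumptions based on the demo files listed in the installation notes; substitute your own handle and PROGram file.

```rexx
/* Sketch: launch an SRE-Data daemon as an external procedure.       */
/* Assumes SREDATA.CMD can be found from the current directory and   */
/* that sredata\fipsname.in is a PROGram file (an assumption).       */
name     = 'FIP'                   /* handle used in later queries   */
progfile = 'sredata\fipsname.in'   /* relative to the GoServe working directory */

status = sredata(name, 'PROG', progfile)
if status = 1 then
   say 'Launch requested for' name '(watch PMPRINTF for daemon errors)'
else
   say 'SREDATA reported an error; see PMPRINTF for details'
```

Keep in mind that a status of 1 only means the launcher found no problems; the daemon itself may still fail while reading the data.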
Technical Note
When called, SREDATA.CMD will do several things:
- If SREF_FIND_INDEX does not exist in macrospace, it will
load it (using the SREDATA.SRF file). Note that SRE-http version
1.2L and above comes with SREF_FIND_INDEX preloaded into
macrospace;
for earlier versions, SRE-Data will load a local copy (contained
in SREDATA.SRF).
- It will then create a queue and a semaphore using the
NAME argument (or using a value of INDEX, if no NAME was specified).
- It will launch SREDATA.RXX as a thread, and give it
several arguments.
- If MONITOR=1 is not specified, it will quit.
Otherwise, a simple "monitoring" routine will be run.
In either case, SREDATA.RXX will continue (as a daemon).
Therefore:
The real work is done by SREDATA.RXX -- the job
of SREDATA.CMD is to launch SREDATA.RXX as a daemon (in
its own thread), and to make sure that SREF_FIND_INDEX
is in REXX macrospace.
Outline of SRE-DATA
The basic purpose of SRE-Data is to provide a simple database engine
that can be accessed from the web -- either by using SRE-http addons, or
CGI-BIN scripts. The following outlines how SRE-Data provides this service.
SRE-Data Reads Flat Databases
SRE-Data can read flat databases -- data contained in text files
(with records defined on a per-line basis), or data in files built
with fixed-length records. That is, SRE-Data is not designed
to read .DBF or other such proprietary databases.
Each record (in a flat database) is expected to contain
two types of information:
- An identifier variable
- Other information variables
Typically, the identifier variable is the key against which you will search.
Examples of identifier variables include customer numbers, county identifiers, or
usernames. SRE-Data allows you to use arbitrarily long identifiers that may
or may not be numeric;
in fact, they may contain binary (non-text) characters.
Whatever you use, they must be unique to a record.
The optional information variables contain any information; text,
numbers, or binary representations. Your database can contain any number
of information variables, though in practice SRE-Data is
designed to work with a relatively small number of information
variables.
In some cases you may not have any information variables. For example,
you may have a large set of data with a single index file; and the position of
an identifier in the index is used to access these other data. In such a case,
it can be a great efficiency to use SRE-Data to quickly find this position,
and then let some other set of procedures extract and otherwise manipulate the
data associated with this position.
SRE-Data has parsing ability
When using flat databases, it is necessary to specify where each of the
various (identifier and information) variables may be found. SRE-Data offers
a fairly flexible mechanism for doing this. Variables can be extracted
by absolute position in a record, or by "word" position (where words
can be defined by commas and spaces, or by any other user-defined delimiters).
You can even
mix these types, with some variables defined by character position and some by
word position. In addition to within-record flexibility, SRE-Data has some cross-record
flexibility. In particular, text-file records can span multiple lines,
and may contain comment lines.
In order to implement this wide range of parsing options, SRE-Data uses
PROGram files. These files, which are
specified when you launch SRE-Data, are simple
text files that contain special instruction codes. Therefore, effective
use of SRE-Data requires that you learn how to
create these PROGram files.
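To make the two extraction styles concrete, here is a small stand-alone REXX sketch. It does not call SRE-Data; it only mimics "word position" and "absolute position" extraction on one made-up record, and the way it handles runs of delimiters is an assumption rather than a description of SRE-Data's parser.

```rexx
/* Sketch: two ways of pulling a field out of one record.          */
/* The record and the delimiter list are invented for illustration. */
record = '09001 FAIRFIELD,CT'
delims = ' ,'                       /* space and comma              */

/* "nth word": turn every delimiter into a blank, then count words */
blanked = translate(record, copies(' ', length(delims)), delims)
say word(blanked, 2)                /* FAIRFIELD */

/* "start length": take characters by absolute position */
say substr(record, 1, 5)            /* 09001 */
```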
SRE-Data uses daemons for fast access
The real strength of SRE-Data is its responsiveness. This is achieved
by running special daemons whose sole job is to load data into RAM,
and wait on requests that access this data. This strategy yields
quick throughput, with the time required to find a record measured in
hundredths of a second, even in databases containing thousands
of records.
This strategy has two drawbacks:
- You must launch a specific daemon for each database you wish to access.
As described above, to do this you'll need to create an
SRE-DATA PROG file, and run SREDATA.CMD.
- For large databases, memory requirements can become daunting.
Similarly, if you have dozens of separate databases, each one will
require its own daemon.
This cost, in terms of RAM and system-threads, is one reason
why you might want
to use SRE-Data as an "index searcher", and let some other set
of procedures retrieve more substantive data based on the position (in
the index) of the desired identifier. In recognition of this
style of operation, SRE-Data can be used with special
KEY files --
quickly searched files that are not loaded into RAM.
SRE-Data is easy to access
Once you've created your PROGram file, and launched the one-or-more
SRE-Data daemons, you'll want to access your data.
Typically, this requires writing a REXX procedure, either
a CGI-BIN script or an SRE-http addon. This procedure should, given
data obtained from a web request (e.g., a FORM), call a special
function provided by SRE-Data: SREF_FIND_INDEX.
SREF_FIND_INDEX does all the dirty work -- all you need to do is specify
the "name" of the database, the "target" to search for, and some optional
parameters.
SRE-Data is self refreshing
The SRE-Data daemons monitor the filestamp of the data (or key, or stem)
file they were started with. When a change is detected, the SRE-Data
daemon will
reload the data. Therefore, your datasets can be somewhat dynamic.
This is
an especially useful trick when combined with STEM (or KEY) files --
since these can be re-created (perhaps on a different machine) without
absorbing the daemon's time. A caution:
for rapidly changing datasets launched with a PROGram file, this update
strategy may lead to a thrashing daemon.
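The idea of watching a filestamp can be sketched in a few lines of REXX. This is not SRE-Data's actual code; it is only an illustration of the technique, using the standard STREAM function's QUERY DATETIME command. The file name is taken from the demo files and may need adjusting.

```rexx
/* Sketch only: illustrates filestamp checking, not SRE-Data's internals. */
datafile = 'sredata\fipsname.dat'   /* demo data file; adjust the path */
loaded_stamp = stream(datafile, 'c', 'query datetime')

/* ... later, on each polling pass ... */
current_stamp = stream(datafile, 'c', 'query datetime')
if current_stamp \== loaded_stamp then do
   say datafile 'has changed; the data should be reloaded'
   loaded_stamp = current_stamp
end
```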
An alternative
For more dynamic, but smaller, sets of data -- take
a look at the
SRE-http "customization"
addon.
Structure of a PROGram file
PROGram files detail the structure of data files. They are used
by SRE-DATA (when a PROG= option is used), and by SREDATAX (the key &
stem file generator).
In some cases, you may be able to run SRE-Data without specifying
a PROG file. However, for any kind of special formatting, PROG files are
required!
The options should appear on separate lines. To include comment
lines, start with a semi-colon.
- DATA=filename.ext (required)
- Name of the file that contains your data.
The file name is relative to the GoServe working directory (e.g., D:\GOSERVE).
- DELIMS=char_list (optional)
- A list of "word" delimiters. Use URL encoding for odd characters;
for example, use %20 for a space.
Default value: DELIMS=%20, (space and comma)
- IN_OFFSET=nlines (or nrecords)
- Number of lines (if FIXED=0) or records (if FIXED>0) to skip when
reading the input file.
Lines (or records) before this offset are ignored.
Default value: IN_OFFSET=0
- LINES=nlines
- Number of lines per record.
If you specify FIXED (see below), this is ignored.
Default value: LINES=1
- FIXED=nchars
- Number of characters per record (fixed-length records).
If non-zero, LINES is ignored (FIXED overrides LINES).
Default value: FIXED=0 -- which means "use one record per line"
- INDEX_FORMAT=arg1 [arg2]
- Format for reading the identifier variable. Should be specified using either
nth or start length,
where "nth" means "nth word" (using DELIMS to define words)
and "start length" means "length characters, starting at the start character".
Default value: INDEX_FORMAT=1 -- use first word.
- TYPE=atype
- The type of matching to be done. atype can consist of any combination of
the following.
- atype='' (or not specified): verbatim
- atype='I' : case insensitive
- atype='S' : strip spaces from both ends
- atype='N' : numeric match -- should be used by itself
Default value: TYPE= -- verbatim search (i.e., using a==b)
- COMMENT_CHARS=char_list
- Comment line signifiers. If a line starts with any one
of the COMMENT_CHARS, it is ignored (this also applies
to comments embedded in multi-line records). Leave this
blank to have no comment characters.
Default value: COMMENT_CHARS=;
- HEADER=header comment
- A header comment that describes the data. Can be retrieved with the
!COMMENT special request.
- VARIABLES=varname1 arg1 [arg2] , ...
- A comma delimited list of variables and locations.
Used to partition the record into "information" fields, with
a varname assigned to each of these partitions.
arg1 and arg2 are as defined for the INDEX_FORMAT
option: arg1 by itself is "nth-word", arg1 with arg2 is "start - length".
A special code, * by itself, means "use the entire record".
Note that you can mix formats (i.e., use both nth-word and start-length
formats), and you can use the same characters (or words)
in multiple variables.
Example:
VARIABLES=* , COUNTY_NAME 1 , FIPS_CODE 20 5
VARIABLES can be searched over (slowly, since the search uses a
one-by-one comparison).
Default value: VARIABLES= * -- create a variable, with the
name of "*", that contains the entire record (including the value of the
identifier variable)
Sample PROGram file
; descriptive header
header=This is the FIPSNAME demo database for SRE-Data
;
data=sredata\fipsname.dat
; 20 characters per record
fixed=20
; first "word" is the identifier
index_format=1
; matches can be case insensitive, and will ignore leading & trailing spaces
; * means "allow wildcard searches on variables"
type=IS*
; define a name variable (and a "whole record" variable)
variable= * , name 2
; for other items (such as comment characters), use defaults
For working examples, please see the .IN files packaged with SRE-DATA.
Not Using PROGram files
For simple cases, where all the above options can be used at their
default values, you can skip the use of a PROGram file.
In particular:
for text files, with spaces and commas used as delimiters,
and where the first word is the identifier,
you can use a DATA= option when you launch SRE-Data (rather than
use a PROG= option).
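For instance, a daemon could be launched directly on a data file along the following lines. This is only a sketch: the handle USERS and the file sredata\users.txt are hypothetical, and the call assumes SREDATA is invoked as an external procedure.

```rexx
/* Sketch: launch a daemon directly on a data file, using all defaults. */
/* sredata\users.txt is a hypothetical text file with lines such as:    */
/*    jsmith John Smith, Accounting                                     */
/* With the defaults, the first word (jsmith) is the identifier.        */
status = sredata('USERS', 'DATA', 'sredata\users.txt')
say 'Status of SRE-Data:' status
```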
Using the STRING type
The STRING type is designed as a convenience to programmers. Basically,
the contents of the astring variable should be indistinguishable
from a PROGram file. That is, the string should have one option per line,
each line is delimited by a CRLF (where CRLF='0d0a'x), and lines
beginning with a ; are treated as comments.
Example:
crlf='0d0a'x
astring="header=This is the FIPSNAME demo database for SRE-Data"crlf
astring=astring||'data=sredata\fipsname.dat'crlf
astring=astring||'fixed=20'crlf
astring=astring||'delim=,'crlf
astring=astring||'index_format=1'crlf
astring=astring||'type=IS*'crlf
astring=astring||'variable= * , name 2'
status=sredata('FIP','STRING',astring)
Say "Status of SRE-Data: " status
Creating and Using STEM and KEY files
In addition to the PROG (and DATA) options,
SRE-Data supports two special launch options: STEM and KEY.
- STEM
- STEM is used to quickly load data, but is otherwise the
same as using PROG
- KEY
- KEY (and LOADKEY) are used to instruct SRE-Data to
use special KEY files -- index files whose use requires
minimal RAM. Although KEY files are somewhat slower to search, their
size can be unlimited. Their primary disadvantage is that they contain
no additional information -- they only contain the identifier
variable.
Experience has shown that for moderately
large datasets (say, 30,000 records with 5 byte identifiers), the
standard SRE-Data strategy (of loading the database into RAM) can
bog down the machine. In these cases, use of the KEY option
is highly recommended.
For moderately large datasets, some speed improvement can be obtained
by using the LOADKEY option -- it loads the key file into RAM for slightly
faster searches. Note: for the same amount of information (i.e., just
an identifier), KEY files are much smaller than STEM files,
hence the memory penalty is less.
Both of these options require special files (.STM and .KEY files,
respectively). These files are created with the stand-alone
SREDATAX.CMD program. SREDATAX.CMD is a
simple program that asks for the name of a PROGram file.
After reading the PROGram, SREDATAX
will then create either a KEY or STEM file.
There is one feature of SREDATAX that may require some caution: the
use of the QSORT.EXE external program. Since sorting large datasets
(say, 70,000 records) can be excruciatingly slow under REXX, SREDATAX
can optionally use a fast sort program: QSORT.EXE. Although QSORT.EXE is
designed to work with SREDATAX, there are some cases where its use is
inappropriate. In particular:
then you should not use QSORT!
Hint: Running SREDATAX as a scheduled program
If your databases are dynamic, with changes occurring on a regular
basis, you can use SREDATAX to update the KEY or STEM files on a
regular basis. SRE-Data will detect changes to its data file,
and reload (this is actually a good reason to use STEM files --
the time savings from the use of STEM files becomes more significant
if you are frequently updating your data).
To run SREDATAX as a scheduled program, you should specify the following parameters:
SREDATAX P=program_file t=output_file
where t is K for KEY file, or S for STEM file.
Running SREDATAX ?
will give you more details.
You might also want to examine the SREFLOGS.DOC file for hints on using the
SRE-http "scheduling" option in conjunction with SREDATAX.
Using SREF_FIND_INDEX
When SREDATA starts, it first checks for the existence of the
SREF_FIND_INDEX procedure; and if not found, it uses SREDATA.SRF
to load it into REXX macrospace. Regardless of how SREF_FIND_INDEX
is loaded into macrospace, it can be called from any
REXX program, not just from an SRE-http addon.
SREF_FIND_INDEX makes it easy to search for matching records. The syntax is
results=sref_find_index(target,qname,[varname],[nthmatch],[location_only],[timeout])
Where:
- results
- a 2 part response: recnum,value (note the use of the comma
as a separator).
Example: 315,9001,FAIRFIELD
The first part
is the record number of the matching record, or 0 if there is no
match, or less than 0 if an error occurred. The second part is the
value of what was matched. If a * variable was specified,
this will be the value of the record; otherwise, it will
be the value of the identifier (which should equal the
value of the target).
When no value can be found, or when some
type of error occurs while looking for a value, this second part
will be empty.
Note: When KEY (or LOADKEY) is being used, SRE-Data will
not return the "value of the record". However, you can
look up a record number in the
original dataset to find the "value" of the record.
- target
- the string to search for
- qname
- the "name" used when sredata was invoked (by default, INDEX)
- varname
- [optional] if not specified, look at the identifier variable.
If specified, a "one-by-one" search will be conducted. This can be very
slow (in contrast, binary search is used when looking for identifier
variables). However, you can include a wildcard character (the *)
in the target.
You can also use varname to
obtain internal information on
the dataset, and to get information on a particular record
- nthmatch
- [optional] if the identifier variable does not have unique values,
or if you are searching against a VARIABLE,
multiple matches may occur (an especially likely occurrence when
wildcards are used in VARIABLE searches). In such cases,
you can tell SRE-Data to return the "nth" match. For example, setting
nthmatch=3 means return the third match.
Notes on nthmatch:
-
In addition to numeric values, nthmatch understands a few special codes:
-
The default (nthmatch=0 or nthmatch="") is to return the first found
match. In VARNAME searches, this will be the first match in the file;
for identifier searches, this may occur anywhere in the "set" of matches.
- if nthmatch is greater than the number of matching values, a 0 (no
match) is returned.
- multiple matches on identifiers
- location_only
- [optional] do not return "value" (i.e.; just return the record #)
- timeout
- [optional] seconds to wait for answer from daemon. The default value
is 30.
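Because the result is a single comma-separated string, the caller has to split it. The following stand-alone sketch shows one way to do that with PARSE; the literal string stands in for the value returned by a real call to sref_find_index, so nothing here depends on a running daemon.

```rexx
/* Sketch: split a SREF_FIND_INDEX result into its two parts.       */
/* The literal below stands in for a real call to sref_find_index.  */
results = '315,9001,FAIRFIELD'

/* Only the first comma separates the parts; the value may contain commas. */
parse var results recnum ',' value

select
   when recnum > 0 then say 'Record' recnum 'matched:' value
   when recnum = 0 then say 'No match'
   otherwise            say 'Error code' recnum
end
```

Splitting on the first comma only matters when the value is a whole record, since records in comma-delimited files will themselves contain commas.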
Obtaining information about the dataset
As a convenience, you can use varname to ask the daemon for information
about itself. The following special requests are available:
?: Variable Info -- number of variables, list of names
!OBS: # observations
!TYPE: matching type
!COMMENT: Header comment (empty string if none was specified)
!DATE: creation date (useful if you are using a STEM or KEY file)
!DATAFILE:original file (value of DATA)
!PROGFILE: Original PROGram file
!STARTED: invoked as; will return PROG, DATA, KEY or STEM
There is one other useful feature: using a varname of #. When varname=#,
then the target should contain an absolute record number. If you've specified
* as one of your variables, the value of this record # (in the original
data file) will be returned.
Extracting information from an explicit record
As a convenience, you can use SRE-Data to extract a record from the
data set. This would typically be used after a call to SREF_FIND_INDEX
with a location_only=1 -- depending on whether you found a match, you
might want to get the value of the matching record.
To do this, simply set target equal to the desired record number, and
set varname equal to '#varname'. For example, varname='#*' will return the
value of the entire record (assuming that you included * in the VARIABLEs
option of the PROGram file).
Explicit record lookup is especially useful if you have occasion to
search against
several variables in your data. In such cases, you could use VARNAME searches,
but that could be very slow. Instead, creation of a primary, and several
secondary daemons (one for
each of the variables you might search against) is recommended. With
this strategy, it is best to specify variables only for the "primary"
identifier. When a match is found, you could then query this primary
identifier's daemon for the desired variable (using the record number
returned by the appropriate secondary daemon).
Example:
Suppose you have a 3 variable delimited dataset; each record contains
a name, an id, and a balance.
Suppose you've launched two daemons:
1) VAR1: The primary daemon, using ID as the identifier, and
specifying VARIABLES= * , ID 1 , NAME 2 , BALANCE 3
2) VAR2: 1 secondary daemon, with NAME as the identifier, and NO
VARIABLES specified
You could then (the following is not an exhaustive list!):
i) Search VAR1 for a matching ID, and return the value of the record.
results=sref_find_index('VAR1',an_id)
ii) Search VAR2 for a matching NAME, and then return the
matching record:
irec=sref_find_index('VAR2',a_name,,,1)
if irec>0 then
avalue=sref_find_index('VAR1',irec,'#*',,2)
iii) Search VAR2 for matching NAME, and return her balance
irec=sref_find_index('VAR2',a_name,,,1)
if irec>0 then
avalue=sref_find_index('VAR1',irec,'#BALANCE',,2)
Explicit Records and KEY files
When using KEY files, SRE-Data can extract the original record (using
the VARNAME=# variant of SREF_FIND_INDEX).
However, since KEY files are indices, a few extra steps are required. In
particular, SRE-Data will find the requested record in the original
data file -- given that the original data file is still available (and
in the same location it was when you ran SREDATAX).
Caution: When a "delimited" (one line per record) file
is to be searched, SRE-Data will not create multi-line records, nor
will it skip lines that begin with comment characters.
In general, be careful when using KEY files generated from data files
that contain comment lines or that contain multi-line records.
Demos
SRE-Data comes with several demonstration programs. These include
two stand alone programs, DEM1SRED.CMD and DEM2SRED.CMD;
and a small addon, DATADEMO.CMD, and its "invoking document",
DATADEMO.HTM.
Although simple, these all demonstrate the
uses of the SREF_FIND_INDEX procedure.
DEM1SRED.CMD (examine your INITFILT.80 file) and DEM2SRED.CMD
(examine U.S. county names) are designed to be run in stand-alone mode --
they do not work off of the web. For details on their usage,
run either from an OS/2 command prompt and read the on-line help.
DATADEMO.HTM, which calls DATADEMO.CMD, provides an example of a web-interface
to an SRE-data daemon. By default, it uses the U.S. county name database
used by DEM2SRED.CMD.
Launch an SRE-Data daemon
You can use the following to launch or kill a SRE-Data daemon (it calls the
DATADEMO.CMD addon).
Kill a Daemon
Launch a Daemon