Searching files using DOSEARCH

The DOSEARCH addon for SRE-http is a generic search utility written in REXX. It searches one (or several) ASCII text files for "paragraphs" that satisfy a set of conditions that are specified in a "search string", and returns an HTML document containing these paragraphs.

DOSEARCH looks for "search strings" within "paragraphs", and uses "meta commands" and "target specific" instructions to structure the search.

By default, a paragraph is defined as being all text between blank lines. Alternatively, one can define paragraphs as single lines, or as delimited by any arbitrary character sequence.
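The three paragraph conventions above can be illustrated with a short Python sketch. This is not DOSEARCH's REXX code; the function name `split_paragraphs` is hypothetical, and the delimiter values (" " or 0 for blank lines, "$" for single lines) are taken from the DELIM option described later in this document.

```python
import re

def split_paragraphs(text, delim=" "):
    """Split text into paragraphs using one of the three documented conventions."""
    if delim in (" ", "0"):
        # Default: paragraphs are runs of text separated by blank lines.
        parts = re.split(r"\n\s*\n", text)
    elif delim == "$":
        # Each line is its own paragraph.
        parts = text.splitlines()
    else:
        # Any other value is used as a literal delimiter sequence.
        parts = text.split(delim)
    # Assumption: empty pieces and surrounding whitespace are dropped.
    return [p.strip() for p in parts if p.strip()]

sample = "first line\nsecond line\n\nthird line\n"
print(split_paragraphs(sample))        # blank-line paragraphs
print(split_paragraphs(sample, "$"))   # one paragraph per line
```

With the sample text, the default setting yields two paragraphs and "$" yields three.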

A search string is composed of "targets". There are two kinds of targets: subwords and phrases.
Each space-delimited entry in the search string is treated as a separate subword, except for phrases, which are delimited by parentheses; for example: (xx yy zz).
Phrases must be matched precisely.
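As a rough illustration of how a search string breaks down into targets, here is a minimal Python sketch. The function name `parse_targets` is hypothetical, the sketch ignores meta-commands and target-specific commands, and collapsing extra whitespace inside a phrase is an assumption rather than documented behavior.

```python
import re

def parse_targets(search_string):
    """Return a list of (kind, text) targets, where kind is 'phrase' or 'subword'."""
    targets = []
    # A parenthesized group is one phrase; any other run of characters
    # without spaces or parentheses is one subword.
    for phrase, word in re.findall(r"\(([^)]*)\)|([^\s()]+)", search_string):
        if phrase.strip():
            # Assumption: extra whitespace inside a phrase is collapsed.
            targets.append(("phrase", " ".join(phrase.split())))
        elif word:
            targets.append(("subword", word))
    return targets

print(parse_targets("car truck (pet pig)"))
```

Here `car` and `truck` come back as two separate subwords, and `pet pig` comes back as a single phrase.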

7.a.i) Search algorithms.

DOSEARCH has two modes: SIMPLE and LOGICAL.
Simple mode with highlighting.
 Two meta-commands and four "target specific" instructions are recognized.
        Meta-commands are signified by  *&  or *\ at the beginning
        of the search string.
             *&   means "find paragraphs that match ALL targets in
                the search string"
             *\  means "find paragraphs that match NONE of the targets in
                 the search string"
     If there are no meta-commands, the following target specific
     commands are recognized.
          &   means "paragraphs MUST have this target"
          |   means "accept paragraph if it has this target"
              Note that | is the default (assumed if no target specific
              command entered).
          \ means "paragraph must NOT have this target"
          % means "accept paragraph if it does NOT have this target"

    Summarizing: to be a "found" paragraph:
      Test 1a) Any (of several) | targets must be present,    or
           1b) All of the % targets must be missing
         If the paragraph passes test 1a or 1b, then
           2a) None of the \ targets can be present, and
           2b) All of the & targets must be present
     If present, all & and | targets will be highlighted.
     Note that if there are no % targets specified, test 1b is ignored.
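The simple-mode tests can be sketched in Python as follows. This is an illustration only, not DOSEARCH's REXX code. The function name `matches_simple` and the pre-classified `(command, text)` target pairs are hypothetical. The sketch also makes two assumptions the text does not confirm: a target matches when it occurs as a substring of the paragraph, and test 1 counts as passed when there are no | or % targets at all. Highlighting is not shown.

```python
def matches_simple(paragraph, targets, meta=None, case_sensitive=False):
    """Apply the simple-mode tests to one paragraph.

    targets: list of (command, text) pairs; command is '&', '|', '\\' or '%'.
    meta: None, '*&' (match ALL targets) or '*\\' (match NONE of them).
    """
    text = paragraph if case_sensitive else paragraph.lower()

    def present(target):
        # Assumption: a target matches when it occurs as a substring.
        return (target if case_sensitive else target.lower()) in text

    if meta == "*&":
        return all(present(t) for _, t in targets)
    if meta == "*\\":
        return not any(present(t) for _, t in targets)

    by_cmd = {"&": [], "|": [], "\\": [], "%": []}
    for cmd, t in targets:
        by_cmd[cmd].append(t)

    # Test 1: any | target present, or (if % targets exist) all % targets missing.
    test_1a = any(present(t) for t in by_cmd["|"])
    test_1b = bool(by_cmd["%"]) and not any(present(t) for t in by_cmd["%"])
    # Assumption: with no | and no % targets, test 1 is treated as passed.
    if (by_cmd["|"] or by_cmd["%"]) and not (test_1a or test_1b):
        return False
    # Test 2: no \ target present, and every & target present.
    return (not any(present(t) for t in by_cmd["\\"])
            and all(present(t) for t in by_cmd["&"]))

targets = [("|", "dog"), ("|", "cat"), ("&", "store"), ("|", "pet pig")]
print(matches_simple("The pet store sells cat food.", targets))   # True
print(matches_simple("My cat sleeps all day.", targets))          # False
```

The second paragraph passes test 1a (it contains cat) but fails test 2b, because the & target store is missing.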
Logical expression mode without highlighting
The user enters a REXX-like logical expression using the following operators:
  • & : AND operator
  • | : OR operator
  • \ : NOT operator
  • @ : XOR operator
  • ( ) : to group expressions.
  • A sequence of words without any operators is treated as a phrase -- to treat each word as a separate subword, put ( ) around each one. Basically, when using this mode, be liberal in your use of ( ).
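One way to picture logical expression mode is as a small recursive-descent evaluator. The Python sketch below is illustrative only and is not DOSEARCH's REXX implementation. The function name `matches_logical` is hypothetical. Operator precedence is not stated in this document, so the sketch borrows REXX's ordering as an assumption: \ binds tightest, then &, then | and @ evaluated left to right. Substring matching is also an assumption.

```python
import re

TOKEN = re.compile(r"[&|\\@()]|[^&|\\@()\s]+")
OPERATORS = set("&|\\@()")

def matches_logical(paragraph, expression, case_sensitive=False):
    """Evaluate a logical-mode expression against one paragraph."""
    if not case_sensitive:
        paragraph, expression = paragraph.lower(), expression.lower()
    text = " ".join(paragraph.split())   # normalize whitespace for phrase matching
    tokens = TOKEN.findall(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        tok = peek()
        if tok is None or tok in ("&", "|", "@", ")"):
            raise ValueError("operand expected")
        if tok == "\\":                  # NOT
            take()
            return not factor()
        if tok == "(":                   # grouping
            take()
            value = expr()
            if peek() != ")":
                raise ValueError("missing )")
            take()
            return value
        # Consecutive words with no operator between them form one phrase.
        words = []
        while peek() is not None and peek() not in OPERATORS:
            words.append(take())
        return " ".join(words) in text

    def term():
        value = factor()
        while peek() == "&":             # AND
            take()
            rhs = factor()               # always parse the right-hand side
            value = value and rhs
        return value

    def expr():
        value = term()
        while peek() in ("|", "@"):      # OR and XOR
            op = take()
            rhs = term()
            value = (value or rhs) if op == "|" else (value != rhs)
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result

print(matches_logical("Pet stores sell dog and cat food",
                      "( (dog & cat) | (pet pig) ) & stores"))
```

Note that the right-hand operand is always parsed before the results are combined, so the expression is consumed fully even when the left-hand side already decides the outcome.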

    DOSEARCH Options

    DOSEARCH supports a number of options. These options are easily set in DOSEARCH.HTM, or by modifying the ALIAS(es) that invoke DOSEARCH.

    Technically speaking, the options appear in an option list, with each option separated by an & character.

    The option list should have the structure:
    option_name=option_value&option_name2=option_value2&...
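The option list has the same shape as a URL query string, so, assuming it is URL-encoded in the usual way (this document does not say so explicitly), it could be taken apart as in the Python sketch below. The function name `parse_option_list` and the sample file names are hypothetical, and treating option names as case-insensitive is an assumption.

```python
from urllib.parse import parse_qsl

def parse_option_list(option_list):
    """Split an option list into options, collecting repeated FILE entries."""
    options = {"FILE": []}
    for name, value in parse_qsl(option_list, keep_blank_values=True):
        name = name.upper()                  # assumption: names are case-insensitive
        if name == "FILE":
            options["FILE"].append(value)    # FILE may appear several times
        else:
            options[name] = value            # a later entry overrides an earlier one
    return options

print(parse_option_list("search=car+truck&file=cars.txt&file=trucks.txt&num=NO"))
```

Keeping FILE as a list reflects the fact that several FILE options may be given and are searched in turn.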

    The DOSEARCH options are:

           DELIM : The paragraph delimiter.
                       " "  or 0 = blank lines   (the default)
                       "$"      = Each line is a paragraph
                       other     = User specified delimiter
           LINE  : Maximum number of lines to display of each "found" paragraph.
                     If 0, no lines displayed (a summary will be displayed)
                     Default is to display all lines.
           NUM   : YES=Display the line or paragraph number,
                   NO=Don't (default=YES)
           BAR   : YES= Separate each paragraph/line by a horizontal bar,
                   NO=Don't (default=YES)
           EXPERT: YES= Use "logical expression mode",
                   NO=Use simple mode (Default=NO)
           FILE  :  FILE=filename
                    A file to search (either relative to the data directory,
                    or in a "local" virtual directory).  You can include as many
                    FILE options as desired (each entry will be searched in turn).
                    Furthermore, * and ? wildcards can be used.
           SEARCH: The search string
           CASE  : If YES, then search is case sensitive (default is NO)
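Because & separates options and is also a search operator, a search string that contains & has to be escaped before it is placed in an option list. The Python sketch below shows one way to do that, assuming the option list is passed as an ordinary URL query string (this document does not confirm it). The function name `build_option_list` and the file names are hypothetical.

```python
from urllib.parse import urlencode

def build_option_list(search, files, **options):
    """Build an option list from a search string, file names, and other options."""
    pairs = [("SEARCH", search)]
    pairs += [("FILE", f) for f in files]      # one FILE entry per file
    pairs += [(name.upper(), str(value)) for name, value in options.items()]
    # urlencode escapes & and spaces inside values, so they cannot be
    # mistaken for option separators.
    return urlencode(pairs)

option_list = build_option_list("*& computer price memory",
                                ["prices.txt", "specs*.txt"],
                                expert="NO", case="NO", line=5)
print(option_list)
```

In the resulting string, the only unescaped & characters are the ones that separate options.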
    

    Examples

    Simple mode examples
    (The best car)
    Finds paragraphs containing the phrase the best car
    car truck motorcycle
    Finds paragraphs that contain one (or more) of car, truck or motorcycle
    dog cat & store (pet pig)
    Finds paragraphs that contain one (or more) of dog, cat or the phrase pet pig; and that also contain store
    *& computer price memory
    Finds paragraphs that contain computer, price, and memory (must have all of them, but can be in any order)
    Logical mode examples
    The best car
    Finds paragraphs containing the phrase the best car
    car | truck | motorcycle
    Finds paragraphs that contain one (or more) of car, truck or motorcycle
    ( (dog & cat) | (pet pig) ) & stores
    Finds paragraphs that contain stores, and that contain either both dog and cat, or the phrase pet pig

    Using DOSEARCH.HTM

    DOSEARCH.HTM is an HTML document that directly calls DOSEARCH -- it does not require using an ALIAS or an <ISINDEX> element. It also provides an easy means for setting a number of DOSEARCH options. If you are familiar with HTML FORMS, you can customize (and rename) DOSEARCH.HTM. In particular, to facilitate searches (say, of a specific document) many of the TYPE fields can be set to be HIDDEN.

    Lastly, as described in the SRE-http manual, you can use DOSEARCH (in combination with an ALIAS) to implement an ISINDEX type of search.