CheckLink

CheckLink will map out a web-tree whose root is a specific HTML document (the starter-URL). CheckLink can also be used to examine and traverse the URLs that comprise a web-tree, and to create a hierarchical index of the web-tree.

From this page you can:


Create a Hierarchical Index of a Web-Tree

from the web-tree defined by the "linkage" file (enter a filename, with no path information)
Options:
  • Selector(s) to start index from (default=the web-tree's starter-URL)
  • Display URLs with the following (possibly wildcarded) mime types (default= text/html only)
  • Display off-site URLs (otherwise, only display on-site URLs)
  • Selectors (possibly wildcarded) to not expand (sub-tree is suppressed): (default= no suppression)
  • Selectors (possibly wildcarded) to drop from index: (default= no drops)
  • Display a given URL multiple times:
  • Remove earlier "high level" entries (do not use with "display where first found")
  • Display descriptions (if available)

Display of output: Display using Unordered List (<UL>) || Display using Table || Edit Mode

Examine and traverse a web-tree

using the "linkage" file (enter a filename, with no path information)
"Linkage" files are created when you create a web-tree (see the description below for details).

Create a web-tree

The starter-URL should either be a selector (relative to this site []); or a fully specified URL (including the http://).   
A web tree whose root is the starter-URL (it should be an HTML document) will be created by recursively checking for links (IMGs, Anchors, etc.)

More "create a web tree" options

The following options control the extent to which the web-tree is searched, and the appearance of the output.
Option Description
Descriptive Name:
The descriptive name is simply used as a title. If you do not enter one, the starter-URL will be used to create a descriptive name.
Check off-site URLs
Do NOT check off-site URLs
CheckLink can attempt to verify the existence of resources residing off-site (where off-site means "with an IP address different than the starter-URL's IP address"). Or, you can suppress this option (off-site URLs will not be queried).
Read & process html documents that are:
If you select the "under the starter-URL" option, then only documents in (or under) the directory of the starter-URL will be processed for recursive links.
If you select "only process the starter-URL", then only the starter-URL will be read & processed.
Example.
Create & save descriptions:
No
html documents
html and plain text documents
CheckLink can create & save short descriptions of html (text/html) and plain text (text/plain) documents.
Return results as one long document
Return results in a multi-part document
Return results in two separate documents
CheckLink can return results in several fashions.
The simplest means is to first send run-time status information, and then send the results immediately following the status information (one long document).
Using a multi-part (or two separate) document is visually more appealing -- the "results" part will overwrite the "status" portion (the status portion's main purpose is to prevent server time-outs!).
Exclusion list: To avoid invocation of addons, scripts, and other dynamic and otherwise complicated resources, CheckLink will compare the selector of each link against each word in the space delimited exclusion list. If any of these words match the selector (and you can use multiple * wildcards), then the link will not be checked.
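
The exclusion test described above can be sketched in Python. This is only an illustration, not CheckLink's own code: the function name is_excluded is hypothetical, and the sketch assumes that a word must match the whole selector and that the comparison ignores case.

```python
import re

def is_excluded(selector, exclusion_list):
    """Return True if any word in the space delimited list matches the selector.

    Assumptions of this sketch: only * is a wildcard, a word must match
    the whole selector, and the comparison ignores case.
    """
    for word in exclusion_list.split():
        # Escape everything except *, which becomes "any run of characters".
        pattern = ".*".join(re.escape(part) for part in word.split("*"))
        if re.fullmatch(pattern, selector, re.IGNORECASE | re.DOTALL):
            return True
    return False
```

For instance, with the (made up) exclusion list "/CGI-BIN/* *.EXE", a selector such as /CGI-BIN/SEARCH?q=1 would be skipped, while /SAMPLES/FOOBAR.HTM would still be checked.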
Types of tables: This space delimited list of codes is used to specify what results should be reported. For each code in this list, two separate tables (one for IMaGes and one for Anchors) are created. Valid codes are OK NOSITE NOURL OFFSITE EXCLUDED ALL
(optional) linkage file
Note: if you want to create a linkage file, enter a filename only -- do not include path information.

To avoid overwriting a pre-existing linkage file, include ? marks in the file name. For example: LFILE?? will cause unique names to be used, starting with LFILE01.
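
One way to picture the ? substitution is the short Python sketch below. It is a guess at the behavior based only on the LFILE?? example: the function name unique_name is hypothetical, the ? marks are assumed to form a single contiguous run, and the set of existing names stands in for whatever directory listing the server would consult.

```python
def unique_name(pattern, existing):
    """Fill the run of ? marks with the first zero-padded number not in use.

    Assumption: the ? marks are contiguous (as in LFILE??), and numbering
    starts at 1, so LFILE?? yields LFILE01, LFILE02, and so on.
    """
    width = pattern.count("?")
    if width == 0:
        return pattern  # no ? marks: the name is used as given
    start = pattern.index("?")
    prefix, suffix = pattern[:start], pattern[start + width:]
    for n in range(1, 10 ** width):
        candidate = f"{prefix}{n:0{width}d}{suffix}"
        if candidate not in existing:
            return candidate
    raise RuntimeError("all names for this pattern are already in use")
```

With no existing files, unique_name("LFILE??", set()) gives LFILE01; if LFILE01 and LFILE02 are taken, it gives LFILE03.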

As well as creating tables that list the various URLs that comprise a web-tree, you can also use CheckLink to examine and traverse the web tree. That is, for each URL in the web-tree: CheckLink will retain "linkage" information -- including information on all text/html documents (in the web tree) that contain this URL. In addition, for text/html documents CheckLink will retain a list of all the links in the document.

In order to do this, you must create a "linkage" file. If you specify a linkage file, you can then use the ? links in the results tables, or you can invoke the "examine and traverse" option above.
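
As a rough picture of the two kinds of linkage information described above, the following Python sketch starts from the list of links found in each text/html document and derives, for every URL, the documents that contain it. The on-disk format of a real linkage file is not described here, so the dictionary layout and the name build_linkage are purely illustrative.

```python
from collections import defaultdict

def build_linkage(links_in_doc):
    """Invert "document -> links it contains" into "URL -> documents containing it".

    links_in_doc maps each text/html document to the list of URLs found in it.
    """
    contained_in = defaultdict(list)
    for doc, links in links_in_doc.items():
        for url in links:
            if doc not in contained_in[url]:  # record each document once per URL
                contained_in[url].append(doc)
    return dict(contained_in)

# Hypothetical two-document web-tree.
links_in_doc = {
    "/INDEX.HTM": ["/A.HTM", "/LOGO.GIF"],
    "/A.HTM": ["/LOGO.GIF", "/INDEX.HTM"],
}
linkage = build_linkage(links_in_doc)
```

Traversing the web-tree then amounts to following links_in_doc downward (what a document links to) or linkage upward (which documents link to a URL).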


Web Tree? Does that make sense?

Perhaps the use of the term "web-tree" is misleading -- it's more of a web-network, web-graph, or (dare we say it?) a web-web. The point is that a tree implies a bottom-to-top branching structure, with a clearly defined set of precedences. In contrast, a web site is defined by a network of links, with each node connecting to a wide variety of other nodes. Although most web-sites do have some sort of hierarchy (that is, there is usually one or several "home pages"), this is usually loosely defined, with lots of cross-cutting links.

Nevertheless, for reasons of brevity CheckLink uses the term "web-tree" to refer to "the network of resources, as referred to by URLs, that may be reached from a single starting point". Although this single starting point (the "starter-URL") is really just a point of entry, one usually chooses a "starter-URL" that is somehow more fundamental -- say, a home page. Hence, this "starter-URL" is often referred to as the "root of the web-tree".


Descriptive Notes

Example of Only process html documents in starter-URL's directory
  • ---> If the starter-URL is /SAMPLES/FOOBAR.HTM
    then the base-url is /SAMPLES/
  • ---> If /SAMPLES/FOOBAR.HTM contains links to /SAMPLES/TURKEY.HTM and /USERS/DOG.HTM,
    then:
      * /SAMPLES/TURKEY.HTM will be processed (it will be read, Anchors and Images will be extracted, etc.)
      * /USERS/DOG.HTM will not be processed (however, CheckLink will check that it exists).
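
The same example can be written as a small Python sketch. The function names base_url and should_process are hypothetical, and the prefix test is an assumption about how "in (or under) the directory of the starter-URL" is decided.

```python
import posixpath

def base_url(starter):
    """Directory part of the starter-URL selector, with a trailing slash."""
    return posixpath.dirname(starter).rstrip("/") + "/"

def should_process(selector, starter):
    """True if the selector lies in (or under) the starter-URL's directory.

    Documents for which this is False are still checked for existence;
    they are just not read for further links.
    """
    return selector.startswith(base_url(starter))

starter = "/SAMPLES/FOOBAR.HTM"
for link in ("/SAMPLES/TURKEY.HTM", "/USERS/DOG.HTM"):
    action = "process" if should_process(link, starter) else "check existence only"
    print(link, "->", action)
```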


    Creating Descriptions
    Html descriptions are either pulled from a DESCRIPTION header (in the <HEAD> section of the html document), or generated from <Hn> elements. Plain text descriptions are the first few hundred characters of the document.
    Note: Descriptions are made only for documents that are "on-site" (and in the starter-URL's directory, if you selected that option).
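
The description rules above can be approximated with Python's standard html.parser module. This is a sketch under stated assumptions, not CheckLink's actual procedure: the "DESCRIPTION header" is taken to be a <META NAME="DESCRIPTION" CONTENT="..."> element, headings are simply joined with "; ", and the 300-character limit for plain text is an arbitrary stand-in for "the first few hundred characters".

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class DescriptionParser(HTMLParser):
    """Collect the META description (if any) and the text of <Hn> elements."""

    def __init__(self):
        super().__init__()
        self.meta_description = None
        self.headings = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content") or ""
        elif tag in HEADING_TAGS:
            self._in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading:
            self.headings[-1] += data

def describe_html(text):
    """Prefer the META description; otherwise fall back to the headings."""
    parser = DescriptionParser()
    parser.feed(text)
    parser.close()
    if parser.meta_description:
        return parser.meta_description.strip()
    return "; ".join(h.strip() for h in parser.headings if h.strip())

def describe_plain(text, limit=300):
    """First `limit` characters of the text, with whitespace collapsed."""
    return " ".join(text.split())[:limit]
```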


    Multi-part, and two separate, documents
    Using two separate documents is similar to using a multi-part document, but avoids the "over refresh" problems of certain browsers (e.g., Netscape 2.x). However, use of "two separate documents" does require storage of a semi-permanent output file on this server.

    Note that to use a multiple-part document you must have a browser that supports Connection:maintain (such as Netscape 2.0 and above). If you select "multi-parts" but your browser does not support Connection:maintain, then "two separate documents" will be returned.


    Types of tables
    The following tables (codes) can be requested (in any combination):
  •    OK) Display successfully found links
  •    NOSITE) Display links to unreachable sites
  •    NOURL) Display links to missing resources
  •    OFFSITE) Display links to off-site URLs
  •    EXCLUDED) Display links to excluded URLs
  • or ...    ALL) Display all links
  • In general:
     *  The NOURL links are the most interesting (they should be reachable, but aren't).
     * If you are not "checking off-site links", you should not display the NOSITE links, but you should display OFFSITE links.
