Creating an outline for your site essentially takes three steps:
Create a Site Entry — You first create an entry for your site that will set up the basic information you need:
Site Title - is just what you want to label your site.
Domain Root - is the root-level URL you will use for your site; this does not have to be the actual root of a web domain. For example, if I want an outline of my entire website, I would enter http://www.gregcklotz.com/ as the root; the site for each individual class I've taught would then be included in the site outline. However, if I want an outline of only my Web Design class, I would enter http://www.gregcklotz.com/English313/ as the root, and only pages contained within that "/English313/" folder would be included in the crawl. Basically, this indexing tool looks for the root in each URL on the page and only traverses pages whose URLs contain the root; otherwise it just saves the individual link itself, without traversing that page's links.
Start Page - only needs to be set if you want the first page of your site outline to be something other than the page that appears when you open the Domain Root URL in a browser.
Set the Crawl Parameters — The "Site Crawl Parameters" box on the right side of the page allows you to customize your crawl. Descriptions of each parameter are available by hovering the cursor over the related links. When you save these parameters, they are saved permanently to your account, so you don't have to reset them with every site crawl you create; but you can change them when you need to. Each crawl you perform will save a note with the site crawl that tells you what parameters were used for that crawl.
Execute your Site Crawl — Your site crawl can potentially take a while, depending on the size of your site, number of links on your pages, internet connection speed, etc. If your site crawl is interrupted for any reason, you can always resume the crawl; you will see a "Continue Site Crawl" button next to any site that still contains incomplete pages.
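The Domain Root rule from the first step can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the names `is_within_root` and `outline` are made up for the example, the root check is assumed to be a simple prefix match, the `links` dictionary stands in for fetching and parsing real pages, and `syllabus.html` and `other.html` are hypothetical pages.

```python
from collections import deque

def is_within_root(url: str, root: str) -> bool:
    """Return True if the URL falls under the Domain Root (assumed: prefix match)."""
    return url.startswith(root)

def outline(start: str, root: str, links: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Walk the link graph from `start`.

    `links` maps a page URL to the URLs found on that page.
    Returns (traversed, saved_only).
    """
    traversed, saved_only = [], []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if not is_within_root(url, root):
            # The link is recorded, but the links on that page are not followed.
            saved_only.append(url)
            continue
        traversed.append(url)
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return traversed, saved_only

ROOT = "http://www.gregcklotz.com/English313/"
LINKS = {
    ROOT: [ROOT + "syllabus.html", "http://www.gregcklotz.com/", "http://example.com/"],
    ROOT + "syllabus.html": [ROOT],
    "http://www.gregcklotz.com/": ["http://www.gregcklotz.com/other.html"],
}
traversed, saved_only = outline(ROOT, ROOT, LINKS)
print(traversed)
print(saved_only)
```

With "/English313/" as the root, the two pages inside that folder are traversed, while the links to the site's home page and to the outside domain are saved without being followed, so `other.html` never appears in the outline.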
Using Your Results
Once your site outline is created, you can access an outline of your site on the Website Links page. The outline has collapsible branches so you can view different levels of your hierarchy. Each page and link has an HTTP Status Code to indicate its status, basically:
Successes — these pages essentially exist where they are supposed to
Warnings — these pages can be reached by the provided URL, but are often redirects to where the page actually exists, so you may want to update these links
Errors — these pages cannot be reached at the given URL for various reasons
The Dead Links page lists only Errors (and Warnings if you choose to show them) so you can find all the dead links on your site, and on what page they reside. You can use the Dead Links page to remove dead links from your site hierarchy if you choose to maintain an updated version of your site outline.
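The three groups above map naturally onto HTTP status code ranges. The sketch below assumes that Successes are 2xx codes, Warnings are 3xx redirects, and Errors are 4xx and 5xx codes; the tool's exact grouping is not stated here, so treat the ranges, the function names `classify` and `dead_links`, and the sample results as illustrative only.

```python
def classify(status: int) -> str:
    """Group an HTTP status code into success / warning / error (assumed ranges)."""
    if 200 <= status < 300:
        return "success"
    if 300 <= status < 400:
        return "warning"
    if 400 <= status < 600:
        return "error"
    return "other"  # e.g. 1xx informational codes

def dead_links(results, include_warnings=False):
    """Keep only the (page, link, status) records a dead-links report would list."""
    wanted = {"error", "warning"} if include_warnings else {"error"}
    return [record for record in results if classify(record[2]) in wanted]

# Hypothetical crawl results: (page the link is on, link URL, status code)
RESULTS = [
    ("/index.html", "/about.html", 200),
    ("/index.html", "/old-page.html", 301),
    ("/about.html", "/missing.html", 404),
]

print(dead_links(RESULTS))
print(dead_links(RESULTS, include_warnings=True))
```

Keeping the page alongside each link mirrors what the Dead Links page reports: not just which links are dead, but where on the site they reside.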
HTTP Status Codes
100 Continue - The server has received the request headers, and the client should proceed to send the request body.
101 Switching Protocols - The requester has asked the server to switch protocols.
103 Checkpoint - Used in the resumable requests proposal to resume aborted PUT or POST requests.
200 OK - The request succeeded (this is the standard response for successful HTTP requests).
201 Created - The request has been fulfilled, and a new resource has been created.
202 Accepted - The request has been accepted for processing, but the processing has not been completed.
203 Non-Authoritative Information - The request has been successfully processed, but is returning information that may be from another source.
204 No Content - The request has been successfully processed, but is not returning any content.
205 Reset Content - The request has been successfully processed, but is not returning any content, and requires that the requester reset the document view.
206 Partial Content - The server is delivering only part of the resource due to a range header sent by the client.
300 Multiple Choices - A link list. The user can select a link and go to that location. Maximum five addresses.
301 Moved Permanently - The requested page has moved to a new URL.
302 Found - The requested page has moved temporarily to a new URL.
303 See Other - The requested page can be found under a different URL.
304 Not Modified - Indicates the requested page has not been modified since it was last requested.
306 Switch Proxy - No longer used.
307 Temporary Redirect - The requested page has moved temporarily to a new URL.
308 Resume Incomplete - Used in the resumable requests proposal to resume aborted PUT or POST requests.
400 Bad Request - The request cannot be fulfilled due to bad syntax.
401 Unauthorized - The request requires authentication, and authentication has failed or has not yet been provided.
402 Payment Required - Reserved for future use.
403 Forbidden - The request was a legal request, but the server is refusing to respond to it.
404 Not Found - The requested page could not be found but may be available again in the future.
405 Method Not Allowed - A request was made of a page using a request method not supported by that page.
406 Not Acceptable - The server can only generate a response that the client will not accept.
407 Proxy Authentication Required - The client must first authenticate itself with the proxy.
408 Request Timeout - The server timed out waiting for the request.
409 Conflict - The request could not be completed because of a conflict in the request.
410 Gone - The requested page is no longer available.
411 Length Required - The "Content-Length" header is not defined. The server will not accept the request without it.
412 Precondition Failed - The precondition given in the request was evaluated as false by the server.
413 Request Entity Too Large - The server will not accept the request, because the request entity is too large.
414 Request-URI Too Long - The server will not accept the request, because the URL is too long. This can occur when a POST request is converted to a GET request with a long query string.
415 Unsupported Media Type - The server will not accept the request, because the media type is not supported.
416 Requested Range Not Satisfiable - The client has asked for a portion of the file, but the server cannot supply that portion.
417 Expectation Failed - The server cannot meet the requirements of the Expect request-header field.
500 Internal Server Error - A generic error message, given when no more specific message is suitable.
501 Not Implemented - The server either does not recognize the request method, or it lacks the ability to fulfill the request.
502 Bad Gateway - The server was acting as a gateway or proxy and received an invalid response from the upstream server.
503 Service Unavailable - The server is currently unavailable (overloaded or down).
504 Gateway Timeout - The server was acting as a gateway or proxy and did not receive a timely response from the upstream server.
505 HTTP Version Not Supported - The server does not support the HTTP protocol version used in the request.
511 Network Authentication Required - The client needs to authenticate to gain network access.
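If you want to look up a status code programmatically, Python's standard library exposes the registered codes through `http.HTTPStatus`. The helper name `describe` below is just an assumption for illustration. Keep in mind that the standard library follows the current registry, so a few of its phrases differ from the older names in the list above (for instance the entries shown here as Checkpoint and Resume Incomplete).

```python
from http import HTTPStatus

def describe(code: int) -> str:
    """Return 'code phrase' for a registered status code, or flag it as unknown."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # HTTPStatus raises ValueError for codes it does not know.
        return f"{code} (not a registered status code)"

for code in (200, 301, 404, 503):
    print(describe(code))
```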