HTML/CGI Talk
Search Script

A Demo

Type in a string to search this talk!

Enter a string (regular expressions accepted)
Don't forget to come back!

The Search PERL Code

#!/usr/local/bin/perl
#==========================================================================
# 
# Created:	28/7/95
# By:		Scott Penrose (scottp@pas.com.au)
# Purpose:	SEARCH Web Documents on local hard disk
#		(used for html/cgi talk for LUV 1/8/95)
#
# Future:	Search non local documents
#		Give directories from current directory of html file
#			- eg: directory = ".", or "/" = root of HTML
#
# Usage:	Install into STANDARD cgi directory
#		Call from HTML Form
#		Input	- String = Search string
#                         User entered Regular Expression
#			- Where  = Where to search
#                         What directory to search (ie: URL, no / at start)
#==========================================================================
# FIXED Variables - These may need to be changed for local servers
# THE Root directory of your web server. Include / at the end!
$SS_RootDirectory = "/usr/web/docs/";
$SS_Log = "/usr/web/search.log";


# Set the DATE (silly method, but it works)
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
$mon++;
$SS_DATE = $year*10000 + $mon*100 + $mday;


#########################
# GET Command ENVIRONMENT
#########################
# These are the standard variables passed to you with CGI
# -------------------------------------------------------
# AUTH_TYPE - Basic
#     Level ?
# GATEWAY_INTERFACE - CGI/1.1
#     CGI Version
# HTTP_ACCEPT - */*, image/gif, image/x-xbitmap, image/jpeg
#     Accepted types ?
# HTTP_REFERER - http://www.pas.com.au.:8001/cgi-bin
#     ?
# HTTP_USER_AGENT - Mozilla/1.1N (X11; I; Linux 1.1.90 i486)
#     Application calling script, my copy of Netscape on Linux
# QUERY_STRING - 
#     The string USUALLY used in a CGI Script
# REFERER_URL - http://www.pas.com.au.:8001/cgi-bin
#     The document/directory calling this script. ie: you can make sure it
#     is yours.
# REMOTE_ADDR - 203.8.9.18
#     Remote callers IP Number, a simple method of security if necessary
# REMOTE_HOST - scott.meriden.pas.com.au
#     And of course the remote callers address
# REMOTE_IDENT - 
#     Remote ident ?
# REMOTE_USER - scottp
#     Remote USER Name, if it is a protected directory
# REQUEST_METHOD - GET
#     Method called, GET/POST
# SCRIPT_NAME - /cgi-bin/showenv.cgi
#     Name of this script (and location)
# SERVER_NAME - jethro.meriden.pas.com.au
#     Full name of server running httpd
# SERVER_PORT - 8001
#     Port Number Used
# SERVER_PROTOCOL - HTTP/1.0
#     Server Protocol
# SERVER_SOFTWARE - CERN/3.0
#     Server Software
# ------------------------------------------------------------------------
# A BIG LIST, but very useful !
# 
# Our QUERY_STRING - We shall separate it into our valid variables
$QUERY_STRING = $ENV{QUERY_STRING};
# Now split into associative array
# Split Query Environment String into a list, separated by &
# eg: String=Something&Where=Somewhere
@QUERY_LIST = split( /&/, $QUERY_STRING);
# Split each list item into PARAM and VALUE into an array
foreach $item (@QUERY_LIST) {
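    # Limit of 2 so that an '=' inside the value is kept
    ($param, $value) = split( /=/, $item, 2);
    # UNESCAPE
    # CGI makes all weird characters escape sequences.
    # '+' means a space. Convert ALL of them, and do it first, so that
    # an escaped plus sign (%2B) is not turned into a space afterwards.
    $value =~ s/\+/ /g;
    # (I copied this line from the wwwlib for perl)
    $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C",hex($1))/ge;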
    $QUERY_ARRAY{$param} = $value;
}
# Get other entries for LOG File
$REMOTE_HOST = $ENV{REMOTE_HOST};
$REMOTE_USER = $ENV{REMOTE_USER};
$SERVER_NAME = $ENV{SERVER_NAME};
# The variable listed above is REFERER_URL; there is no REMOTE_URL
$REMOTE_URL = $ENV{REFERER_URL};
$HTTP_USER_AGENT = $ENV{HTTP_USER_AGENT};


#######################
# HTML - Display HELLO!
#######################
# Don't forget you need to specify the TYPE of document.
# You do not need this with HTML files because HTTPD uses the extension of
# these files to specify the type, eg: HTML, GIF, JPEG etc.
print "Content-type: text/html\n\n";
print "\n\n";
# Standard HTML
# You need a \ in front of @, ", and some other keys
# Don't forget \n to make your source HTML file look clean
print "<HTML>\n";
print "<HEAD>\n<TITLE>SEARCH Results</TITLE>\n</HEAD>\n";
print "<BODY>\n";
print "<H1>SEARCH Results</H1>\n";
print "Documents found for <B>$QUERY_ARRAY{String}</B> in <B>/$QUERY_ARRAY{Where}</B>\n";
print "<HR>\n";


############################################################
# SEARCH - Search through the directories and produce a list
############################################################
# This method could be improved LOTS
# To save time I am using find and grep to do the search for me.
# from experience I have found grep to search a file quicker than perl
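chdir($SS_RootDirectory) || die "Cannot chdir to $SS_RootDirectory: $!";
# An empty Where means "search from the root of the web documents"
$where = $QUERY_ARRAY{Where};
$where = "." if !defined($where) || $where eq "";
# The list form of open passes each argument straight to the program,
# so the user's input is never interpreted by a shell.
open (FIND, "-|", "find", $where, "-iname", "*.htm*");
while ($file = <FIND>) {
    # Remove the trailing newline from the file name
    chomp($file);
    # if GREP returns a string, then it found our search string
    open (TEMP, "-|", "grep", "-i", "-e", $QUERY_ARRAY{String}, "--", $file);
    $temp = <TEMP>;
    close (TEMP);
    if ($temp ne "") {
	# Great, now find the TITLE, assume it is on one line
        open (TEMP, "-|", "grep", "-i", "TITLE", $file);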
	$temp = <TEMP>;
	close (TEMP);
	$temp =~ s/^.*<TITLE>//i;
	$temp =~ s/<\/TITLE>.*$//i;
	$FOUND{$file} = $temp;
    }
}


#####################################
# DISPLAY - Display the list of files
#####################################
foreach $file (keys %FOUND) {
    print "<H4><IMG SRC=\"/images/ButSmall/right.gif\">";
    print " <A href=\"/$file\"> $file </A></H4>\n";
    print "<BLOCKQUOTE>$FOUND{$file}</BLOCKQUOTE>\n";
}


################
# HTML - Goodbye
################
print "<HR>\n<P>\nPlease leave any messages/problems to <A href=\"mailto:sysadmin\@pas.com.au\">sysadmin\@pas.com.au</A>\n";
# DON'T forget to END Body and HTML
print "</BODY>\n";
print "</HTML>\n";


#############################################
# LOG - Finally insert the entry into the log
#############################################
open (LOG, ">>$SS_Log");
print LOG "$SS_DATE:";
print LOG "$REMOTE_HOST:";
print LOG "$REMOTE_USER:";
print LOG "$SERVER_NAME:";
print LOG "$REMOTE_URL:";
print LOG "$QUERY_ARRAY{Where}:";
print LOG "$QUERY_ARRAY{String}:";
print LOG "$HTTP_USER_AGENT\n";
close (LOG);

What is QUERY STRING

$ENV{QUERY_STRING}
This is the simplest method of getting the query string. Once you have it you can manipulate it as much as you want. A very simple script (eg: a standard search script) can use the whole query string as a key, while others may want to separate it into fields.
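
As a rough illustration of separating the query string into fields, here is a minimal sketch. The helper name split_query is made up for this example and is not part of the search script; the fallback string is the example from the script's comments. The values are left escaped here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the talk's script): split a raw
# query string into a hash of field name => raw (still escaped) value.
sub split_query {
    my ($query) = @_;
    my %fields;
    foreach my $pair (split /&/, $query) {
        next if $pair eq "";                 # skip empty pairs like "a=1&&b=2"
        # Limit of 2 keeps any '=' inside the value intact.
        my ($param, $value) = split /=/, $pair, 2;
        $fields{$param} = defined $value ? $value : "";
    }
    return %fields;
}

# Outside a web server QUERY_STRING is unset, so fall back to the
# example string used in the script's comments.
my $query = (defined $ENV{QUERY_STRING} && $ENV{QUERY_STRING} ne "")
    ? $ENV{QUERY_STRING}
    : "String=Something&Where=Somewhere";

my %fields = split_query($query);
foreach my $name (sort keys %fields) {
    print "$name = $fields{$name}\n";
}
```

Run from the command line with no QUERY_STRING set, this lists the String and Where fields of the fallback example.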

What is UNESCAPE

$value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C",hex($1))/ge;
Passing the query string in a URL and an environment variable puts a lot of restrictions on which characters it can contain. The viewer (browser) therefore translates spaces and other special characters into escape sequences of the form %XX, where XX is the character's code in hexadecimal. You will need to change these back to normal characters.

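You should also change all '+' to spaces. Do this before decoding the %XX sequences, so that an escaped plus sign (%2B) is not turned into a space.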
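
The two steps can be wrapped in one small function. This is only a sketch: the name unescape is an assumption made for the example, and it handles single-byte characters only, as the original line does.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one URL-encoded form value.
sub unescape {
    my ($value) = @_;
    # '+' stands for a space in form data. Convert it first, so that a
    # literal plus sign sent as %2B survives the next step.
    $value =~ tr/+/ /;
    # Replace each %XX with the character whose code is hex XX.
    $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/ge;
    return $value;
}

print unescape("hello+world%21"), "\n";   # prints "hello world!"
print unescape("1%2B1%3D2"), "\n";        # prints "1+1=2"
```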

What is CONTENT TYPE

Content-type: text/html
The content type is a standard MIME header that describes what kind of document follows, eg: a PostScript document, JPEG picture, TAR archive, HTML document etc. A CGI script must print this line itself, followed by a blank line, before any other output.

HTTPD (WWW Server) automatically adds this line to documents by looking at the file extension (eg: jpeg, jpg, html, tar, etc). Some people get around this by naming their CGI script *.HTML.
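
To make the idea concrete, the sketch below guesses a content type from a file extension and then prints the header the way a CGI script has to. The table and the helper name content_type_for are illustrative assumptions only; a real server reads its own configuration, which will differ.

```perl
use strict;
use warnings;

# Illustrative table only; a real server uses its own configuration.
my %MIME_TYPE = (
    html => "text/html",
    htm  => "text/html",
    gif  => "image/gif",
    jpg  => "image/jpeg",
    jpeg => "image/jpeg",
    ps   => "application/postscript",
    tar  => "application/x-tar",
);

# Hypothetical helper: guess a content type from a file name.
sub content_type_for {
    my ($filename) = @_;
    # Capture the text after the last dot of the last path component.
    my ($ext) = $filename =~ /\.([^.\/]+)$/;
    return "application/octet-stream" unless defined $ext;
    return $MIME_TYPE{lc $ext} || "application/octet-stream";
}

# A CGI script prints the header itself, then a blank line, then the body.
print "Content-type: ", content_type_for("results.html"), "\n\n";
print "<HTML><BODY>Hello</BODY></HTML>\n";
```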

The LOG File

It is always important to keep logs. This application shows you how you can keep some valuable data. You could even format this in the same way as an HTTPD log. This would allow you to run WWW statistics software on your log and find out the number of hits, which documents were requested and which countries the requests came from.
950730:scott.meriden.pas.com.au::jethro.meriden.pas.com.au::Mozilla/1.1N (X11; I; Linux 1.1.90 i486)
950731:miriworld.its.unimelb.EDU.AU::jethro.meriden.pas.com.au::Mozilla/1.1N (Macintosh; I; 68K)  via proxy gateway  CERN-HTTPD/3.0 libwww/2.17
950731:bill.meriden.pas.com.au::jethro.meriden.pas.com.au::Mozilla/1.1N (Windows; I; 16bit)
950731:miriworld.its.unimelb.EDU.AU::jethro.meriden.pas.com.au::Mozilla/1.1N (Macintosh; I; 68K)  via proxy gateway  CERN-HTTPD/3.0 libwww/2.17
950731:scott.meriden.pas.com.au::jethro.meriden.pas.com.au::tutorials/www/:Search:Mozilla/1.1N (X11; I; Linux 1.1.90 i486)
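
A log in this format is easy to read back. The following is a minimal sketch, assuming the eight colon-separated fields in the order the LOG section writes them; the field names and the helper parse_log_line are made up for the example. Note that a colon inside the URL or the search string would shift the fields, so treat this as a starting point only.

```perl
use strict;
use warnings;

# Field order as written by the script's LOG section (names are my own).
my @LOG_FIELDS = qw(date remote_host remote_user server_name
                    remote_url where string user_agent);

# Hypothetical helper: turn one log line into a hash of named fields.
sub parse_log_line {
    my ($line) = @_;
    chomp $line;
    # Limit the split so colons inside the user agent stay in the last field.
    my @values = split /:/, $line, scalar @LOG_FIELDS;
    my %entry;
    @entry{@LOG_FIELDS} = @values;
    return %entry;
}

my %entry = parse_log_line(
    "950731:scott.meriden.pas.com.au::jethro.meriden.pas.com.au::"
  . "tutorials/www/:Search:Mozilla/1.1N (X11; I; Linux 1.1.90 i486)"
);
print "$entry{remote_host} searched '$entry{string}' in $entry{where}\n";
```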

Modified: 1/8/95
Created: 28/7/95
By: scottp@pas.com.au