[Prev: 1. What is caching?]
[Next: 3. Samples]
- 2.1. Where do I get it?
- 2.2. How do I run it?
- 2.2.1. Install it.
- 2.2.2. Configure it
- 2.3. What neat stuff can it do?
- 2.4. It's running, now what?
- 2.5. I don't understand.
Source code: ftp://www.unimelb.edu.au/pub/www/servers/unix/squid/
tar zxf <filename>
rpm -i squid.rpm
- No speakee Debian.
Port to listen for http type queries on, defaults to 3128
Port to listen for icp queries on, defaults to 3130
E-mail address of the cache manager; appears on error pages, etc.
Important if you start squid at boot time so it doesn't end up running as root.
Password in plain text for cachemgr operations, can specify different ones for different actions.
Used to set up a cache hierarchy. Basic setting for a client cache is:
cache_host <parents hostname> parent tcp-port icp-port
For a neighbour:
cache_host <neighbours hostname> neighbour tcp-port icp-port
Lets you specify which domains to ask a neighbour cache for.
If specified, anything that doesn't match will be considered outside the local net. This has 2 effects: no DNS lookup will be done on the URL host, and the object will always be fetched from one of the parent/neighbour caches.
As above but with IP numbers. Cancels out any local_ip setting.
URLs within the listed domains are always fetched directly.
Use, if you must, to do the same as local_domain but with the IP number of the URL host. This costs a DNS lookup.
If these are found in a URL, don't ask neighbours for them. Defaults to cgi-bin and ?.
If these are found in a URL, don't write the suckers to disk. Defaults to cgi-bin and ?.
Just like cache_stoplist but using a regex instead.
Amount of memory to use for storage of "in-transit", negatively cached and "hot" objects. Bear in mind that squid will also use about 100 bytes of memory to keep track of each object on disk. With an average object size of 13k, this works out to roughly 8M of memory per gigabyte of cache_swap.
Total amount of disk to use. Assumes that it is split evenly over the cache_dirs.
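As a rough worked example of that arithmetic, the sketch below estimates the bookkeeping memory for a few cache_swap sizes, using the 100 bytes per object and 13k average object size quoted above. The function name and the sample sizes are made up for illustration, and the figure covers only the per-object tracking, not the memory setting itself.

```python
# Rough estimate of the memory squid needs just to keep track of on-disk objects.
BYTES_PER_OBJECT = 100   # from the text: about 100 bytes of memory per object
AVG_OBJECT_KB = 13       # from the text: average object size of 13k


def index_memory_mb(cache_swap_mb):
    """Approximate tracking memory (in MB) for a cache_swap given in MB."""
    objects = cache_swap_mb * 1024 / AVG_OBJECT_KB
    return objects * BYTES_PER_OBJECT / (1024 * 1024)


# Sample cache_swap sizes, chosen arbitrarily for illustration.
for swap_mb in (1000, 4000, 12000):
    print(f"cache_swap {swap_mb:>6} MB -> about {index_memory_mb(swap_mb):.1f} MB of memory")
```

The estimate scales linearly, so doubling cache_swap doubles the tracking memory.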
Specify a directory for on-disk storage of objects. Use multiple entries to span multiple disks/partitions.
Place to stick the access log file. Think hard about where, because it gets big quickly. A cache that does about 1.4M requests a day generates roughly 180M of log files a day. They compress really well though :-)
Logs the actions of the storage manager, ditch it by setting to none.
Maximum size of a request. If people are using POST or PUT to upload files, bump this to the maximum conceivable size.
These let you set how long to hang onto things. 4 args are supplied: the URL regex to match against, the minimum age, a percentage and a maximum age. The idea is that squid decides whether an object is fresh or stale. If it's fresh it's given to the client; if it's stale an If-Modified-Since (IMS) request is done first. The algorithm used is:
Calculate the AGE of the object.
AGE = NOW - OBJECT_DATE
Calculate last modified age of the object.
LM_AGE = OBJECT_DATE - LAST_MODIFIED_TIME
Calculate the "last modified factor".
LM_FACTOR = AGE / LM_AGE
Client can specify a maximum age in the HTTP/1.1 header Cache-Control. Call this optional parameter CLIENT_MAX_AGE.
The remote server can also expire objects with the Expires header. Call this parameter EXPIRES.
Check each refresh_pattern in the order specified.
If the client has specified a CLIENT_MAX_AGE and the AGE is greater than this, the object is stale.
If the AGE is less than the MIN_AGE, the object is fresh.
This is how you can specify minimum object lifetimes in your cache, for example Unimelb specifies the following:
refresh_pattern ^http://home.netscape.com/.* 720 100% 4320
refresh_pattern ^http://www.netscape.com/.* 720 100% 4320
refresh_pattern ^http://www.yahoo.com/.* 720 100% 4320
refresh_pattern/i jpg$ 4320 100% 10080
refresh_pattern/i gif$ 4320 100% 10080
refresh_pattern /$ 360 10% 4320
refresh_pattern . 0 20% 4320
If there is a server supplied EXPIRES and it is before NOW, the object is stale, otherwise it is fresh.
If the AGE of the object is greater than MAX_AGE, it is stale.
If LM_FACTOR is less than PERCENT, the object is fresh.
Otherwise the object is stale.
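The steps above can be sketched in a few lines of Python. This is only an illustration of the rules as listed here, not squid's actual code. The function names are invented, all times are assumed to be in the same unit (minutes here), and the handling of an object with no last-modified time (treated as stale at the LM_FACTOR step) is an assumption.

```python
import re

# (regex, min_age, percent, max_age), copied from the last two Unimelb lines above.
# Ages are assumed to be in minutes.
PATTERNS = [
    (re.compile(r"/$"), 360, 10, 4320),
    (re.compile(r"."), 0, 20, 4320),
]


def pick_pattern(url):
    """Return (min_age, percent, max_age) of the first pattern matching url."""
    for regex, min_age, percent, max_age in PATTERNS:
        if regex.search(url):
            return min_age, percent, max_age
    return None


def is_fresh(now, object_date, last_modified, min_age, percent, max_age,
             client_max_age=None, expires=None):
    """Apply the freshness rules in the order given in the text."""
    age = now - object_date
    if client_max_age is not None and age > client_max_age:
        return False                      # client wants something newer
    if age < min_age:
        return True                       # younger than the minimum lifetime
    if expires is not None:
        return not (expires < now)        # server-supplied expiry decides
    if age > max_age:
        return False
    if last_modified is not None:
        lm_age = object_date - last_modified
        # Assumption: a zero or negative LM_AGE is skipped rather than divided by.
        if lm_age > 0 and age / lm_age < percent / 100.0:
            return True
    return False                          # nothing said fresh, so stale
```

For instance, with the catch-all pattern (0 20% 4320), an object fetched 100 minutes ago that was last modified 1000 minutes before it was fetched has an LM_FACTOR of 0.1, which is under 20%, so it counts as fresh.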
Controls whether the cache will continue fetching things after a client has hit stop. 3 parameters: minkb, percent, maxkb. If the transfer has less than minkb to go, squid will continue fetching it. If the transfer has more than maxkb to go, squid will abort the fetch. Otherwise, if more than percent has been fetched, squid will finish the fetch. Disable fetching by setting minkb to -1.
Much fun here, see Section 2.3 for a discussion of the types of acls that can be specified. Use in conjunction with http_access, icp_access, miss_access and cache_host_acl.
If matched, allows or denies access to the cache via either http or icp.
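The three-way decision can be written out as a small function. This is a sketch of the rules as described here, with an invented function name. It assumes the total size of the transfer is known, that a fetch which fails the percent test is aborted, and it leaves out the special -1 setting.

```python
def continue_after_abort(total_kb, fetched_kb, min_kb, percent, max_kb):
    """Decide whether to keep fetching once the client has gone away."""
    remaining_kb = total_kb - fetched_kb
    if remaining_kb < min_kb:
        return True                # nearly done, finish it
    if remaining_kb > max_kb:
        return False               # too much left, give up
    # In between: finish only if more than `percent` has already arrived.
    return fetched_kb * 100 > percent * total_kb
```

So with minkb 10, percent 50 and maxkb 50, a 100k object with 60k already fetched is completed, while the same object with only 5k fetched is dropped.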
Who can ask you to fetch something on their behalf. Forces peers to use you as a neighbour and not a parent.
Configure the number of directories used in on-disk storage. Try to work to:
swap_level1_dirs * swap_level2_dirs = amount of cache_swap / number of cache_dirs / average_object_size / 256.
This will limit you to approximately 256 items per directory. For example, for a million objects in the cache (approx 12GB of disk) use 16 level 1 directories and 256 level 2 directories. (16 * 256 * 256 = 1,048,576 objects).
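To check a planned layout against that rule of thumb, the arithmetic can be scripted. This is a minimal sketch. The function name is invented, sizes are taken in KB, and the 13k average object size is borrowed from the memory discussion earlier.

```python
import math


def directories_needed(cache_swap_kb, cache_dirs, avg_object_kb=13, per_dir=256):
    """Number of level1 * level2 directories needed per cache_dir."""
    objects_per_cache_dir = cache_swap_kb / cache_dirs / avg_object_kb
    return math.ceil(objects_per_cache_dir / per_dir)


# The 12GB example from the text, on a single cache_dir.
needed = directories_needed(12 * 1024 * 1024, 1)
print(needed, "directories needed;", 16 * 256, "provided by 16 x 256")
```

Any pair of level 1 and level 2 values whose product is at least the number returned keeps each directory at around 256 items or fewer.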
Keeps in-transit objects in memory.
Squid lets you set up arbitrarily complex caching hierarchies with the cache_host and cache_host_acl configuration items. Unimelb works in the following hierarchy:
3 on-campus proxy caches, 1 of which is a Netscape proxy.
2 sets of neighbours. One is on the VRN and includes Deakin, Latrobe, Monash, Swinburne and VUT. All of these are squid caches. We are also co-operating in an AARNET caching test with UTS and Macquarie in Sydney, UQ and Curtin in WA. We are set up so we will only ask VRN neighbours for anything identifiably inside Australia; we ask all our neighbours for other traffic.
Many and varied acls available to the administrator.
Based on the IP address of the requesting client. Can be specified in a number of different ways.
Based on the URL host's IP address, specifiable in the same ways as src.
Based on the hostname of the requesting client.
Based on the URL host, usually used to deny access. For example:
acl naughty dstdomain playboy.com penthousemag.com
http_access deny naughty
Time based specification, usually combined with other acls. Specify time periods as [SMTWHFA] h1:m1-h2:m2. (Note: H=Thursday, A=Saturday, and h1:m1 must be less than h2:m2.) For example, no perving during work hours:
acl workhours time MTWHF 08:00-18:00
http_access deny naughty workhours
regex match against the whole URL. For example, no looking at a competitor's website:
acl competitor url_regex ^http://www.monash.edu.au/
http_access deny competitor
regex matching against the URL path. For example, a really bandwidth efficient cache:
acl nogifs urlpath_regex \.gif$
acl nojpgs urlpath_regex \.jpg$
http_access deny nogifs
http_access deny nojpgs
Based on the port of the URL request, used mainly to stop nasties. For example:
acl cantgetme port 7 9 19
http_access deny cantgetme
Based on the protocol of the URL request. For example, to force people to live in the nineties:
acl nogopher proto gopher
http_access deny nogopher
Based on the type of the request: GET, POST, PUT.
Based on the user-agent string of the request. Uses a regex. For example wouldn't we all love to:
acl noie browser MSIE
http_access deny noie
Based on an ident lookup of the requesting client. Automatically turns ident_lookup on.
These acls can be applied selectively to http type accesses, to neighbour or ICP requests, and to the choice of which peer to send requests to.
Squid lets you filter all URLs supplied to it through an external program for rewriting. This lets you do things like forcing people to use local ftp mirrors (see Section A.1) or getting rid of pesky banner ads (see Section A.2), amongst other applications. The redirector is passed the URL requested, the client's IP address/Fully Qualified Domain Name, the ident lookup of the client and the method used. The FQDN and ident fields will be passed in as - if unavailable. The redirector program must respond with an empty line if nothing is to be done to the URL, or with the modified URL to use.
Proxy authentication is part of HTTP/1.1 and it works pretty much like Basic Authentication for HTTP/1.0. The main squid source allows proxy_auth to be turned on for all access by editing the Makefile and specifying -DPROXY_AUTH.
A patch is available to squid to allow you to use another type of acl, proxy_auth. This patch works much like the redirector, with an external program which is passed the username and password supplied and must respond with OK or ERR. Access is allowed or denied respectively. At The University of Melbourne we have modified this to be OK, ERR or QUOTA to enable us to enforce web use quotas on students.
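A redirector along those lines can be very small. The sketch below is an illustration only: it assumes one request per line with the four fields separated by whitespace in the order described above, and the two ftp hostnames are hypothetical stand-ins for a remote site and its local mirror.

```python
# Hypothetical hosts: requests for REMOTE are pointed at MIRROR instead.
REMOTE = "ftp://ftp.example.com/"
MIRROR = "ftp://mirror.example.edu.au/"


def rewrite(line):
    """Return the replacement URL, or an empty string to leave the URL alone."""
    fields = line.split()
    if len(fields) < 4:
        return ""                       # malformed input, change nothing
    url, client, ident, method = fields[:4]
    if url.startswith(REMOTE):
        return MIRROR + url[len(REMOTE):]
    return ""


def run(inp, out):
    """Answer one line per request, flushing so squid is never kept waiting."""
    for line in inp:
        out.write(rewrite(line) + "\n")
        out.flush()
```

To use it as a real program, import sys and call run(sys.stdin, sys.stdout) at the bottom of the script. The flush after every reply matters because squid waits for each answer before carrying on with the request.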
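The checking side of such an external program might look something like the sketch below. The exact format the patch uses to pass credentials is not given here, so a single "username password" line is assumed. The user table, the over-quota set and the function name are all hypothetical, and real code would not keep plain text passwords in the script.

```python
# Hypothetical data for illustration only.
USERS = {"alice": "secret", "bob": "hunter2"}
OVER_QUOTA = {"bob"}


def check(line):
    """Return OK, ERR or QUOTA for one 'username password' line (assumed format)."""
    parts = line.split()
    if len(parts) != 2:
        return "ERR"                    # malformed input is refused
    user, password = parts
    if USERS.get(user) != password:
        return "ERR"                    # unknown user or wrong password
    if user in OVER_QUOTA:
        return "QUOTA"                  # valid user who has used up the quota
    return "OK"
```

As with the redirector, the real program would loop over its input and print one answer per line.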
Once the thing is running you will probably want to get some statistics on what it is doing. You can get various pieces of info out of the cachemgr.cgi that is supplied with squid. The other method is post-processing of the log files. The first method can be automated somewhat with the MRTG package (see http://www.ee.ethz.ch/stats/mrtg) and some hacking. (E-mail me and I'll send you a tar file to wrestle with.) Some other stuff to check out is:
Contact your upstream ISP and ask them about parenting. You may also consider using a feature built into squid to advertise your presence; check out the Cache Registration Service section of squid.conf.
Generated: 13 November, 1998 Copyright: © 1998 The University of Melbourne Maintainer: firstname.lastname@example.org