
[Top] [Contents]
[Prev: 1. What is caching?]
[Next: 3. Samples]

Web Caching

2. Squid

2.1. Where do I get it?
2.2. How do I run it?
2.2.1. Install it.
2.2.2. Configure it
2.3. What neat stuff can it do?
2.4. It's running, now what?
2.5. I don't understand.

2.1. Where do I get it?

Grab the latest 1.1 source (1.1.21 at the time of writing) from the squid home page, or a pre-built package for your platform.

2.2. How do I run it?

2.2.1. Install it.

From the source distribution:

tar zxf <filename>
cd squid-1.1.21
./configure --prefix=<wherever>
make
make install

From a Red Hat package:

rpm -i squid.rpm

Debian: sorry, can't help you there.

2.2.2. Configure it

Find the configuration file, squid.conf. Where it lives depends on how squid was installed: a source install puts it under the --prefix directory; for packages, *shrug*, go hunting.

Things to Sanity Check

http_port - integer

Port to listen on for HTTP requests; defaults to 3128.

icp_port - integer

Port to listen on for ICP queries; defaults to 3130.

cache_mgr - string

E-mail address of the cache manager; appears on error pages, etc.


cache_effective_user - string

The user to run squid as. Important if you start squid at boot time so it doesn't end up running as root.


cachemgr_passwd - string

Password in plain text for cachemgr operations; different passwords can be specified for different actions.
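
For a first pass, a sanity-check block might look something like this (the address, password and user are made up; check the comments in the distributed squid.conf for the exact forms):

# where to listen
http_port 3128
icp_port 3130
# who to blame on error pages
cache_mgr proxy-admin@example.edu.au
# drop root privileges if started from the boot scripts
cache_effective_user nobody
# password for cachemgr operations (per-action passwords are possible)
cachemgr_passwd secret shutdown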


Things Requiring Some Thought

cache_host

Used to set up a cache hierarchy. The basic setting for a client cache is:

cache_host <parents hostname> parent tcp-port icp-port

For a neighbour:

cache_host <neighbours hostname> neighbour tcp-port icp-port


cache_host_domain

Lets you specify which domains to ask a given neighbour cache for.
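
Put together, a small client cache with one parent and one neighbour (hostnames invented) comes out something like:

# ask the parent for anything we can't satisfy ourselves
cache_host bigcache.example.edu.au parent 3128 3130
# only bother the neighbour for Australian URLs
cache_host peercache.example.edu.au neighbour 3128 3130
cache_host_domain peercache.example.edu.au .au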


inside_firewall

If specified, anything that doesn't match is considered outside the local net, which has two effects: no DNS lookup will be done on the URL host, and the object will always be fetched from one of the parent/neighbour caches.


firewall_ip

As above but with IP numbers. Cancels out any local_ip setting.


local_domain

URLs within the listed domains are always fetched directly.


local_ip

Use if you must; does the same as local_domain but with the IP number of the URL host, which costs a DNS lookup.


hierarchy_stoplist

If these are found in a URL, don't ask neighbours for it. Defaults to cgi-bin and ?.


cache_stoplist

If these are found in a URL, don't write the suckers to disk. Defaults to cgi-bin and ?.


cache_stoplist_pattern

Just like cache_stoplist but using regular expressions instead.
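
In squid.conf terms the defaults mentioned above amount to something like this (the regex on the last line is only an illustration):

# don't ask neighbours for anything that looks dynamic
hierarchy_stoplist cgi-bin ?
# and don't bother writing it to disk either
cache_stoplist cgi-bin ?
# regex flavour of the same idea
cache_stoplist_pattern \.cgi$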

cache_mem (MB)

Amount of memory to use for storage of ``in-transit'', negatively cached and ``hot'' objects. Bear in mind that squid will use about 100 bytes of memory to keep track of each object on disk; with an average object size of 13k this works out to the following (for various cache_swap settings):

Table 1: Memory Requirements (number of objects held vs. memory required)

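As a rough worked example of that arithmetic: at 13k per object, a 1000MB cache_swap holds about 77,000 objects, which at 100 bytes each needs roughly 8MB of memory just for the index; a swap big enough for a million objects (around 13GB) needs on the order of 100MB, on top of whatever cache_mem you allow for the in-transit and hot objects.
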
cache_swap (MB)

Total amount of disk to use; assumed to be split evenly over the cache_dir entries.


cache_dir

Specify a directory for on-disk storage of objects. Use multiple entries to span multiple disks/partitions.


cache_access_log

Place to stick the access log file; think hard, it gets big quickly. A cache that does about 1.4M requests generates roughly 180M of log files a day. They compress really well though :-)


cache_store_log

Logs the actions of the storage manager; ditch it by setting it to none.
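
Pulled together, a small two-disk setup might look something like this (paths and sizes are only illustrative):

# 1000MB of disk, split evenly over the cache_dir entries
cache_swap 1000
cache_dir /cache1/squid
cache_dir /cache2/squid
# the access log grows fast, put it somewhere with room
cache_access_log /usr/local/squid/logs/access.log
# not interested in the storage manager's diary
cache_store_log none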

request_size (KB)

Maximum size of a request. If people are using POST or PUT to upload files somewhere, bump this to the maximum conceivable size.

refresh_pattern and refresh_pattern/i

These let you set how long to hang onto things. Four arguments are supplied: the URL regex to match against, a minimum age, a percentage and a maximum age. The idea is that squid decides whether an object is fresh or stale; if it's fresh it's given to the client, if it's stale an If-Modified-Since (IMS) check is done first. The algorithm used is:

  1. Calculate the AGE of the object:

     AGE = NOW - OBJECT_DATE (OBJECT_DATE being the time squid fetched it).

  2. Calculate the last modified age of the object:

     LM_AGE = OBJECT_DATE - LAST_MODIFIED.

  3. Calculate the ``last modified factor'':

     LM_FACTOR = AGE / LM_AGE.


  4. Client can specify a maximum age in the HTTP/1.1 header Cache-Control. Call this optional parameter CLIENT_MAX_AGE.

  5. The remote server can also expire objects with the Expires header. Call this parameter EXPIRES.

  6. Check each refresh_pattern in the order specified; the first regex that matches the URL supplies MIN_AGE, PERCENT and MAX_AGE.

  7. If the client has specified a CLIENT_MAX_AGE and the AGE is greater than this, the object is stale.

  8. If the AGE is less than the MIN_AGE, the object is fresh.

    This is how you can specify minimum object lifetimes in your cache; for example, Unimelb specifies the following:

    refresh_pattern ^http://home.netscape.com/.*     720   100%  4320 
    refresh_pattern ^http://www.netscape.com/.*      720   100%  4320 
    refresh_pattern ^http://www.yahoo.com/.*         720   100%  4320 
    refresh_pattern/i jpg$                             4320  100%  10080 
    refresh_pattern/i gif$                             4320  100%  10080 
    refresh_pattern   /$                               360   10%   4320 
    refresh_pattern   .                                0     20%   4320 
  9. If the server supplied an EXPIRES time, the object is stale if EXPIRES is before NOW and fresh otherwise.

  10. If the AGE of the object is greater than MAX_AGE, it is stale.

  11. If LM_FACTOR is less than PERCENT, the object is fresh.

  12. Otherwise the object is stale.
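
A quick worked example using the catch-all rule above (refresh_pattern . 0 20% 4320): an object fetched 2 hours ago that was last modified 10 days before squid fetched it has AGE = 120 minutes, LM_AGE = 14400 minutes, and so an LM_FACTOR of about 0.8%. There is no client max-age or Expires, AGE is past the MIN_AGE of 0 but below the 4320 minute MAX_AGE, and 0.8% is under the 20% threshold, so the object is considered fresh and handed straight to the client with no IMS check.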


quick_abort

Controls whether the cache will continue fetching things after a client has hit stop. Three parameters: minkb, percent and maxkb. If the transfer has less than minkb to go, squid will continue fetching it; if it has more than maxkb to go, squid will abort the fetch; otherwise, if more than percent has already been fetched, squid will finish the fetch. Disable fetching by setting minkb to -1.


acl

Much fun here; see Section 2.3 for a discussion of the types of acls that can be specified. Use in conjunction with http_access, icp_access, miss_access and cache_host_acl.

http_access, icp_access

If matched, these allow or deny access to the cache via HTTP or ICP respectively.


miss_access

Who can ask you to fetch something on their behalf. Forces peers to use you as a neighbour and not a parent.

swap_level1_dirs, swap_level2_dirs

Configure the number of directories used for on-disk storage; try to work to:

swap_level1_dirs * swap_level2_dirs = amount of cache_swap / number of cache_dirs / average_object_size / 256.

This will limit you to approximately 256 items per directory. For example, for a million objects in the cache (approx 12GB of disk) use 16 level 1 directories and 256 level 2 directories (16 * 256 * 256 = 1,048,576 objects).
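
In squid.conf terms that example is just:

# 16 * 256 = 4096 directories, about 256 objects in each
swap_level1_dirs 16
swap_level2_dirs 256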

2.3. What neat stuff can it do?

Two flavours in the 1.1 series

  • VM

    Keeps in-transit objects in memory.

  • NoVM

    Keeps in-transit objects on disk instead, trading memory use for file descriptors and disk traffic.


Hierarchical Caching

Squid lets you set up arbitrarily complex caching hierarchies with the cache_host and cache_host_acl configuration items. Unimelb works in the following hierarchy:


3 on-campus proxy caches, 1 of which is a Netscape proxy.


2 sets of neighbours: one on the VRN, which includes Deakin, Latrobe, Monash, Swinburne and VUT, all of them squid caches. We are also co-operating in an AARNET caching test with UTS and Macquarie in Sydney, UQ, and Curtin in WA. We are set up so we will only ask VRN neighbours for anything identifiably inside Australia; we ask all our neighbours for other traffic.

Highly configurable access control

Many and varied acls available to the administrator.


src

Based on the IP address of the requesting client; can be specified in a number of different ways:

  1. ip-address/netmask

  2. ip-address/cidr

  3. ip-address-ip-address
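
For example (addresses invented), the three forms look like:

acl localnet src 172.16.0.0/255.255.0.0
acl localnet src 172.16.0.0/16
acl machineroom src 172.16.1.1-172.16.1.20
http_access allow localnet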


dst

Based on the IP address of the URL host; specifiable in the same ways as src.


srcdomain

Based on the hostname of the requesting client.


dstdomain

Based on the URL host; usually used to deny access. For example:

acl naughty dstdomain playboy.com penthousemag.com
http_access deny naughty

time

Time based specification, usually combined with other acls. Specify time periods as [SMTWHFA] h1:m1-h2:m2 (note: H=Thursday, A=Saturday, and h1:m1 must be less than h2:m2). For example, no perving during work hours:

acl workhours time MTWHF 08:00-18:00
http_access deny naughty workhours

url_regex

Regex match against the whole URL. For example, no looking at the competitor's website:

acl competitor url_regex ^http://www.monash.edu.au/
http_access deny competitor

urlpath_regex

Regex match against the URL path. For example, a really bandwidth-efficient cache:

acl nogifs urlpath_regex \.gif$
acl nojpgs urlpath_regex \.jpg$
http_access deny nogifs
http_access deny nojpgs

port

Based on the port of the URL request; used mainly to stop nasties. For example:

acl cantgetme port 7 9 19
http_access deny cantgetme

proto

Based on the protocol of the URL request. For example, to force people to live in the nineties:

acl nogopher proto gopher
http_access deny nogopher

method

Based on the type of the request: GET, POST, PUT.
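
For example, to refuse uploads through the cache (acl name invented):

acl uploads method PUT
http_access deny uploads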


browser

Based on the user-agent string of the request; uses a regex. For example, wouldn't we all love to:

acl noie browser MSIE
http_access deny noie

ident

Based on an ident lookup of the requesting client. Automatically turns ident_lookup on.

These acls can be applied selectively to HTTP accesses, to neighbour/ICP requests, and to the choice of which peer requests are sent to.

Transparent redirection

Squid lets you filter all URLs supplied to it through an external program for rewriting. This lets you do things like forcing people to use local ftp mirrors (see Section A.1) or getting rid of pesky banner ads (see Section A.2), amongst other applications. The redirector is passed the URL requested, the client's ip-address/fully qualified domain name, the ident lookup of the client and the method used; the FQDN and ident fields are passed as - if unavailable. The redirector program must respond with an empty line if nothing is to be done to the URL, or with the modified URL to use.
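
A minimal redirector along those lines might look like the following sketch (the hostnames are invented; working samples live in Section A.1):

#!/bin/sh
# squid writes one request per line on stdin:
#   URL client-ip/FQDN ident method
# reply with an empty line (leave the URL alone) or a rewritten URL
while read url rest
do
    case "$url" in
        ftp://ftp.example.com/*)
            # push them at the local mirror instead
            echo "ftp://mirror.example.edu.au/${url#ftp://ftp.example.com/}"
            ;;
        *)
            echo ""
            ;;
    esac
done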

Authenticated Proxy Access

Proxy authentication is part of HTTP/1.1 and works pretty much like Basic Authentication in HTTP/1.0. The main squid source allows proxy_auth to be turned on for all access by editing the Makefile and specifying -DPROXY_AUTH.

A patch is available for squid that adds another acl type, proxy_auth. It works much like the redirector: an external program is passed the username and password supplied and must respond with OK or ERR, and access is allowed or denied respectively. At The University of Melbourne we have modified this to respond with OK, ERR or QUOTA, which lets us enforce web use quotas on students.
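
With that patch the authenticator looks much like a redirector; a toy version (password file location and format invented) might be:

#!/bin/sh
# squid passes "username password" lines; answer OK to allow, ERR to deny
while read user pass
do
    if grep -q "^$user:$pass\$" /usr/local/squid/etc/passwd.plain
    then
        echo "OK"
    else
        echo "ERR"
    fi
done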

2.4. It's running, now what?


Once the thing is running you will probably want some statistics on what it is doing. You can get various pieces of info out of the cachemgr.cgi that is supplied with squid; the other method is post-processing of the log files. The first method can be automated somewhat with the MRTG package (see http://www.ee.ethz.ch/stats/mrtg) and some hacking (e-mail me and I'll send you a tar file to wrestle with). Some other stuff to check out:

Joining Hierarchies

Contact your upstream ISP and ask them about parenting. You may also consider using a feature built into squid to advertise your presence; check out the Cache Registration Service section of squid.conf.
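
Back on the statistics front, even a one-liner over the native access.log tells you something; the fourth field is the TCP_HIT/TCP_MISS style result code:

# breakdown of requests by result code
awk '{print $4}' access.log | sort | uniq -c | sort -rn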

2.5. I don't understand.


Generated:  13 November, 1998
Copyright:  © 1998 The University of Melbourne
Maintainer: cwis@www.unimelb.edu.au