Log File Analysis: Hits, Files, Sites and Pages

Ever wondered what pages, sites, hits and visits mean? These are special terms in the field of web analytics or, more precisely, log file analysis.

Many people still use log file analyzers like webalizer or analog, which are included in most web hosting packages.

Let’s take a look at the metrics that these tools provide. They are especially useful if you want to know what your server is doing.

The log files represent the server’s perspective. Please keep in mind that these numbers do not really match those of more recent web analysis tools like econda Monitor or Google Analytics (GA).

Nevertheless, the server’s perspective is still valuable and widespread.

Log files

Log files are a list of tasks the server actually completed. In other words: requests the server got and responses the server sent.

The log file analyzer takes the logs, counts the occurrences of certain words or phrases, and computes statistics from this data.
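To make this concrete, here is a minimal sketch of how a single log line could be split into fields. It assumes the NCSA Common Log Format, which this article does not specify; your server may write a different layout. The names CLF_PATTERN and parse_line are made up for illustration.

```python
import re

# Assumption: lines follow the NCSA Common Log Format, e.g.
# host ident user [timestamp] "METHOD path protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None

# Hypothetical sample line, not taken from a real server.
sample = '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326'
entry = parse_line(sample)
print(entry["host"], entry["path"], entry["status"])
```

Once every line is broken down like this, the metrics described below are mostly a matter of counting.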

Hits

Each line in a log file represents a request and the response sent by the web server. There will be one line for each graphics file or .html, .js or .css file that the server delivered. Additionally, there will be lines containing requests for status information on the server.

The metric hits is nothing but the number of lines in the log file for a given time period, that is, the number of requests the server served.

A single page that the visitor of a website is actually viewing might consist of 1 to 100 or more files. The number of hits per page view depends on the page that the visitor is requesting. Thus, in most cases you cannot easily derive the number of page views from the number of hits that your log file analyzer counts.
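As a rough illustration, counting hits amounts to counting log lines. The sketch below uses a made-up four-line log in Common Log Format (an assumption) in which one page view produces four hits; count_hits is a hypothetical helper, not part of any analyzer.

```python
# Hypothetical log: one visitor loads one page plus its CSS, JS and image.
sample_log = """\
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /style.css HTTP/1.0" 200 512
192.0.2.10 - - [10/Oct/2023:13:55:37 +0000] "GET /app.js HTTP/1.0" 200 1024
192.0.2.10 - - [10/Oct/2023:13:55:37 +0000] "GET /logo.png HTTP/1.0" 200 4096
"""

def count_hits(log_text):
    """Count non-empty lines; each line is one request the server answered."""
    return sum(1 for line in log_text.splitlines() if line.strip())

hits = count_hits(sample_log)
print(hits)  # 4 hits for a single page view
```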

Pages

To get the number of pages that your visitors requested, the analyzer counts only the lines that contain certain words or sequences of characters, for example “.html”, “.php”, “.jsp” …

The count of requested pages derived by log file analysis is not equal to the page views that you might see in more advanced web analytics software that uses the tagging method. Many requests for pages that the visitor is viewing might be answered by a proxy or the local cache and never reach your server. The log file has no entries for requests that never hit your server.

The next thing to keep in mind is that the number of pages depends on the character sequences or phrases that you count. If your CMS, blog or wiki uses PHP and your analyzer only counts lines containing “.htm” or “.html”, the number of ‘pages served’ might be wrong.
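The effect of the configured extensions can be sketched as follows. This is a simplified stand-in, not how webalizer or analog actually work: it checks whether the requested path ends with one of the given extensions rather than searching the whole line. The sample lines, REQUEST_RE and count_pages are assumptions made for the example.

```python
import re
from urllib.parse import urlsplit

# Pulls the requested path out of the quoted request, e.g. "GET /a.html HTTP/1.0".
REQUEST_RE = re.compile(r'"\S+ (\S+)[^"]*"')

def count_pages(lines, page_extensions):
    """Count requests whose path ends with one of the page extensions."""
    pages = 0
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        # Drop the query string so "/post.php?id=7" still counts as ".php".
        path = urlsplit(match.group(1)).path.lower()
        if path.endswith(tuple(page_extensions)):
            pages += 1
    return pages

# Hypothetical log lines in Common Log Format.
lines = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.10 - - [10/Oct/2023:13:55:40 +0000] "GET /blog/post.php?id=7 HTTP/1.0" 200 5120',
    '192.0.2.10 - - [10/Oct/2023:13:55:41 +0000] "GET /logo.png HTTP/1.0" 200 4096',
    '192.0.2.10 - - [10/Oct/2023:13:55:41 +0000] "GET /style.css HTTP/1.0" 200 512',
]

html_only = count_pages(lines, [".htm", ".html"])
with_php = count_pages(lines, [".htm", ".html", ".php"])
print(html_only, with_php)  # 1 2
```

The same log yields one page or two, depending only on the list of extensions.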

Files

The number of files served is equal to the number of lines in the log file minus the requests for status information to the server.

Sites

To serve requests the server needs to know the IP address of the visitor. Each line in the logfile contains the IP address of the client it serves. The number of sites is the count of unique IP addresses for a given time.
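A minimal sketch of this count, assuming the client address is the first whitespace-separated field of each line (as in Common Log Format); count_sites and the sample lines are made up for illustration.

```python
def count_sites(lines):
    """Count distinct client addresses, taken as the first field of each line."""
    hosts = {line.split(maxsplit=1)[0] for line in lines if line.strip()}
    return len(hosts)

# Hypothetical log lines: three requests from two different addresses.
lines = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.10 - - [10/Oct/2023:13:55:37 +0000] "GET /logo.png HTTP/1.0" 200 4096',
    '198.51.100.7 - - [10/Oct/2023:14:02:11 +0000] "GET /index.html HTTP/1.0" 200 2326',
]

sites = count_sites(lines)
print(sites)  # 2
```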

Visits

For each line in the log file, the time since the previous request from the same IP is calculated. If the difference is bigger than a given timeout, the analyzer counts a new visit. In most cases the value for the timeout is 30 minutes. In other words, if your visitor is inactive for more than 30 minutes, we count a new visit with the next request we serve for that visitor.
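The timeout rule can be sketched like this. The example assumes Common Log Format timestamps and lines that are already in chronological order; count_visits, LINE_RE and TIME_FORMAT are hypothetical names, and real analyzers may differ in the details.

```python
import re
from datetime import datetime, timedelta

# Captures the client address and the bracketed timestamp of a log line.
LINE_RE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\]')
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 10/Oct/2023:13:55:36 +0000

def count_visits(lines, timeout=timedelta(minutes=30)):
    """Count visits: a new visit starts on the first request from an address,
    or when the gap to its previous request exceeds the timeout."""
    last_seen = {}
    visits = 0
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        host, stamp = match.groups()
        when = datetime.strptime(stamp, TIME_FORMAT)
        previous = last_seen.get(host)
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[host] = when
    return visits

# Hypothetical log: the first address pauses for 35 minutes before its last request.
lines = [
    '192.0.2.10 - - [10/Oct/2023:13:00:00 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '198.51.100.7 - - [10/Oct/2023:13:05:00 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.10 - - [10/Oct/2023:13:10:00 +0000] "GET /about.html HTTP/1.0" 200 1800',
    '192.0.2.10 - - [10/Oct/2023:13:45:00 +0000] "GET /index.html HTTP/1.0" 200 2326',
]

visits = count_visits(lines)
print(visits)  # 3: two visits from 192.0.2.10, one from 198.51.100.7
```

Note that two sites produce three visits here, because the 35-minute pause exceeds the 30-minute timeout.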

Are you using log file analyzers? Please take the time to write a comment and tell me how you use your analyzer.

Links:
webalizer – The Webalizer is a fast, free web server log file analysis program
analog – The most popular logfile analyser in the world
