webalizer - A web server log file analysis tool.
SYNOPSIS
webalizer [ option ... ] [ log-file ]
DESCRIPTION
The Webalizer is a web server log file analysis program
which produces usage statistics in HTML format for viewing
with a browser. The results are presented in both colum-
nar and graphical format, which facilitates interpreta-
tion. Yearly, monthly, daily and hourly usage statistics
are presented, along with the ability to display usage by
site, URL, referrer, user agent (browser) and country
(user agent and referrer are only available if your web
server produces Combined log format files).
The Webalizer supports CLF (common log format) log files,
as well as Combined log formats as defined by NCSA and
others, and variations of these which it attempts to han-
dle intelligently.
This documentation applies to The Webalizer Version 1.30
RUNNING THE WEBALIZER
The Webalizer was designed to be run from a Unix command
line prompt or as a crond(8) job. Once executed, the gen-
eral flow of the program is:
o A default configuration file is looked for. The
current directory is searched for a file named
webalizer.conf and, if found, its configuration
data is parsed. If the file is not present in the
current directory, /etc/webalizer.conf is searched
for and, if found, is used instead.
o Any command line arguments given to the program
are parsed. This may include the specification of
a configuration file, which is processed at the
time it is encountered.
o If a log file was specified, it is opened and made
ready for processing. If no log file was given,
STDIN is used for input.
o If an output directory was specified, the program
does a chdir(2) to that directory in preparation
for generating output. If no output directory was
given, the current directory is used.
o If no hostname was given, the program attempts to
determine it with a uname(2) system call. If that
fails, localhost is used.
o A history file is searched for in the current
directory (output directory) and read if found.
This file keeps totals for previous months, which
are used in the main index.html HTML document.
Note: The file location can now be specified with
the HistoryName configuration option.
o If incremental processing was specified, a data
file is searched for and loaded if found, contain-
ing the 'internal state' data of the program at
the end of a previous run. Note: The file loca-
tion can now be specified with the IncrementalName
configuration option.
o Main processing begins on the log file. If the
log spans multiple months, a separate HTML
document is created for each month.
o After main processing, the main index.html page is
created, which has totals by month and links to
each month's HTML document.
o A new history file is saved to disk, which
includes totals generated by The Webalizer during
the current run.
o If incremental processing was specified, a data
file is written that contains the 'internal state'
data at the end of this run.
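The default configuration file lookup in the first step above can be sketched as follows. This is only an illustration of the documented search order, not The Webalizer's own code. The function name find_default_config and its parameters are hypothetical.

```python
import os


def find_default_config(cwd=".", etc_dir="/etc", name="webalizer.conf"):
    """Return the path of the default configuration file, or None.

    Follows the documented order: the current directory first,
    then /etc. Function and parameter names are illustrative only.
    """
    for directory in (cwd, etc_dir):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(find_default_config())
```

A configuration file named with the -c option is handled separately, at the point where the command line arguments are parsed.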
INCREMENTAL PROCESSING
Version 1.2x of The Webalizer adds incremental run capa-
bility. Simply put, this allows processing large log
files by breaking them up into smaller pieces, and pro-
cessing these pieces instead. What this means in real
terms is that you can now rotate your log files as often
as you want, and still be able to produce monthly usage
statistics without the loss of any detail. Basically, The
Webalizer saves and restores all internal data in a file
named webalizer.current. This allows the program to
'start where it left off' so to speak, and allows the
preservation of detail from one run to the next. The data
file is placed in the current output directory, and is a
plain ASCII text file that can be viewed with any standard
text editor. Its location and name may be changed using
the IncrementalName configuration keyword.
Some special precautions need to be taken when using the
incremental run capability of The Webalizer. Configura-
tion options should not be changed between runs, as that
could cause corruption of the internal data stored. For
example, changing the user agent mangle level between runs
can produce invalid results in the user agents section of
the report. If you need to change configuration options,
do it at the end of the month after normal processing of
the previous month and before processing the current
month. You may also want to delete the webalizer.current
file.
The Webalizer also attempts to prevent data duplication by
keeping track of the timestamp of the last record pro-
cessed. This timestamp is then compared to current
records being processed, and any records that were logged
previous to that timestamp are ignored. This, in theory,
should allow you to re-process logs that have already been
processed, or process logs that contain a mix of processed
and not yet processed records, without duplicating
statistics. The only time this may break is if you have
duplicate timestamps in two separate log files: any
records in the second log file that have the same
timestamp as the last record in the previously processed
log file will be discarded as if they had already been
processed. There are many ways to prevent this; for
example, stopping the web server before rotating logs
avoids the situation. This setup also requires that you
always process logs in chronological order, otherwise data
loss will occur as a result of the timestamp comparison.
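The timestamp comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the program's actual implementation. The function name filter_new_records and the (timestamp, line) record shape are hypothetical, and the "at or before" comparison is inferred from the duplicate timestamp caveat.

```python
from datetime import datetime


def filter_new_records(records, last_timestamp):
    """Yield only records logged after last_timestamp.

    records is an iterable of (timestamp, line) tuples in log order.
    Records at or before last_timestamp are treated as already
    processed, which is why a record sharing the timestamp of the
    last processed record is dropped.
    """
    for timestamp, line in records:
        if last_timestamp is not None and timestamp <= last_timestamp:
            continue  # assumed to have been handled in an earlier run
        yield timestamp, line


if __name__ == "__main__":
    last = datetime(1999, 3, 31, 23, 59, 59)
    records = [
        (datetime(1999, 3, 31, 23, 59, 58), "already processed"),
        (datetime(1999, 3, 31, 23, 59, 59), "same second as last record"),
        (datetime(1999, 4, 1, 0, 0, 0), "new record"),
    ]
    for _, line in filter_new_records(records, last):
        print(line)
```

The sketch also shows why logs must be processed in chronological order: an older log processed later would fall entirely at or before the saved timestamp and be skipped.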
COMMAND LINE OPTIONS
The Webalizer supports many different configuration
options that will alter the way the program behaves and
generates output. Most of these can be specified on the
command line, while some can only be specified in a con-
figuration file. The command line options are listed
below, with references to the corresponding configuration
file keywords.
General Options
-h Display all available command line options and
exit program.
-v -V Display program version and exit program.
-d Debug. Display debugging information for errors
and warnings.
-g GMTTime. Use GMT instead of local timezone for
reports.
-i IgnoreHist. Ignore history. USE WITH CAUTION.
This will cause The Webalizer to ignore any previous
monthly history file only. Incremental data file
processing is not affected.
-q Quiet. Suppress informational messages. Does not
suppress warnings or errors.
-Q ReallyQuiet. Suppress all messages including
warnings and errors.
-T TimeMe. Force display of timing information at
end of processing.
-c file Use configuration file file.
-n name Hostname. Use the hostname name.
-o dir OutputDir. Use output directory dir.
-t name ReportTitle. Use name for report title.
-F LogType. Specify that the log being processed is
an ftp log, instead of a web server log. Log must
be in standard xferlog format.
-f FoldSeqErr. Fold out of sequence log records back
into analysis, by treating as if they were the
same date/time as the last good record. Normally,
out of sequence log records are simply ignored.
-Y CountryGraph. Suppress country graph.
-G HourlyGraph. Suppress hourly graph.
-x name HTMLExtension. Defines HTML file extension to
use. If not specified, defaults to html. Do not
include the leading period.
-H HourlyStats. Suppress hourly statistics.
-L GraphLegend. Suppress color coded graph legends.
-l num GraphLines. Specify number of background lines.
Default is 2. Use zero ('0') to disable the
lines.
-P name PageType. Specify file extensions that are con-
sidered pages. Sometimes referred to as
pageviews.
-m num VisitTimeout. Specify the Visit timeout period.
Must be given in HHMMSS format. Default is 30
minutes (3000).
-I name IndexAlias. Use the filename name as an addi-
-M num Mangle user agent names according to the mangle
level specified by num. Mangle levels are:
5 Browser name and major version.
4 Browser name, major and minor version.
3 Browser name, major version, minor version to
two decimal places.
2 Browser name, major and minor versions and
sub-version.
1 Browser name, version and machine type if pos-
sible.
0 All information (left unchanged).
Hide Options
-a name HideAgent. Hide user agents matching name.
-r name HideReferrer. Hide referrer matching name.
-s name HideSite. Hide site matching name.
-u name HideURL. Hide URL matching name.
Table size options
-A num TopAgents. Display the top num user agents table.
-R num TopReferrers. Display the top num referrers
table.
-S num TopSites. Display the top num sites table.
-U num TopURLs. Display the top num URL's table.
-C num TopCountries. Display the top num countries
table.
-e num TopEntry. Display the top num entry pages table.
-E num TopExit. Display the top num exit pages table.
CONFIGURATION FILES
Configuration files are standard ascii(7) text files that
may be created or edited using any standard editor. Blank
lines and lines that begin with a pound sign ('#') are
ignored. Any other lines are considered to be configuration
lines, and have the form "Keyword Value", where Keyword is
one of the configuration keywords described below and Value
is the value to assign to that particular option. Any text
found after the keyword up to the end of the line is
considered the keyword's
value, so you should not include anything after the actual
value on the line that is not actually part of the value
being assigned. The file sample.conf provided with the
distribution contains lots of useful documentation and
examples as well.
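The line format described above can be sketched as a small parser. This is an illustrative sketch only, not the program's parser. The function name parse_config is hypothetical, and skipping leading whitespace before the keyword is an assumption.

```python
def parse_config(text):
    """Parse "Keyword Value" lines into a list of (keyword, value) pairs.

    Blank lines and lines beginning with '#' are skipped. Everything
    after the keyword, up to the end of the line, is the value.
    A list is returned instead of a dict so that a keyword given on
    several lines keeps every value.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()  # assumption: surrounding whitespace is not significant
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        entries.append((keyword, value))
    return entries


if __name__ == "__main__":
    sample = (
        "# Illustrative values only\n"
        "\n"
        "OutputDir /tmp/usage\n"
        "ReportTitle Usage Statistics for\n"
        "Incremental yes\n"
    )
    for keyword, value in parse_config(sample):
        print(keyword, "=>", repr(value))
```

Because the whole remainder of the line becomes the value, a trailing comment on a keyword line would end up as part of that value.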
General Configuration Keywords
LogFile name
Use log file named name. If none specified, STDIN
will be used.
LogType name
Specify log file type as name. Values can be
either web or ftp, with the default being web.
OutputDir dir
Create output in the directory dir. If none spec-
ified, the current directory will be used.
HistoryName name
Filename to use for history file. Relative to
output directory unless an absolute name is given
(i.e. starts with '/'). Defaults to
'webalizer.hist' in the standard output directory.
ReportTitle name
Use the title string name for the report title.
If none specified, use the default of (in English)
"Usage Statistics for ".
Hostname name
Set the hostname for the report as name. If none
specified, an attempt will be made to gather the
hostname via a uname(2) system call. If that
fails, localhost will be used.
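The hostname fallback described above can be sketched as follows. This is an illustration using Python's wrapper around uname, not the program's own code. The function name default_hostname is hypothetical.

```python
import os


def default_hostname():
    """Return the system node name, or 'localhost' if it is unavailable."""
    try:
        name = os.uname().nodename
    except (AttributeError, OSError):
        # os.uname is missing on some platforms, or the call failed.
        name = ""
    return name or "localhost"


if __name__ == "__main__":
    print(default_hostname())
```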
UseHTTPS [ yes | no ]
Use https:// on links to URLs, instead of the
default http://, in the 'Top URL's' table.
Quiet [ yes | no ]
Suppress informational messages. Warning and Error
messages will not be suppressed.
ReallyQuiet [ yes | no ]
Suppress all messages, including Warning and Error
messages.
Debug [ yes | no ]
Display debugging information for errors and
warnings.
TimeMe [ yes | no ]
Force timing information at end of processing.
GMTTime [ yes | no ]
Use GMT (UTC) time instead of local timezone for
reports.
IgnoreHist [ yes | no ]
Ignore previous monthly history file. USE WITH
CAUTION. Does not prevent Incremental file pro-
cessing.
FoldSeqErr [ yes | no ]
Fold out of sequence log records back into analy-
sis by treating them as if they had the same
date/time as the last good record. Normally, out
of sequence log records are ignored.
CountryGraph [ yes | no ]
Display Country Usage Graph in output report.
HourlyGraph [ yes | no ]
Display Hourly Graph in output report.
HourlyStats [ yes | no ]
Display Hourly Statistics in output report.
PageType name
Define the file extensions to consider as a page.
If a file is found to have the same extension as
name, it will be counted as a page (sometimes
called a pageview).
GraphLegend [ yes | no ]
Allows the color coded graph legends to be
enabled/disabled.
GraphLines num
Specify the number of background reference lines
displayed on the graphs produced. Disable by
using zero ('0'), default is 2.
VisitTimeout num
Specifies the visit timeout value. Default is 30
minutes. A visit is determined by looking at the
difference in time between the current and last
request from a specific site. If the difference
is greater than or equal to the timeout value, the
request is counted as a new visit.
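The visit rule above, together with the HHMMSS format given for the -m option, can be sketched as follows. This is a minimal illustration, not the program's implementation. The function names, the (site, time) record shape, and the choice to count a site's first request as a visit are assumptions.

```python
def hhmmss_to_seconds(value):
    """Convert an HHMMSS number such as 3000 (00:30:00) to seconds."""
    hours, rest = divmod(int(value), 10000)
    minutes, seconds = divmod(rest, 100)
    return hours * 3600 + minutes * 60 + seconds


def count_visits(requests, timeout_seconds):
    """Count visits from (site, time_in_seconds) pairs given in time order.

    A request starts a new visit when the site has not been seen
    before (an assumption), or when the gap since that site's
    previous request is greater than or equal to the timeout.
    """
    last_seen = {}
    visits = 0
    for site, when in requests:
        previous = last_seen.get(site)
        if previous is None or when - previous >= timeout_seconds:
            visits += 1
        last_seen[site] = when
    return visits


if __name__ == "__main__":
    timeout = hhmmss_to_seconds(3000)
    sample = [("a.example", 0), ("a.example", 600), ("a.example", 2400)]
    print(timeout, count_visits(sample, timeout))
```

The gap is measured from the site's most recent request, not from the start of the visit, so a steady stream of requests closer together than the timeout stays a single visit.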
IndexAlias name
Mangle user agent names based on mangle level num.
See the -M command line switch for mangle levels
and their meaning. The default is 0, which
doesn't mangle user agents at all.
Incremental [ yes | no ]
Enable Incremental mode processing.
IncrementalName name
Filename to use for incremental data. Relative to
output directory unless an absolute name is given
(i.e. starts with '/'). Defaults to
'webalizer.current' in the standard output directory.
Top Table Keywords
TopAgents num
Display the top num User Agents table. Use zero to
disable.
TopReferrers num
Display the top num Referrers table. Use zero to
disable.
TopSites num
Display the top num Sites table. Use zero to dis-
able.
TopKSites num
Display the top num Sites (by KByte) table. Use
zero to disable.
TopURLs num
Display the top num URLs table. Use zero to dis-
able.
TopKURLs num
Display the top num URLs (by KByte) table. Use
zero to disable.
TopCountries num
Display the top num Countries in the table. Use
zero to disable.
TopEntry num
Display the top num Entry Pages in the table. Use
zero to disable.
TopExit num
Display the top num Exit Pages in the table. Use
zero to disable.
Display the top num Search Strings in the table.
Use zero to disable.
Hide/Ignore/Group/Include Keywords
HideAgent name
Hide User Agents that match name.
HideReferrer name
Hide Referrers that match name.
HideSite name
Hide Sites that match name.
HideURL name
Hide URL's that match name.
IgnoreAgent name
Ignore User Agents that match name.
IgnoreReferrer name
Ignore Referrers that match name.
IgnoreSite name
Ignore Sites that match name.
IgnoreURL name
Ignore URL's that match name.
GroupAgent name [Label]
Group User Agents that match name. Display Label
in 'Top Agent' table if given (instead of name).
GroupReferrer name [Label]
Group Referrers that match name. Display Label in
'Top Referrer' table if given (instead of name).
GroupSite name [Label]
Group Sites that match name. Display Label in
'Top Site' table if given (instead of name).
GroupURL name [Label]
Group URL's that match name. Display Label in
'Top URL' table if given (instead of name).
IncludeSite name
Force inclusion of sites that match name. Takes
precedence over Ignore* keywords.
IncludeURL name
Force inclusion of URL's that match name. Takes
precedence over Ignore* keywords.
Force inclusion of Referrers that match name.
Takes precedence over Ignore* keywords.
IncludeAgent name
Force inclusion of User Agents that match name.
Takes precedence over Ignore* keywords.
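The precedence of the Include keywords over the Ignore keywords can be sketched as follows. This is an illustration only. The function name is_ignored is hypothetical, and plain substring matching is an assumption made for the sketch, since this section does not define how name is matched.

```python
def is_ignored(value, ignore_patterns, include_patterns):
    """Decide whether a record field (site, URL, and so on) is ignored.

    Substring matching is an assumption of this sketch; the real
    matching rules are not described in this section.
    """
    def matches(patterns):
        return any(pattern in value for pattern in patterns)

    if matches(include_patterns):
        return False  # a matching Include entry overrides any Ignore entry
    return matches(ignore_patterns)


if __name__ == "__main__":
    ignore = ["/images/"]
    include = ["/images/banner"]
    for url in ("/images/logo.gif", "/images/banner.gif", "/index.html"):
        print(url, is_ignored(url, ignore, include))
```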
HTML Generation Keywords
HTMLExtension text
Defines the HTML file extension to use. Default
is html. Do not include the leading period!
HTMLPre text
Insert text at the very beginning of the generated
HTML file. Defaults to a standard HTML 3.2
DOCTYPE record.
HTMLHead text
Insert text within the <HEAD></HEAD> block of the
HTML file.
HTMLBody text
Insert text in HTML page, starting with the <BODY>
tag. If used, the first line must be a <BODY ...>
tag. Multiple lines may be specified.
HTMLPost text
Insert text at top (before horizontal rule) of HTML
pages. Multiple lines may be specified.
HTMLTail text
Insert text at bottom of the HTML page. The text
is top and right aligned within a table column at
the end of the report.
HTMLEnd text
Insert text at the very end of the HTML page. If
not specified, the default is to insert the ending
</BODY> and </HTML> tags. If used, you must sup-
ply these tags yourself.
FILES
webalizer.conf Default configuration file. Is
searched for in the current directory
and if not found, in the /etc/ direc-
tory.
webalizer.hist Monthly history file for previous 12
months. (can be changed)
webalizer.current Current state data file (Incremental
processing).
BUGS
Report bugs to brad@mrunix.net.
COPYRIGHT
Copyright (C) 1997-1999 by Bradford L. Barrett. Dis-
tributed under the GNU GPL. See the files "COPYING" and
"Copyright", supplied with all distributions for addi-
tional information.
AUTHOR
Bradford L. Barrett <brad@mrunix.net>