webalizer - A web server log file analysis tool.


SYNOPSIS

       webalizer [ option ... ] [ log-file ]



DESCRIPTION

       The  Webalizer  is  a web server log file analysis program
       which produces usage statistics in HTML format for viewing
       with  a browser.  The results are presented in both colum-
       nar and graphical format,  which  facilitates  interpreta-
       tion.   Yearly, monthly, daily and hourly usage statistics
       are presented, along with the ability to display usage  by
       site,  URL,  referrer,  user  agent  (browser) and country
       (user agent and referrer are only available  if  your  web
       server produces Combined log format files).

       The  Webalizer supports CLF (common log format) log files,
       as well as Combined log formats as  defined  by  NCSA  and
       others,  and variations of these which it attempts to han-
       dle intelligently.

       This documentation applies to The Webalizer Version 1.30.


RUNNING THE WEBALIZER

       The Webalizer was designed to be run from a  Unix  command
       line  prompt or as a crond(8) job. Once executed, the gen-
       eral flow of the program is:

       o       The program looks for a default configuration file.
               A file named webalizer.conf is searched for in the
               current directory and, if found, its configuration
               data is parsed.  If the file is not present in the
               current directory, the file /etc/webalizer.conf is
               searched for and, if found, is used instead.

       o       Any command line arguments given  to  the  program
               are parsed.  This may include the specification of
               a configuration file, which is  processed  at  the
               time it is encountered.

       o       If a log file was specified, it is opened and made
               ready for processing.  If no log file  was  given,
               STDIN is used for input.

       o       If an output directory was specified, the program
               does a chdir(2) to that directory in preparation
               for generating output.  If no output directory was
               given, the current directory is used.

       o       If no hostname was given, the program attempts to
               determine the hostname using a uname(2) system
               call.  If that fails, localhost is used.

       o       A history file is searched for in the current
               directory (output directory) and read if found.
               This file keeps totals for previous months, which
               are used in the main index.html HTML document.
               Note: The file location can now be specified with
               the HistoryName configuration option.

       o       If  incremental  processing  was specified, a data
               file is searched for and loaded if found, contain-
               ing  the  'internal  state' data of the program at
               the end of a previous run.  Note: The  file  loca-
               tion can now be specified with the IncrementalName
               configuration option.

       o       Main processing begins on the log file.  If the
               log spans multiple months, a separate HTML
               document is created for each month.

       o       After main processing, the main index.html page is
               created, which has totals by month and links to
               each month's HTML document.

       o       A  new  history  file  is  saved  to  disk,  which
               includes  totals generated by The Webalizer during
               the current run.

       o       If incremental processing was  specified,  a  data
               file is written that contains the 'internal state'
               data at the end of this run.
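
       The first step of the flow above, locating a default
       configuration file, could be sketched as follows.  This is
       only an illustration of the search order described in the
       text, not the program's own code; the function name and its
       parameters are hypothetical.

```python
import os


def find_default_config(cwd=".", etc_dir="/etc", name="webalizer.conf"):
    """Return the path of the first default config file found, or None.

    Search order follows the text above: the current directory first,
    then /etc.  The name and parameters are illustrative assumptions.
    """
    for directory in (cwd, etc_dir):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

       A file named with the -c option is handled separately, at
       the point where that option is encountered on the command
       line.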


INCREMENTAL PROCESSING

       Version 1.2x of The Webalizer adds incremental  run  capa-
       bility.   Simply  put,  this  allows  processing large log
       files by breaking them up into smaller  pieces,  and  pro-
       cessing  these  pieces  instead.   What this means in real
       terms is that you can now rotate your log files  as  often
       as  you  want,  and still be able to produce monthly usage
       statistics without the loss of any detail.  Basically, The
       Webalizer  saves  and restores all internal data in a file
       named  webalizer.current.   This  allows  the  program  to
       'start  where  it  left  off'  so to speak, and allows the
       preservation of detail from one run to the next.  The data
       file is placed in the current output directory, and is a
       plain ASCII text file that can be viewed with any standard
       text editor.  Its location and name may be changed using
       the IncrementalName configuration keyword.

       Some special precautions need to be taken when using the
       incremental run capability of The Webalizer.  Configuration
       options should not be changed between runs, as that could
       corrupt the stored internal data.  For example, changing
       the MangleAgents level will cause different representations
       of user agents to be stored, producing invalid results in
       the user agents section of the report.  If you need to
       change configuration options, do it at the end of the month
       after normal processing of the previous month and before
       processing the current month.  You may also want to delete
       the webalizer.current file.

       The Webalizer also attempts to prevent data duplication by
       keeping track of the timestamp of the last record
       processed.  This timestamp is then compared to the records
       currently being processed, and any records that were logged
       before that timestamp are ignored.  In theory, this allows
       you to re-process logs that have already been processed, or
       process logs that contain a mix of processed and not yet
       processed records, without duplicating statistics.  The
       only time this may break is if you have duplicate
       timestamps in two separate log files: any records in the
       second log file that have the same timestamp as the last
       record in the previously processed log file will be
       discarded as if they had already been processed.  There are
       many ways to prevent this; for example, stopping the web
       server before rotating logs avoids the situation.  This
       scheme also requires that you always process logs in
       chronological order, otherwise data loss will occur as a
       result of the timestamp comparison.
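
       The timestamp comparison can be illustrated with a short
       sketch.  It assumes records are available as (timestamp,
       line) pairs; the function name and the sample data are
       hypothetical and are not taken from the program.

```python
from datetime import datetime


def drop_already_processed(records, last_timestamp):
    """Keep only records logged strictly after last_timestamp.

    records is an iterable of (timestamp, line) pairs in log order.
    Records at or before last_timestamp are treated as already
    processed, so a record sharing the saved timestamp is discarded.
    """
    if last_timestamp is None:
        return list(records)  # no previous run: keep everything
    return [(ts, line) for ts, line in records if ts > last_timestamp]


# Timestamp of the last record handled by the previous run (made up).
last_seen = datetime(1999, 3, 1, 23, 59, 59)
second_log = [
    (datetime(1999, 3, 1, 23, 59, 58), "already processed"),
    (datetime(1999, 3, 1, 23, 59, 59), "same second as last record"),
    (datetime(1999, 3, 2, 0, 0, 0), "new record"),
]
kept = drop_already_processed(second_log, last_seen)
```

       Here only the record from 2 March is kept.  The record that
       shares the saved timestamp is dropped even though it may
       never have been counted, which is the duplicate timestamp
       case described above.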


COMMAND LINE OPTIONS

       The   Webalizer   supports  many  different  configuration
       options that will alter the way the  program  behaves  and
       generates  output.   Most of these can be specified on the
       command line, while some can only be specified in  a  con-
       figuration  file.  The  command  line  options  are listed
       below, with references to the corresponding  configuration
       file keywords.

       General Options

       -h      Display  all  available  command  line options and
               exit program.

       -v -V   Display program version and exit program.

       -d      Debug.  Display debugging information  for  errors
               and warnings.

       -g      GMTTime.   Use  GMT  instead of local timezone for
               reports.

       -i      IgnoreHist.  Ignore history.  USE WITH CAUTION.
               This will cause The Webalizer to ignore any
               previous monthly history file only.  Incremental
               data (if present) is still processed.

       -q      Quiet.  Suppress informational messages.  Does not
               suppress warnings or errors.

       -Q      ReallyQuiet.  Suppress all messages including
               warnings and errors.

       -T      TimeMe.  Force display of  timing  information  at
               end of processing.

       -c file Use configuration file file.

       -n name Hostname.  Use the hostname name.

       -o dir  OutputDir.  Use output directory dir.

       -t name ReportTitle.  Use name for report title.

       -F      LogType.   Specify that the log being processed is
               an ftp log, instead of a web server log.  Log must
               be in standard xferlog format.

       -f      FoldSeqErr.  Fold out of sequence log records back
               into analysis, by treating as  if  they  were  the
               same date/time as the last good record.  Normally,
               out of sequence log records are simply ignored.

       -Y      CountryGraph.  Suppress country graph.

       -G      HourlyGraph.  Suppress hourly graph.

       -x name HTMLExtension.  Defines  HTML  file  extension  to
               use.   If not specified, defaults to html.  Do not
               include the leading period.

       -H      HourlyStats.  Suppress hourly statistics.

       -L      GraphLegend.  Suppress color coded graph legends.

       -l num  GraphLines.  Specify number of  background  lines.
               Default  is  2.   Use  zero  ('0')  to disable the
               lines.

       -P name PageType.  Specify file extensions that  are  con-
               sidered   pages.    Sometimes   referred   to   as
               pageviews.

       -m num  VisitTimeout.  Specify the Visit  timeout  period.
               Must  be  given  in  HHMMSS format.  Default is 30
               minutes (3000).

       -I name IndexAlias.  Use the filename name as an
               additional alias for index.html.

       -M num  MangleAgents.  Mangle user agent names according
               to the mangle level specified by num.  Mangle
               levels are:

               5   Browser name and major version.

               4   Browser name, major and minor version.

               3   Browser  name, major version, minor version to
                   two decimal places.

               2   Browser name, major  and  minor  versions  and
                   sub-version.

               1   Browser name, version and machine type if pos-
                   sible.

               0   All information (left unchanged).
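
       As a rough illustration of what the higher mangle levels
       mean, the sketch below shortens a "Name/version" style user
       agent string.  It is one reading of the level descriptions
       above, not the program's actual algorithm: it assumes that
       level 4 keeps one minor digit and level 3 keeps two, and it
       leaves out levels 1 and 2, which the text does not describe
       precisely enough.  The function name is hypothetical.

```python
import re

# Matches "Name/major" with an optional ".minor" part.
_AGENT = re.compile(r"^([^/\s]+)/(\d+)(?:\.(\d+))?")


def mangle_agent(agent, level):
    """Shorten a user agent string for mangle levels 5, 4 and 3."""
    if level == 0:
        return agent  # level 0: left unchanged
    if level not in (3, 4, 5):
        raise ValueError("only levels 0, 3, 4 and 5 are sketched here")
    match = _AGENT.match(agent)
    if match is None:
        return agent  # unrecognised format: leave as-is
    name, major, minor = match.group(1), match.group(2), match.group(3) or ""
    if level == 5 or not minor:
        return f"{name}/{major}"
    digits = 1 if level == 4 else 2  # assumed precision per level
    return f"{name}/{major}.{minor[:digits]}"
```

       With the made-up agent string "Mozilla/4.06 [en] (X11; I;
       Linux)", this gives "Mozilla/4" at level 5, "Mozilla/4.0" at
       level 4 and "Mozilla/4.06" at level 3.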

       Hide Options

       -a name HideAgent.  Hide user agents matching name.

       -r name HideReferrer.  Hide referrer matching name.

       -s name HideSite.  Hide site matching name.

       -u name HideURL.  Hide URL matching name.

       Table size options

       -A num  TopAgents.  Display the top num user agents table.

       -R num  TopReferrers.    Display  the  top  num  referrers
               table.

       -S num  TopSites.  Display the top num sites table.

       -U num  TopURLs.  Display the top num URLs table.

       -C num  TopCountries.   Display  the  top  num   countries
               table.

       -e num  TopEntry.   Display the top num entry pages table.

       -E num  TopExit.  Display the top num exit pages table.


CONFIGURATION FILES

       Configuration files are standard ascii(7) text files that
       may be created or edited using any standard editor.  Blank
       lines and lines that begin with a pound sign ('#') are
       ignored.  Any other lines are considered to be
       configuration lines, and have the form "Keyword Value",
       where the "Keyword" is one of the configuration keywords
       defined below, and "Value" is the value to assign to that
       particular option.  Any text found after the keyword up to
       the end of the line is considered the keyword's value, so
       you should not include anything after the actual value on
       the line that is not actually part of the value being
       assigned.  The file sample.conf provided with the
       distribution contains useful documentation and examples as
       well.
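
       The line format described above can be sketched with a
       small parser.  This is an illustration only, not the
       program's parser: the function name is hypothetical, the
       paths in the sample are made up, and stripping leading
       whitespace before testing for '#' is an assumption.

```python
def parse_config(text):
    """Return a list of (keyword, value) pairs from configuration text.

    A list is used rather than a dict so that repeated keywords
    are all kept.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        parts = line.split(None, 1)
        keyword = parts[0]
        # Everything after the keyword, up to end of line, is the value.
        value = parts[1].strip() if len(parts) > 1 else ""
        entries.append((keyword, value))
    return entries


sample = """\
# Sample configuration (paths are illustrative)
LogFile /var/log/httpd/access_log

OutputDir /var/www/usage
ReportTitle Usage Statistics for
Incremental yes
"""
entries = parse_config(sample)
```

       Note that the ReportTitle value keeps its embedded spaces,
       because the whole remainder of the line is the value.  This
       is also why a trailing comment on a configuration line
       would become part of the value.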

       General Configuration Keywords

       LogFile name
               Use log file named name.  If none specified, STDIN
               will be used.

       LogType name
               Specify log file  type  as  name.  Values  can  be
               either web or ftp, with the default being web.

       OutputDir dir
               Create output in the directory dir.  If none spec-
               ified, the current directory will be used.

       HistoryName name
               Filename to use for history file.  Relative to
               output directory unless absolute name is given
               (i.e., starts with '/').  Defaults to
               'webalizer.hist' in the standard output directory.

       ReportTitle name
               Use the title string name for the report title.
               If none specified, use the default of (in English)
               "Usage Statistics for ".

       Hostname name
               Set  the hostname for the report as name.  If none
               specified, an attempt will be made to  gather  the
               hostname  via  a  uname(2)  system  call.  If that
               fails, localhost will be used.

       UseHTTPS [ yes | no ]
               Use https:// on links to URLs, instead of the
               default http://, in the 'Top URLs' table.

       Quiet [ yes | no ]
               Suppress informational messages.  Warning and Error
               messages will not be suppressed.

       ReallyQuiet [ yes | no ]
               Suppress all messages, including Warning and Error
               messages.

       Debug [ yes | no ]
               Display debugging information for errors and
               warnings.

       TimeMe [ yes | no ]
               Force timing information at end of processing.

       GMTTime [ yes | no ]
               Use GMT (UTC) time instead of local  timezone  for
               reports.

       IgnoreHist [ yes | no ]
               Ignore  previous  monthly  history file.  USE WITH
               CAUTION.  Does not prevent Incremental  file  pro-
               cessing.

       FoldSeqErr [ yes | no ]
               Fold  out of sequence log records back into analy-
               sis by treating them  as  if  they  had  the  same
               date/time  as the last good record.  Normally, out
               of sequence log records are ignored.

       CountryGraph [ yes | no ]
               Display Country Usage Graph in output report.

       HourlyGraph [ yes | no ]
               Display Hourly Graph in output report.

       HourlyStats [ yes | no ]
               Display Hourly Statistics in output report.

       PageType name
               Define the file extensions to consider as a  page.
               If  a  file is found to have the same extension as
               name, it will be  counted  as  a  page  (sometimes
               called a pageview).

       GraphLegend [ yes | no ]
               Allows   the  color  coded  graph  legends  to  be
               enabled/disabled.

       GraphLines num
               Specify the number of background  reference  lines
               displayed  on  the  graphs  produced.   Disable by
               using zero ('0'), default is 2.

       VisitTimeout num
               Specifies the visit timeout value.  Default is  30
               minutes.   A visit is determined by looking at the
               difference in time between the  current  and  last
               request  from  a specific site.  If the difference
               is greater or equal  to  the  timeout  value,  the
               request is counted as a new visit.

       IndexAlias name
               Use name as an additional alias for index.html.

       MangleAgents num
               Mangle user agent names based on mangle level num.
               See the -M command line switch for mangle levels
               and their meaning.  The default is 0, which
               doesn't mangle user agents at all.

       Incremental [ yes | no ]
               Enable Incremental mode processing.

       IncrementalName name
               Filename to use for incremental data.  Relative to
               output directory unless an absolute name is given
               (i.e., starts with '/').  Defaults to
               'webalizer.current' in the standard output
               directory.
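
       The visit rule given under VisitTimeout above can be
       sketched as follows.  The sketch assumes requests arrive as
       (site, time in seconds) pairs in chronological order, that
       the first request seen from a site starts a visit, and that
       the timeout is written in the HHMMSS form described for the
       -m option.  The function names are hypothetical.

```python
def hhmmss_to_seconds(value):
    """Convert an HHMMSS number such as 3000 (00:30:00) to seconds."""
    hours, rest = divmod(int(value), 10000)
    minutes, seconds = divmod(rest, 100)
    return hours * 3600 + minutes * 60 + seconds


def count_visits(requests, timeout_hhmmss=3000):
    """Count visits from (site, time_in_seconds) pairs in time order."""
    timeout = hhmmss_to_seconds(timeout_hhmmss)
    last_seen = {}  # site -> time of its most recent request
    visits = 0
    for site, when in requests:
        previous = last_seen.get(site)
        # A new visit starts on a site's first request (assumption), or
        # when the gap since its last request reaches the timeout.
        if previous is None or when - previous >= timeout:
            visits += 1
        last_seen[site] = when
    return visits


requests = [("a", 0), ("a", 600), ("b", 700), ("a", 2400)]
total = count_visits(requests)
```

       In this sample, site "a" starts a visit at time 0, its
       request at 600 belongs to the same visit, site "b" starts
       its own visit, and the request from "a" at 2400 starts a new
       visit because exactly 1800 seconds have passed since its
       previous request.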

       Top Table Keywords

       TopAgents num
               Display the top num User Agents table. Use zero to
               disable.

       TopReferrers num
               Display the top num Referrers table. Use  zero  to
               disable.

       TopSites num
               Display  the top num Sites table. Use zero to dis-
               able.

       TopKSites num
               Display the top num Sites (by KByte)  table.   Use
               zero to disable.

       TopURLs num
               Display  the  top num URLs table. Use zero to dis-
               able.

       TopKURLs num
               Display the top num URLs (by  KByte)  table.   Use
               zero to disable.

       TopCountries num
               Display  the  top  num Countries in the table. Use
               zero to disable.

       TopEntry num
               Display the top num Entry Pages in the table.  Use
               zero to disable.

       TopExit num
               Display  the top num Exit Pages in the table.  Use
               zero to disable.

       TopSearch num
               Display the top num Search Strings in the table.
               Use zero to disable.

       Hide/Ignore/Group/Include Keywords

       HideAgent name
               Hide User Agents that match name.

       HideReferrer name
               Hide Referrers that match name.

       HideSite name
               Hide Sites that match name.

       HideURL name
               Hide URLs that match name.

       IgnoreAgent name
               Ignore User Agents that match name.

       IgnoreReferrer name
               Ignore Referrers that match name.

       IgnoreSite name
               Ignore Sites that match name.

       IgnoreURL name
               Ignore URLs that match name.

       GroupAgent name [Label]
               Group  User Agents that match name.  Display Label
               in 'Top Agent' table if given (instead of name).

       GroupReferrer name [Label]
               Group Referrers that match name.  Display Label in
               'Top Referrer' table if given (instead of name).

       GroupSite name [Label]
               Group  Sites  that  match  name.  Display Label in
               'Top Site' table if given (instead of name).

       GroupURL name [Label]
               Group URLs that match name.  Display Label in
               'Top URL' table if given (instead of name).

       IncludeSite name
               Force inclusion of sites that match name.  Takes
               precedence over Ignore* keywords.

       IncludeURL name
               Force inclusion of URLs that match name.  Takes
               precedence over Ignore* keywords.

       IncludeReferrer name
               Force inclusion of Referrers that match name.
               Takes precedence over Ignore* keywords.

       IncludeAgent name
               Force inclusion of User Agents  that  match  name.
               Takes precedence over Ignore* keywords.
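
       The precedence of the Include keywords over the Ignore
       keywords can be illustrated as below.  How a value is
       matched against name is not defined in this section, so
       plain substring matching is assumed here; the function name
       and the sample patterns are hypothetical.

```python
def is_ignored(value, ignore_patterns, include_patterns):
    """Return True if value should be dropped from the analysis.

    Matching is by plain substring, which is an assumption made for
    this sketch only.
    """
    def matches(patterns):
        return any(pattern in value for pattern in patterns)

    if matches(include_patterns):
        return False  # Include* takes precedence over Ignore*
    return matches(ignore_patterns)


ignore_urls = ["/private/"]
include_urls = ["/private/report.html"]
dropped = is_ignored("/private/notes.html", ignore_urls, include_urls)
kept = not is_ignored("/private/report.html", ignore_urls, include_urls)
```

       Both dropped and kept are true in this sample: the first URL
       matches only an ignore pattern, while the second also
       matches an include pattern and is therefore retained.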

       HTML Generation Keywords

       HTMLExtension text
               Defines  the  HTML file extension to use.  Default
               is html.  Do not include the leading period!

       HTMLPre text
               Insert text at the very beginning of the generated
               HTML file.  Defaults to a standard HTML 3.2
               DOCTYPE record.

       HTMLHead text
               Insert text within the <HEAD></HEAD> block of  the
               HTML file.

       HTMLBody text
               Insert text in HTML page, starting with the <BODY>
               tag.  If used, the first line must be a <BODY ...>
               tag.  Multiple lines may be specified.

       HTMLPost text
               Insert  text  at  top (before horiz. rule) of HTML
               pages.  Multiple lines may be specified.

       HTMLTail text
               Insert text at bottom of the HTML page.  The  text
               is  top and right aligned within a table column at
               the end of the report.

       HTMLEnd text
               Insert text at the very end of the HTML page.   If
               not specified, the default is to insert the ending
               </BODY> and </HTML> tags.  If used, you must  sup-
               ply these tags yourself.


FILES

       webalizer.conf      Default    configuration   file.    Is
                           searched for in the current  directory
                           and  if not found, in the /etc/ direc-
                           tory.

       webalizer.hist      Monthly history file for  previous  12
                           months.  (can be changed)

       webalizer.current   Current state data file (Incremental
                           processing).


BUGS

       Report bugs to brad@mrunix.net.


COPYRIGHT

       Copyright (C) 1997-1999  by  Bradford  L.  Barrett.   Dis-
       tributed  under  the GNU GPL.  See the files "COPYING" and
       "Copyright", supplied with  all  distributions  for  addi-
       tional information.


AUTHOR

       Bradford L. Barrett <brad@mrunix.net>