The BRDSTATS Program - Dec. 5, 2002

Dec. 5, 2002: Updated to BRDSTATS 1.50a

Simon Begin has written a free program called BRDSTATS.EXE that analyzes the BorderManager common log files. You can download version 1.50a (149k) HERE. <--You may need to hold the shift key down when you click on this link.

BRDSTATS read me (v1.40) (2001/05/28)
Description:

This program scans proxy server common log files, and creates HTML files containing these statistics:

(*) These are the default settings.

Requirements:

Note: This utility is free to use; please let the author know if you like it. If you run into problems, please read this entire file.

Installation and usage:

Copy BRDSTATS.EXE directly into your log directory (for BorderManager, the default is SYS:\ETC\PROXY\LOG\HTTP\COMMON). Note that the program always works with the current directory. You can copy the executable anywhere you want; just remember to CD to the directory where your log files reside before running it.

From any DOS/Win9x/WinNT PC, open a DOS box and go into that directory. Run BRDSTATS or BRDSTATS [filename]. The logs must be closed; any open log is inaccessible. [Note from Craig: I suggest setting log files to roll over every 4 hours.]

The program will then read and summarize the ENTIRE log file. PLEASE BE PATIENT! The program can analyze more than 1000 lines/sec, depending on your PC and the number of statistics. In my case, it takes 10 minutes to analyze 1 week of proxy activity (about 20 MB of log file). You can abort the program at any time with ALT-C.

The output file has the same name as the log file, but with the extension .HTM. If no filename is specified, BRDSTATS scans ALL LOG FILES (*.LOG) in the current directory that don't already have an equivalent .HTM file. If you want to redo an HTM file, just delete it and rerun BRDSTATS.
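The selection rule above can be sketched in a few lines of Python, for example to preview which logs the next run would pick up. This is only an illustration of the rule as described, not BRDSTATS's own code; the function name pending_logs is made up for the example.

```python
from pathlib import Path


def pending_logs(log_dir="."):
    """Return the .LOG files in log_dir that have no matching .HTM report yet."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    # Compare names case-insensitively, as DOS/Windows file systems do.
    reports = {p.stem.lower() for p in files if p.suffix.lower() == ".htm"}
    return sorted(
        p.name
        for p in files
        if p.suffix.lower() == ".log" and p.stem.lower() not in reports
    )
```

Calling pending_logs() with no argument looks at the current directory, which mirrors how BRDSTATS itself works.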

BRDSTATS will also create an INDEX.HTM file containing links to all other .HTM files available in the directory. This INDEX.HTM is recreated from scratch every time BRDSTATS is run and at least 1 log file is analyzed.

Configuration is done through BRDSTATS.INI, which is created automatically, with all the defaults, the first time the program is run. After the INI has been created, just use Notepad to modify it to suit your needs. All Top xx numbers can be set from 0 to 1000. If you want to remove a stat, set it to "0" to deactivate it. If any parameter is missing or misspelled in the INI file, the default is used. You can delete the INI file and it will be recreated with the defaults.

IMPORTANT: If you are upgrading from v1.30, you should delete the .INI file, or at least rename it, so that it is recreated with the latest settings. The parameters and the documentation within the .INI file changed in this version, and this is the only way to get that information. Also look at the "History" section at the end of this document for new features.

Additional info:

You can create a custom log file to obtain a specific analysis. I use the grep command to create a specific log file when specific needs arise. For example, if I need an analysis of the web site "yahoo", I do:

grep -i yahoo (logfile) > yahoo.log

Then I rerun BRDSTATS and the yahoo.log is analyzed. (Grep is a unix command also available for DOS/Windows).
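If grep is not at hand, the same kind of custom log can be produced with a short Python script. The sketch below is an assumption-laden stand-in for the grep -i call above: it does a plain case-insensitive substring match (not a regular expression), and the function name filter_log is invented for the example.

```python
def filter_log(src, dst, text):
    """Copy the lines of src that contain text (any case) to dst; return the count."""
    needle = text.lower()
    kept = 0
    # latin-1 maps every byte to a character, so odd bytes in a log never fail;
    # newline="" keeps the original line endings untouched.
    with open(src, encoding="latin-1", newline="") as fin, \
         open(dst, "w", encoding="latin-1", newline="") as fout:
        for line in fin:
            if needle in line.lower():
                fout.write(line)
                kept += 1
    return kept
```

For instance, filter_log("proxy.log", "yahoo.log", "yahoo") would write yahoo.log, where proxy.log is a placeholder for the name of your own log file.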

If you wish to automate BRDSTATS, I suggest you use a simple batch file that will CD to the Logs directory, run BRDSTATS, then copy all .HTM files to the desired web server directory.
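As an alternative to a batch file, the same three steps can be sketched in Python. This is a minimal sketch under stated assumptions: the function names are invented, the command to run is passed in by the caller, and it presumes the machine can actually run the BRDSTATS executable.

```python
import shutil
import subprocess
from pathlib import Path


def publish_reports(log_dir, web_dir):
    """Copy every .HTM file from log_dir to web_dir; return the copied names."""
    target = Path(web_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(log_dir).iterdir()):
        if path.is_file() and path.suffix.lower() == ".htm":
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied


def run_and_publish(log_dir, web_dir, command):
    """Run command with log_dir as the current directory, then publish the reports."""
    # cwd= plays the role of the CD step, since BRDSTATS works on the current directory.
    # check=True raises an error if the command exits with a non-zero status.
    subprocess.run(command, cwd=log_dir, check=True)
    return publish_reports(log_dir, web_dir)
```

The command is given as a list, for example the full path to BRDSTATS.EXE as its only element; passing the full path avoids depending on how the system searches for executables.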

The URL summary is based on the root URL. For example, http://www.123.com/main.htm and http://www.123.com/images/header.gif are both counted as "http://www.123.com". The User stats use the login name for the top 20. If there is no authentication on your proxy, you will have only 1 user, named "Unknown".
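To make the root URL idea concrete, here is a small Python illustration of the grouping described above. It is a sketch of the concept only, not the program's actual (Clipper) code, and the name root_url is made up for the example.

```python
from collections import Counter
from urllib.parse import urlsplit


def root_url(url):
    """Reduce a full URL to scheme://host, dropping the path and query."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        return url  # not an absolute URL; leave it unchanged
    return f"{parts.scheme}://{parts.netloc}"


urls = [
    "http://www.123.com/main.htm",
    "http://www.123.com/images/header.gif",
]
# Both entries fall under the same root, so it is counted twice.
hits = Counter(root_url(u) for u in urls)
print(hits["http://www.123.com"])  # 2
```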

The speed at which BRDSTATS runs depends on the PC and on the number of stats to produce. Normally you should get 500 to 1000 lines/sec on a recent PC. If you need more speed, disable unused stats in the INI config file, starting with the most demanding ones: Top users/URL analysis and file type analysis. Note that you must disable a feature entirely (set it to No or to 0) to gain speed: whether 1 or 40 items are selected for a statistic, the time spent analyzing is the same.

Troubleshooting:

BRDSTATS will reject a log file after 100 lines in error, or if more than 50% of its lines are in error. If you need to see the lines that are rejected, set the "Debug" option to "Yes" in BRDSTATS.INI.
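Read literally, the rejection rule amounts to the check below. This is only one interpretation: the readme does not say at which point the percentage is evaluated, and the function and parameter names are invented for the illustration.

```python
def should_reject(total_lines, error_lines, max_errors=100, max_ratio=0.5):
    """Apply the stated rule: 100 lines in error, or more than 50% in error."""
    if error_lines >= max_errors:
        return True
    # Guard against an empty log before computing the error ratio.
    return total_lines > 0 and error_lines / total_lines > max_ratio
```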

You may have a URL named "/ (Local file system)" or "http://(your local web server here)". This usually means that your users pass through your proxy server to get to the local web server, which may be a web browser configuration problem.

Another problem seen: In some cases, the log file reports a file size of 2GB. This really affects the statistics! Usually it is a video stream. Of course the user didn't download that much data, but the transfer has started. At this time, I don't have any answer for this, besides using a file editor and manually deleting those lines from the log file.
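Deleting those lines by hand can be tedious on a large log, so a small script may help. The Python sketch below makes two assumptions that should be checked against your own logs: that the byte count is the last space-separated field of each line (as in the usual common log format), and that anything at or above 2,000,000,000 bytes is one of the bogus entries. The function name drop_oversized is made up.

```python
def drop_oversized(src, dst, limit=2_000_000_000):
    """Copy src to dst without the lines whose size field is >= limit; return the number dropped."""
    dropped = 0
    with open(src, encoding="latin-1", newline="") as fin, \
         open(dst, "w", encoding="latin-1", newline="") as fout:
        for line in fin:
            fields = line.split()
            # Assumption: the size in bytes is the last field on the line.
            size = fields[-1] if fields else ""
            if size.isascii() and size.isdigit() and int(size) >= limit:
                dropped += 1
                continue
            fout.write(line)
    return dropped
```

Write the result to a new .LOG file and let BRDSTATS analyze that one, keeping the original log untouched.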

The proxy log does not say whether data was served from the cache or from the internet. A file accessed 10 times may be downloaded from the internet only once, then read from the cache the other 9 times. The proxy stats will show all 10 accesses, so you cannot use them to evaluate your internet traffic. The proxy return code 304 Not Modified seems to give some hint of cache "hits", but it does not account for all cache hits of the proxy server: this return code represents around 20 to 30% of all hits, while BorderManager stats always show 70% cache hits. Code 304 is the response to a conditional request. For example, when the proxy estimates that a newer version of a file may be available, it issues a conditional GET request with the file name, date, time and size. The web server returns the requested file if it has changed, or code 304 if it hasn't. If you have more information on how to calculate caching from the proxy log files, please contact me.
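To see what share of the hits in one of your own logs carries return code 304, a rough count can be made as follows. The sketch assumes the usual common log layout, where the status code is the second-to-last field on each line; the function name is invented, and, as explained above, the result is only a lower bound on cache hits.

```python
def not_modified_share(lines):
    """Return the fraction of log lines whose status field is 304."""
    total = 0
    hits_304 = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or truncated lines
        total += 1
        # Assumption: the status code sits just before the final size field.
        if fields[-2] == "304":
            hits_304 += 1
    return hits_304 / total if total else 0.0
```

Since an open file yields its lines one by one, a file object can be passed directly as the lines argument.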

BRDSTATS has been tested with BorderManager 3.5 through 3.8. Some users have tested it on BM version 3.0. Reports on any other proxy server are welcome, as I use the common log format.

BRDSTATS uses the DBF file format to sum stats. These files are left in place after the program has run and can be imported into any database or spreadsheet for more detailed analysis. There are 3 files, and they are overwritten for each log analyzed, so if BRDSTATS analyzes 2 or more logs, only the last analysis is left. In short: BRDURL contains a record for each unique URL, BRDUSR a record for each unique user, and BRDBOTH a record for each unique user and URL combination.

Send any comments or suggestions, or requests for the source code (Clipper 5.3), to:

Simon Begin

History

Version 1.50 (20011127):

Version 1.40 (20010528):

Version 1.30 (20001213):

Version 1.23 (20001019):

Future enhancements:

Filter options. The goal is to include and/or exclude log lines that contain a given string. The string could be a URL, a user, or anything else that appears in the log.

Get the BRDSTATS.ZIP v1.50 file (149k) by clicking HERE. <--You may need to hold the shift key down when you click on this link.


