HowTo: Merge Apache / Lighttpd / Nginx Server Log Files

My ecommerce site runs on a cluster of Apache web servers. The cluster sits behind an nginx load balancer. I want to merge the backend Apache web server log files for statistics purposes. How do I merge web server log files under Linux / UNIX-like operating systems using cron jobs?

You need to use the logresolvemerge.pl Perl script, which is part of the AWStats package. logresolvemerge.pl builds one unique output log file, sorted on date, from several sources:

  • It acts as a log merger for any other log analyzer.
  • It can read several input log files.
  • It can read .gz/.bz2 log files.
  • It can also make a fast reverse DNS lookup to replace all IP addresses with host names in the resulting log file.
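As a quick illustration of what "one output log, sorted on date" means, here is a throwaway, self-contained sketch that merges two tiny sample logs by time of day using sort. (A lexical sort on the time fields only works because all sample entries share one day; logresolvemerge.pl does real date parsing across days and compressed files.)

```shell
#!/bin/bash
# Illustration only: merge two sample common-log-format files by time.
# logresolvemerge.pl does proper date parsing; this lexical sort on the
# hh:mm:ss fields works only because all sample entries share one day.
t=$(mktemp -d)
cat > "$t/apache1.log" <<'EOF'
10.0.0.1 - - [10/Oct/2020:13:55:36 +0000] "GET / HTTP/1.1" 200 512
10.0.0.1 - - [10/Oct/2020:13:58:01 +0000] "GET /a HTTP/1.1" 200 100
EOF
cat > "$t/apache2.log" <<'EOF'
10.0.0.2 - - [10/Oct/2020:13:56:12 +0000] "GET /b HTTP/1.1" 200 200
EOF
# ':' splits each line so fields 2-4 are hour, minute, second-and-rest
sort -t: -k2,4 "$t"/apache*.log > "$t/merged.log"
# print the merged timestamps to show the resulting order
stamps=$(awk -F'[][]' '{print $2}' "$t/merged.log")
echo "$stamps"
rm -rf "$t"
```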

Step #1: Install awstats

Turn on the EPEL repo from the Fedora project and use the yum command to install the awstats package, enter:
# yum -y install awstats
Please note that the Perl-based log merger script, logresolvemerge.pl, is installed in the /usr/share/awstats/tools/ directory.

Step #2: Grab Logs From All Other Servers / Nodes

Our sample setup is as follows:

	+-------+    +----------+    +---------+    +-------------+
	| Nginx |    | apache1  |    | mysql1  |    | cachingnode |
	| lb1   |----| apache2  |    | cluster |    | / stats     |
	+-------+    | apache3  |    +---------+    +-------------+
	             +----------+
	                              LAN


  • You need to fetch logs from apache1, apache2, and apache3 nodes.
  • You will use the caching node to build stats.

First, create a directory to store logs. Type the following commands on the cachingnode:
# D=/var/logs/clusterlogs
# mkdir -p $D
# mkdir -p $D/raw
# mkdir -p $D/raw/apache{1,2,3}
# mkdir -p $D/reports

Use the rsync command to fetch log files from each server:
# rsync -azv user@apache1:/var/logs/httpd/access_logs* $D/raw/apache1
# rsync -azv user@apache2:/var/logs/httpd/access_logs* $D/raw/apache2
# rsync -azv user@apache3:/var/logs/httpd/access_logs* $D/raw/apache3

Step #3: Merge Log Files

Run the logresolvemerge.pl script as follows to merge all log files, enter:
# /usr/share/awstats/tools/logresolvemerge.pl $D/raw/apache1/access_logs* \
$D/raw/apache2/access_logs* $D/raw/apache3/access_logs* > $D/raw/merged_access_logs

The above command will create a new file called $D/raw/merged_access_logs. Use this file to build your stats; the per-node copies can then be deleted:
# rm -f $D/raw/apache1/access_logs*
# rm -f $D/raw/apache2/access_logs*
# rm -f $D/raw/apache3/access_logs*
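A sanity check worth running before the rm commands above: the merged file should contain every input line. A minimal, self-contained sketch of that check (throwaway sample files; plain cat stands in for logresolvemerge.pl here, which also preserves the total line count):

```shell
#!/bin/bash
# Sanity check pattern: merged line count must equal the sum of inputs.
D=$(mktemp -d)              # stands in for /var/logs/clusterlogs
mkdir -p "$D/raw/apache1" "$D/raw/apache2"
printf 'line1\nline2\n' > "$D/raw/apache1/access_logs"
printf 'line3\n'        > "$D/raw/apache2/access_logs"
# cat stands in for logresolvemerge.pl; the count check is the same
cat "$D"/raw/apache*/access_logs* > "$D/raw/merged_access_logs"
total=$(cat "$D"/raw/apache*/access_logs* | wc -l)
merged=$(wc -l < "$D/raw/merged_access_logs")
if [ "$total" -eq "$merged" ]; then
    result="OK: $merged lines merged"
else
    result="Mismatch: $total in, $merged out"
fi
echo "$result"
rm -rf "$D"
```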

Step #4: How Do I Generate Graphs With Webalizer?

The Webalizer is a fast, free web server log file analysis program. See how to install Webalizer. You need to create a webalizer configuration file, enter:
# mkdir -p $D/reports/webalizer
# cp /etc/webalizer.conf.sample $D/reports/webalizer/webalizer.conf

Edit config file, enter:
# vi $D/reports/webalizer/webalizer.conf
Update it as follows:

LogFile         /var/logs/clusterlogs/raw/merged_access_logs
OutputDir       /var/www/usage
HistoryName     /var/logs/clusterlogs/raw/webalizer.hist
Incremental     yes
IncrementalName /var/logs/clusterlogs/raw/webalizer.current

Save and close the file. To generate stats, enter:
# webalizer -c $D/reports/webalizer/webalizer.conf
Your reports will be generated and stored in the /var/www/usage directory. You can access /var/www/usage using the Apache server running on cachingnode (http://cachingnode/usage/).

Step #5: How Do I Generate Graphs With AWStats?

AWStats is a free, powerful tool that generates advanced web, streaming, ftp or mail server statistics, graphically. This log analyzer works as a CGI or from the command line. See how to install AWStats under Linux. Once installed, edit the awstats configuration file for your site (the name mysite in awstats.mysite.conf below is just an example; adjust it to your setup), enter:
# vi /etc/awstats/awstats.mysite.conf
Update the config directives as follows, pointing LogFile at the merged log just as in the webalizer setup above:

LogFile="/var/logs/clusterlogs/raw/merged_access_logs"
SiteDomain="mysite.example.com"

Save and close the file. Type the following command at a shell prompt to generate stats, enter:
# /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -update -config=mysite

Cron Job: Fetch Log Files And Generate Stats

Create a shell script to be run daily via cron (the file name clusterlogs.sh is just an example), enter:
# vi /etc/cron.daily/clusterlogs.sh
Append the code:

#!/bin/bash
# Author: SXI ADMIN <>, under GPL v2.0+
# --------------------------------------------------------------------
# user1@remote1:/path/to/log user2@remote2:/path/to/log
_nodes='user@apache1:/var/logs/httpd/access_logs*  user@apache2:/var/logs/httpd/access_logs*  user@apache3:/var/logs/httpd/access_logs*'
# Set full path with args
_rsync="/usr/bin/rsync -az"
_merge="/usr/share/awstats/tools/logresolvemerge.pl"
_webalizer="/usr/bin/webalizer"
_mkdir="/bin/mkdir"
_rm="/bin/rm"
# log files and dirs
D="/var/logs/clusterlogs"
_mergedlog="$D/raw/merged_access_logs"
_path=""
# Build path and fetch log files
[ ! -d "$D/raw" ] && $_mkdir -p "$D/raw"
for f in $_nodes
do
	# per-node dir named after the host part of user@host:path
	h="${f#*@}"
	n="$D/raw/${h%%:*}"
	[ ! -d "$n" ] && $_mkdir -p "$n"
	# grab the log files (the remote shell expands the glob)
	$_rsync "$f" "$n"
	_path="$_path $n/* "
done
# Merge it (leave $_path unquoted so the globs expand)
$_merge $_path >"$_mergedlog"
# Generate webalizer stats
[ -f "$D/reports/webalizer/webalizer.conf" ] && $_webalizer -c "$D/reports/webalizer/webalizer.conf" &>/dev/null
# Generate Awstats too
# [ -x $_awstats ] && $_awstats -update
# Add your other stats commands here
## /path/to/my_stats_app  -f "$D/reports/webalizer/webalizer.conf"
# Clean up
$_rm -f "$_mergedlog"
$_rm -f $_path
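The script has to turn each user@host:path entry in $_nodes into a per-node directory name ($n). A small self-contained demo of extracting the host part with POSIX parameter expansion:

```shell
#!/bin/bash
# Extract the host part of a user@host:path node entry:
# strip everything through '@', then everything from the first ':' on.
f='user@apache1:/var/logs/httpd/access_logs*'
h="${f#*@}"      # apache1:/var/logs/httpd/access_logs*
n="${h%%:*}"     # apache1
echo "$n"
```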

Posted by: SXI ADMIN

The author is the creator of SXI LLC and a seasoned sysadmin, DevOps engineer, and a trainer for the Linux operating system/Unix shell scripting. Get the latest tutorials on SysAdmin, Linux/Unix and open source topics via RSS/XML feed or weekly email newsletter.
