
Apache vs Nginx Log Formats: The Complete Analysis Guide

Complete comparison of Apache and Nginx log formats. Learn Combined Log Format, custom directives, JSON logging, and parsing techniques for effective log analysis.


1. Why Log Format Matters

Every HTTP request that hits your web server generates a log entry. The format of that entry determines what you can analyze, how quickly you can parse it, and whether your monitoring pipeline can ingest it efficiently. Choosing the right log format is not a trivial decision -- it directly impacts your ability to debug production issues, detect security threats, optimize performance, and understand traffic patterns.

Apache HTTP Server and Nginx account for over 70% of all web servers on the internet. Despite serving the same fundamental purpose, they use different syntax for log format configuration, different default field orders, and different variable naming conventions. Understanding both is essential for any operations engineer, SRE, or developer working with web infrastructure.

Key Insight: The default log formats for both Apache and Nginx derive from the NCSA Common Log Format defined in the early 1990s. Despite being over 30 years old, CLF remains the foundation that most log analysis tools expect. Understanding this lineage helps you make informed decisions about custom formats.

In this guide, we will dissect every field in both Apache and Nginx log formats, compare their configuration directives side-by-side, build custom JSON log formats for modern observability pipelines, and write parsers in multiple languages. By the end, you will have a complete reference for any log format scenario you encounter.

2. Apache Common Log Format (CLF)

The Common Log Format is the most basic standardized log format. Apache defines it with the following LogFormat directive:

LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog /var/log/apache2/access.log common

A typical CLF entry looks like this:

203.0.113.50 - frank [10/Feb/2025:13:55:36 -0700] "GET /api/v2/users HTTP/1.1" 200 2326

Field-by-Field Breakdown

Field Directive Example Value Description
Remote Host %h 203.0.113.50 Client IP address. Uses DNS hostname if HostnameLookups On
Identity %l - RFC 1413 identity. Almost always a hyphen. Requires mod_ident
User %u frank Authenticated username. Hyphen if no auth
Timestamp %t [10/Feb/2025:13:55:36 -0700] Request receipt time in CLF format: [day/month/year:hour:minute:second zone]
Request Line %r GET /api/v2/users HTTP/1.1 Full first line of request: method, URI, protocol
Status Code %>s 200 Final HTTP status code (after internal redirects)
Bytes Sent %b 2326 Response body size in bytes. Hyphen for zero bytes. Use %B for numeric zero
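To make the field breakdown concrete, the sample entry above can be pulled apart with a short regex in Python (a minimal sketch that mirrors the CLF field order; it is not how Apache itself writes the line):

```python
import re

# Regex mirroring the CLF field order: host, ident, user, time, request, status, bytes
CLF_RE = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)$'
)

line = '203.0.113.50 - frank [10/Feb/2025:13:55:36 -0700] "GET /api/v2/users HTTP/1.1" 200 2326'
host, ident, user, ts, request, status, size = CLF_RE.match(line).groups()

print(host)     # 203.0.113.50
print(request)  # GET /api/v2/users HTTP/1.1
print(status)   # 200
```

Note that status and size come back as strings; convert them to integers (treating a hyphen as zero bytes) before doing arithmetic.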

Warning: The %h directive will perform a DNS reverse lookup if HostnameLookups is enabled, which can significantly slow your server under load. Keep HostnameLookups Off in production; %a always logs the IP address, and combined with mod_remoteip it reflects the real client IP behind a proxy.

Status Code Nuance: %s vs %>s

Apache distinguishes between the original status code (%s) and the final status code (%>s). This matters when internal redirects occur:

# Original request returns 301, internal redirect returns 200
# %s  = 301 (original status)
# %>s = 200 (final status after redirect)

# For ErrorDocument handling:
# Request to /missing -> 404 -> ErrorDocument -> 200
# %s  = 404
# %>s = 200

Always use %>s in production log formats unless you specifically need to track pre-redirect status codes for debugging rewrite rules.

3. Apache Combined Log Format

The Combined Log Format extends CLF with two critical fields: Referer and User-Agent. This is the de facto standard for web analytics and the default recommendation for most deployments.

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/log/apache2/access.log combined

Example output:

203.0.113.50 - frank [10/Feb/2025:13:55:36 -0700] "GET /api/v2/users HTTP/1.1" 200 2326 "https://example.com/dashboard" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"

The Two Extra Fields

Field Directive Description
Referer %{Referer}i The URL the client came from. Hyphen if direct/bookmarked. Note: "Referer" is a historical misspelling of "Referrer" baked into the HTTP specification
User-Agent %{User-Agent}i Browser or bot identification string. Critical for bot detection and SEO analysis

Custom LogFormat Directives

Apache's mod_log_config supports extensive customization. Here are the most useful directives beyond the defaults:

# Add response time in microseconds
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D" combined_with_time

# Add request duration in seconds and microseconds (no extra module required)
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T/%D" timed

# Add SSL protocol and cipher
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{SSL_PROTOCOL}x %{SSL_CIPHER}x" combined_ssl

# Add X-Forwarded-For for reverse proxy setups
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxy_combined

# Add virtual host and server port
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined

Directive Description Example Value
%D Request processing time in microseconds 234567
%T Request processing time in seconds 0
%{ms}T Request processing time in milliseconds 234
%I Bytes received (requires mod_logio) 4872
%O Bytes sent including headers (requires mod_logio) 23456
%v Canonical server name www.example.com
%p Server port 443
%X Connection status: X=aborted, +=keepalive, -=closed +
%{VARNAME}e Environment variable Varies
%{Header}i Request header value Varies
%{Header}o Response header value Varies

Conditional Logging

Apache supports conditional logging based on environment variables, which is useful for excluding health checks or internal traffic:

# Don't log health check requests
SetEnvIf Request_URI "^/health$" dontlog
SetEnvIf Request_URI "^/readyz$" dontlog
CustomLog /var/log/apache2/access.log combined env=!dontlog

# Log bots to a separate file
SetEnvIf User-Agent "Googlebot" is_bot
SetEnvIf User-Agent "bingbot" is_bot
SetEnvIf User-Agent "GPTBot" is_bot
CustomLog /var/log/apache2/bot_access.log combined env=is_bot
CustomLog /var/log/apache2/human_access.log combined env=!is_bot

Best Practice: Separating bot traffic into its own log file makes analysis significantly faster. You can monitor Googlebot crawl patterns without filtering through millions of human visitor lines. LogBeast can automatically detect and categorize bot traffic from any log format.
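The same bot/human split can be reproduced when post-processing existing logs. A minimal Python sketch (the substrings mirror the SetEnvIf rules above; production bot detection should also verify crawler IPs via reverse DNS, since User-Agent strings are trivially spoofed):

```python
# Bot signatures mirroring the SetEnvIf rules above (illustrative, not exhaustive)
BOT_SIGNATURES = ("Googlebot", "bingbot", "GPTBot")

def is_bot(user_agent: str) -> bool:
    """Return True if the User-Agent contains any known bot signature."""
    return any(sig in user_agent for sig in BOT_SIGNATURES)

print(is_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True
print(is_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))                                 # False
```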

4. Nginx Default Log Format

Nginx defines its default log format in the http block using the log_format directive. The built-in format is called combined and closely mirrors Apache's Combined Log Format:

# Nginx built-in default (you don't need to define this)
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

access_log /var/log/nginx/access.log combined;
error_log /var/log/nginx/error.log warn;

Example output:

203.0.113.50 - frank [10/Feb/2025:13:55:36 +0000] "GET /api/v2/users HTTP/1.1" 200 2326 "https://example.com/dashboard" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

Nginx Variable Reference

Nginx Variable Apache Equivalent Description
$remote_addr %h / %a Client IP address
$remote_user %u Authenticated username
$time_local %t Local time in CLF format
$time_iso8601 %{%Y-%m-%dT%H:%M:%S%z}t ISO 8601 timestamp
$request %r Full request line
$status %>s Response status code
$body_bytes_sent %b Body bytes sent (excludes headers)
$bytes_sent %O Total bytes sent (includes headers)
$http_referer %{Referer}i Referer header
$http_user_agent %{User-Agent}i User-Agent header
$request_time %D (different unit) Request processing time in seconds with ms resolution
$upstream_response_time N/A Time spent waiting for upstream (proxy/fastcgi)
$connection N/A Connection serial number
$connection_requests N/A Number of requests on this connection
$msec N/A Time in seconds with ms resolution at log write
$pipe N/A Pipelined request indicator: p or .

Nginx Error Log Configuration

Unlike access logs, Nginx error logs have a fixed format that cannot be customized. You can only control the severity level:

# Error log levels (from most to least verbose):
# debug, info, notice, warn, error, crit, alert, emerg

error_log /var/log/nginx/error.log warn;

# Per-server block error logs
server {
    listen 443 ssl;
    server_name example.com;
    error_log /var/log/nginx/example.com.error.log error;
}

# Error log format (fixed, cannot be changed):
# 2025/02/10 13:55:36 [error] 1234#1234: *5678 open() "/var/www/html/missing.html" failed (2: No such file or directory), client: 203.0.113.50, server: example.com, request: "GET /missing.html HTTP/1.1", host: "example.com"

Warning: Setting error_log to debug level in production generates enormous volumes of output and measurably impacts performance. Use warn or error for production, and only enable debug temporarily when investigating specific issues.

5. Custom Log Formats: Apache vs Nginx Directives

The real power of web server logging comes from custom formats. Here is a comprehensive side-by-side comparison for achieving the same data capture in both servers.

Complete Directive Comparison Table

Data Point Apache Directive Nginx Variable
Client IP %a $remote_addr
Client IP (behind proxy) %{X-Forwarded-For}i $http_x_forwarded_for
Real Client IP (proxy-aware) %a (with mod_remoteip) $remote_addr (rewritten by the realip module; $realip_remote_addr keeps the original peer address)
Server hostname %v $server_name
Server port %p $server_port
Request method %m $request_method
Request URI %U $uri
Request URI (original) %U%q $request_uri
Query string %q $args
Protocol %H $server_protocol
Request time (seconds) %T $request_time
Request time (microseconds) %D N/A ($request_time is in seconds; convert downstream)
Request time (milliseconds) %{ms}T N/A ($request_time is in seconds; convert downstream)
Bytes received %I $request_length
Bytes sent (body only) %b $body_bytes_sent
Bytes sent (total) %O $bytes_sent
SSL protocol %{SSL_PROTOCOL}x $ssl_protocol
SSL cipher %{SSL_CIPHER}x $ssl_cipher
HTTP/2 push %{H2_PUSH}e $http2
Upstream response time N/A $upstream_response_time
Upstream status N/A $upstream_status
Upstream address N/A $upstream_addr
GeoIP country %{GEOIP_COUNTRY_CODE}e $geoip_country_code
Connection keep-alive %X $connection_requests
Any request header %{HeaderName}i $http_headername (lowercase, hyphens become underscores)
Any response header %{HeaderName}o $sent_http_headername

Production-Ready Custom Formats

Here are battle-tested custom formats for both servers that include the most useful fields for production analysis:

# Apache - Extended production format
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D %{X-Forwarded-For}i %v %{SSL_PROTOCOL}x %X" production

# Usage:
CustomLog /var/log/apache2/access.log production

# Nginx - Extended production format
log_format production '$remote_addr - $remote_user [$time_local] '
                      '"$request" $status $body_bytes_sent '
                      '"$http_referer" "$http_user_agent" '
                      '$request_time $http_x_forwarded_for '
                      '$server_name $ssl_protocol '
                      '$upstream_response_time $upstream_status';

access_log /var/log/nginx/access.log production;

Tip: When adding custom fields, always append them to the end of the Combined format rather than inserting them in the middle. This ensures backward compatibility with existing log parsers that expect CLF/Combined field order.
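A parser can exploit that ordering directly: match the standard Combined prefix, then treat whatever trails it as the appended custom fields. A hedged sketch in Python (the sample line and its two trailing fields, request time and a forwarded-for address, are illustrative):

```python
import re

# Standard Combined Log Format prefix (shared by Apache and Nginx)
COMBINED_PREFIX = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse_extended(line):
    """Parse the Combined prefix; return (fields_dict, appended_tokens)."""
    m = COMBINED_PREFIX.match(line)
    if not m:
        return None, []
    extras = line[m.end():].split()  # everything appended after Combined
    return m.groupdict(), extras

line = ('203.0.113.50 - frank [10/Feb/2025:13:55:36 +0000] '
        '"GET / HTTP/1.1" 200 512 "-" "curl/8.5.0" 0.012 198.51.100.7')
fields, extras = parse_extended(line)
print(fields['status'])  # 200
print(extras)            # ['0.012', '198.51.100.7']
```

The same function keeps working on plain Combined logs, where extras simply comes back empty.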

6. JSON Structured Logging

Modern observability stacks (Elasticsearch, Splunk, Datadog, Loki) work best with structured data. JSON log formats eliminate parsing ambiguity, support nested fields, and enable schema evolution without breaking downstream consumers.

Apache JSON Configuration

Apache does not natively output JSON, so you must construct it manually in the LogFormat directive. Special care is needed to escape quotes within field values:

# Apache JSON log format
LogFormat "{\"timestamp\":\"%{%Y-%m-%dT%H:%M:%S%z}t\",\"remote_addr\":\"%a\",\"remote_user\":\"%u\",\"request_method\":\"%m\",\"request_uri\":\"%U%q\",\"protocol\":\"%H\",\"status\":%>s,\"body_bytes_sent\":%B,\"http_referer\":\"%{Referer}i\",\"http_user_agent\":\"%{User-Agent}i\",\"request_time_us\":%D,\"ssl_protocol\":\"%{SSL_PROTOCOL}x\",\"ssl_cipher\":\"%{SSL_CIPHER}x\",\"x_forwarded_for\":\"%{X-Forwarded-For}i\",\"vhost\":\"%v\",\"server_port\":\"%p\"}" json

CustomLog /var/log/apache2/access.json.log json

Example JSON output (formatted for readability):

{
  "timestamp": "2025-02-10T13:55:36+0000",
  "remote_addr": "203.0.113.50",
  "remote_user": "-",
  "request_method": "GET",
  "request_uri": "/api/v2/users?page=2",
  "protocol": "HTTP/1.1",
  "status": 200,
  "body_bytes_sent": 2326,
  "http_referer": "https://example.com/dashboard",
  "http_user_agent": "Mozilla/5.0 ...",
  "request_time_us": 234567,
  "ssl_protocol": "TLSv1.3",
  "ssl_cipher": "TLS_AES_256_GCM_SHA384",
  "x_forwarded_for": "198.51.100.78",
  "vhost": "api.example.com",
  "server_port": "443"
}

Warning: Apache's manual JSON construction is fragile. If a User-Agent string contains an unescaped double quote, it will produce invalid JSON. Consider piping logs through jq for validation, or use a log shipper (Filebeat, Fluentd) that handles JSON encoding properly.
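A lightweight validation pass along those lines can be sketched in Python (the function name is my own; it mimics what a jq pass would flag):

```python
import json

def validate_json_log(lines):
    """Count valid vs broken JSON lines; return (valid, broken, sample_errors)."""
    valid = broken = 0
    errors = []
    for n, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue
        try:
            json.loads(line)
            valid += 1
        except json.JSONDecodeError as exc:
            broken += 1
            if len(errors) < 5:  # keep a few samples for diagnosis
                errors.append((n, str(exc)))
    return valid, broken, errors

# An unescaped quote in the User-Agent produces exactly the breakage described above:
good = '{"status":200,"http_user_agent":"curl/8.5.0"}'
bad  = '{"status":200,"http_user_agent":"Evil "quote" UA"}'
v, b, errs = validate_json_log([good, bad])
print(v, b)  # 1 1
```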

Nginx JSON Configuration

Nginx has the same limitation -- no native JSON output. However, Nginx's escape=json parameter (available since 1.11.8) properly escapes special characters in variable values:

# Nginx JSON log format with proper escaping
log_format json_log escape=json
    '{'
        '"timestamp":"$time_iso8601",'
        '"remote_addr":"$remote_addr",'
        '"remote_user":"$remote_user",'
        '"request_method":"$request_method",'
        '"request_uri":"$request_uri",'
        '"protocol":"$server_protocol",'
        '"status":$status,'
        '"body_bytes_sent":$body_bytes_sent,'
        '"request_length":$request_length,'
        '"http_referer":"$http_referer",'
        '"http_user_agent":"$http_user_agent",'
        '"request_time":$request_time,'
        '"upstream_response_time":"$upstream_response_time",'
        '"upstream_status":"$upstream_status",'
        '"upstream_addr":"$upstream_addr",'
        '"ssl_protocol":"$ssl_protocol",'
        '"ssl_cipher":"$ssl_cipher",'
        '"http_x_forwarded_for":"$http_x_forwarded_for",'
        '"server_name":"$server_name",'
        '"server_port":"$server_port",'
        '"connection":$connection,'
        '"connection_requests":$connection_requests,'
        '"pipe":"$pipe"'
    '}';

access_log /var/log/nginx/access.json.log json_log;

Best Practice: Always use escape=json in Nginx JSON log formats. Without it, user-agent strings, referer URLs, and request URIs containing special characters will produce invalid JSON that breaks your ingestion pipeline. This single directive saves hours of debugging.
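What escape=json provides is essentially what any JSON encoder does to string values. A quick Python illustration of why raw interpolation breaks and proper escaping does not:

```python
import json

ua = 'Mozilla/5.0 "compatible"\tbot'  # a User-Agent with a quote and a tab

# Raw interpolation (no escape=json): the embedded quote and tab break the JSON
raw = '{"http_user_agent":"%s"}' % ua

# With proper escaping (roughly what escape=json does):
escaped = '{"http_user_agent":%s}' % json.dumps(ua)

def is_valid(s):
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid(raw))      # False
print(is_valid(escaped))  # True
```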

JSON Format Comparison

Feature Apache JSON Nginx JSON
Native JSON support No No
Auto-escaping No (manual) Yes (escape=json)
Numeric types Manual (omit quotes) Manual (omit quotes)
Nested objects Not supported Not supported
Array values Not supported Not supported
ISO 8601 timestamps %{%Y-%m-%dT%H:%M:%S%z}t $time_iso8601
Upstream metrics Limited Comprehensive
Broken JSON risk High Low (with escape=json)

7. Log Rotation and Management

Without rotation, web server logs grow unbounded. A busy site handling 10 million requests per day produces roughly 3.3 GB of Combined format logs daily. Proper rotation ensures you retain useful history without exhausting disk space.

Apache Logrotate Configuration

# /etc/logrotate.d/apache2
/var/log/apache2/*.log {
    daily
    missingok
    rotate 52
    compress
    delaycompress
    notifempty
    create 640 root adm
    sharedscripts
    postrotate
        if invoke-rc.d apache2 status > /dev/null 2>&1; then
            invoke-rc.d apache2 reload > /dev/null
        fi
    endscript
}

# For JSON logs with different retention
/var/log/apache2/*.json.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 640 root adm
    sharedscripts
    postrotate
        if invoke-rc.d apache2 status > /dev/null 2>&1; then
            invoke-rc.d apache2 reload > /dev/null
        fi
    endscript
}

Nginx Logrotate Configuration

# /etc/logrotate.d/nginx
/var/log/nginx/*.log {
    daily
    missingok
    rotate 52
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    prerotate
        if [ -d /etc/logrotate.d/httpd-prerotate ]; then
            run-parts /etc/logrotate.d/httpd-prerotate
        fi
    endscript
    postrotate
        invoke-rc.d nginx rotate >/dev/null 2>&1
    endscript
}

Key Difference: Apache reopens its log files only via a graceful restart, which also re-reads configuration and cycles worker processes. Nginx supports a dedicated reopen signal (USR1, or nginx -s reopen) that only reopens log files, with no service interruption. This makes Nginx log rotation zero-downtime by default.

Signal-Based Rotation

# Manual rotation with signals

# Apache - graceful restart to reopen logs
sudo apachectl graceful
# or
sudo kill -USR1 $(cat /var/run/apache2/apache2.pid)

# Nginx - reopen log files (zero downtime)
sudo nginx -s reopen
# or
sudo kill -USR1 $(cat /var/run/nginx.pid)

Disk Space Estimation

Requests/day CLF (~150 bytes/line) Combined (~350 bytes/line) JSON (~600 bytes/line)
100,000 ~14 MB ~33 MB ~57 MB
1,000,000 ~143 MB ~333 MB ~572 MB
10,000,000 ~1.4 GB ~3.3 GB ~5.7 GB
100,000,000 ~14 GB ~33 GB ~57 GB

With gzip compression (typical for logrotate), expect 85-95% size reduction. A 3.3 GB daily Combined log compresses to roughly 250-500 MB.
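The table's arithmetic is easy to reproduce. A small sketch using the same rough bytes-per-line averages as above:

```python
# Rough average line sizes from the table above (illustrative, not measured)
BYTES_PER_LINE = {"clf": 150, "combined": 350, "json": 600}

def daily_log_size_mb(requests_per_day, fmt, compression_ratio=0.0):
    """Estimated daily log size in MiB, optionally after gzip (0.90 = 90% reduction)."""
    raw = requests_per_day * BYTES_PER_LINE[fmt]
    return raw * (1 - compression_ratio) / 1024 / 1024

print(round(daily_log_size_mb(10_000_000, "combined") / 1024, 1))  # 3.3 (GiB, matches the table)
print(round(daily_log_size_mb(10_000_000, "combined", 0.90)))      # 334 (MiB after 90% compression)
```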

8. Parsing Log Files

Raw log files are only useful if you can parse them. This section provides production-grade patterns for extracting structured data from both Apache and Nginx logs.

Regex Patterns

The Combined Log Format regex works for both Apache and Nginx since they use the same output format:

# Combined Log Format regex (PCRE)
^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" (?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<useragent>[^"]*)"

# Common Log Format regex
^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" (?P<status>\d{3}) (?P<bytes>\S+)

# Handle malformed requests (missing method/path/protocol)
^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<useragent>[^"]*)"

AWK One-Liners for Quick Analysis

# Top 20 IP addresses
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20

# Top 20 requested URLs
awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -20

# Status code distribution
awk '{print $9}' access.log | sort | uniq -c | sort -rn

# Requests per hour
awk '{print substr($4,2,14)}' access.log | sort | uniq -c

# Total bandwidth in MB
awk '{sum+=$10} END {printf "%.2f MB\n", sum/1024/1024}' access.log

# Average response size per status code
awk '{status[$9]++; bytes[$9]+=$10} END {for (s in status) printf "%s: %d requests, avg %.0f bytes\n", s, status[s], bytes[s]/status[s]}' access.log

# Find all Googlebot requests
awk -F'"' '$6 ~ /Googlebot/ {print $2}' access.log | sort | uniq -c | sort -rn | head -20

# Requests per minute (for spike detection)
awk '{print substr($4,2,17)}' access.log | sort | uniq -c | sort -rn | head -20

# 5xx errors with full details
awk '$9 ~ /^5/ {print $0}' access.log | tail -50

# Slow requests (if request time is the last field, in microseconds)
awk '{if ($NF > 1000000) print $0}' access.log | head -20

Python Log Parser

#!/usr/bin/env python3
"""
Production-grade log parser for Apache/Nginx Combined Log Format.
Handles malformed lines, compressed files, and streaming input.
"""

import re
import gzip
import sys
from datetime import datetime
from collections import defaultdict, Counter

# Compiled regex for performance
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" '
    r'"(?P<useragent>[^"]*)"'
)

# Fallback for malformed request lines
FALLBACK_RE = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
)


def parse_line(line):
    """Parse a single log line, returning a dict or None."""
    match = COMBINED_RE.match(line)
    if match:
        d = match.groupdict()
        d['bytes'] = 0 if d['bytes'] == '-' else int(d['bytes'])
        d['status'] = int(d['status'])
        return d

    match = FALLBACK_RE.match(line)
    if match:
        d = match.groupdict()
        d['bytes'] = 0 if d['bytes'] == '-' else int(d['bytes'])
        d['status'] = int(d['status'])
        d['method'] = d['path'] = d['protocol'] = None
        d['referer'] = d['useragent'] = None
        return d

    return None


def open_log(filepath):
    """Open plain or gzipped log files."""
    if filepath.endswith('.gz'):
        return gzip.open(filepath, 'rt', encoding='utf-8', errors='replace')
    return open(filepath, 'r', encoding='utf-8', errors='replace')


def analyze_log(filepath):
    """Analyze a log file and print summary statistics."""
    stats = {
        'total': 0,
        'parsed': 0,
        'failed': 0,
        'status_codes': Counter(),
        'top_ips': Counter(),
        'top_paths': Counter(),
        'top_agents': Counter(),
        'bytes_total': 0,
        'hourly': Counter(),
    }

    with open_log(filepath) as f:
        for line in f:
            stats['total'] += 1
            entry = parse_line(line.strip())

            if entry is None:
                stats['failed'] += 1
                continue

            stats['parsed'] += 1
            stats['status_codes'][entry['status']] += 1
            stats['top_ips'][entry['ip']] += 1
            stats['bytes_total'] += entry['bytes']

            if entry.get('path'):
                stats['top_paths'][entry['path']] += 1
            if entry.get('useragent'):
                stats['top_agents'][entry['useragent']] += 1

    # Print report
    print(f"\n{'='*60}")
    print(f"Log Analysis: {filepath}")
    print(f"{'='*60}")
    print(f"Total lines:  {stats['total']:,}")
    print(f"Parsed:       {stats['parsed']:,}")
    print(f"Failed:       {stats['failed']:,}")
    print(f"Total bytes:  {stats['bytes_total']/1024/1024:.2f} MB")
    print(f"\nStatus codes:")
    for code, count in stats['status_codes'].most_common():
        print(f"  {code}: {count:,}")
    print(f"\nTop 10 IPs:")
    for ip, count in stats['top_ips'].most_common(10):
        print(f"  {ip}: {count:,}")
    print(f"\nTop 10 paths:")
    for path, count in stats['top_paths'].most_common(10):
        print(f"  {path}: {count:,}")


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <logfile> [logfile2 ...]")
        sys.exit(1)
    for filepath in sys.argv[1:]:
        analyze_log(filepath)

Parsing JSON Logs

If you have already configured JSON output, parsing becomes trivially simple:

# jq - extract all 5xx errors
cat access.json.log | jq -r 'select(.status >= 500) | "\(.timestamp) \(.status) \(.request_uri) \(.upstream_response_time)"'

# jq - top IPs by request count
cat access.json.log | jq -r '.remote_addr' | sort | uniq -c | sort -rn | head -20

# jq - average response time per endpoint
cat access.json.log | jq -r '"\(.request_uri) \(.request_time)"' | \
    awk '{sum[$1]+=$2; count[$1]++} END {for (u in sum) printf "%s %.3f (%d reqs)\n", u, sum[u]/count[u], count[u]}' | \
    sort -k2 -rn | head -20

# Python one-liner for JSON logs
python3 -c "
import json, sys
from collections import Counter
c = Counter()
for line in sys.stdin:
    try:
        e = json.loads(line)
        c[e['status']] += 1
    except ValueError: pass
for k,v in c.most_common(): print(f'{k}: {v}')
" < access.json.log

Pro Tip: JSON logs eliminate the need for complex regex parsing entirely. If your analysis pipeline supports it, the small disk space overhead (roughly 70% larger than Combined format) is well worth the parsing simplicity and reliability. LogBeast natively supports both Combined and JSON log formats with automatic format detection.

9. Which Format to Choose

The right format depends on your infrastructure, team, and use case. Use this decision matrix to guide your choice:

Decision Matrix

Criteria CLF Combined Custom Extended JSON
Disk usage Lowest Medium Medium-High Highest
Parse complexity Simple regex Moderate regex Complex regex Trivial (native JSON)
Tool compatibility Universal Universal Custom parsers needed Modern tools only
Bot/SEO analysis No (no User-Agent) Yes Yes Yes
Performance debugging No (no timing) No (no timing) Yes Yes
ELK/Splunk/Datadog Supported Supported Grok patterns needed Native ingest
Schema evolution Rigid Rigid Version carefully Add fields freely
Human readability Good Good Moderate Verbose but clear
Malformed line risk Low Medium (UA strings) Medium Low (with escape=json)

Recommendations by Use Case

Small sites (< 100K requests/day): Use Combined format. It provides the best balance of information and simplicity. Every tool supports it natively, and disk usage is negligible at this scale.

Medium sites (100K - 10M requests/day): Use Custom Extended format with request timing. Performance data becomes critical at this scale. Consider running a Combined log alongside for compatibility.

Large sites (> 10M requests/day): Use JSON format piped directly to your observability platform. The parsing efficiency gain at this volume justifies the disk overhead. Use log sampling if storage is a constraint.

Microservices / Kubernetes: Use JSON format exclusively. Container-based environments use stdout/stderr for logging, and JSON integrates natively with Fluentd, Fluent Bit, and other log collectors in the ecosystem.

SEO-focused analysis: Use Combined or Custom Extended format with the User-Agent field. Bot detection and crawl analysis require the User-Agent string at minimum. Add request timing to track how fast pages are served to Googlebot.

Hybrid Approach: Many production deployments write two log files simultaneously -- a Combined format log for backward compatibility and ad-hoc analysis, plus a JSON format log for pipeline ingestion. Both Apache and Nginx support multiple CustomLog/access_log directives pointing to different files with different formats.

# Apache - Dual logging
CustomLog /var/log/apache2/access.log combined
CustomLog /var/log/apache2/access.json.log json

# Nginx - Dual logging
access_log /var/log/nginx/access.log combined;
access_log /var/log/nginx/access.json.log json_log;

10. Conclusion

Apache and Nginx share a common heritage in the NCSA Common Log Format, but their configuration syntax and available variables differ significantly. The key takeaways from this guide: both servers' default Combined output is line-compatible, so one parser covers both; Apache builds formats from percent directives in LogFormat while Nginx interpolates $variables in log_format; request timing and upstream fields are the first custom additions worth making; and JSON with proper escaping is the safest choice for modern observability pipelines.

Whatever format you choose, the most important step is to actually analyze your logs regularly. The best log format in the world is useless if no one reads the data.

Next Step: Ready to analyze your Apache and Nginx logs without writing parsers? LogBeast automatically detects CLF, Combined, Custom, and JSON log formats from both servers. Import your logs and get instant insights into traffic patterns, bot behavior, and performance metrics.
