📑 Table of Contents
- Introduction: Why Migrations Fail
- Pre-Migration SEO Audit
- URL Mapping and Redirect Strategy
- Server Log Baseline Before Migration
- The Migration Day Checklist
- Post-Migration Log Monitoring
- Detecting Lost Pages with Log Analysis
- Fixing Redirect Chains and Loops
- Traffic Recovery Timeline
- Common Migration Disasters and How to Fix Them
- Conclusion
Introduction: Why Migrations Fail
Site migrations are the single highest-risk event in an SEO program. Whether you are changing domains, switching CMS platforms, restructuring URLs, or moving to HTTPS, every migration carries the potential for catastrophic traffic loss. Industry data shows that over 60% of site migrations result in measurable organic traffic drops, and many never fully recover.
The failures are almost always preventable. They happen because teams treat migration as a development project rather than an SEO project. Redirects get missed. Canonical tags point to old URLs. Internal links break. And critically, nobody monitors server logs to catch the problems before Google notices them.
🔑 Key Insight: Server logs are the single most important tool for migration success. They tell you exactly what Googlebot is crawling before, during, and after migration. Without log data, you are flying blind and will not know something is broken until traffic has already dropped.
This guide walks through a complete, battle-tested migration checklist built around server log analysis. Whether you are moving 500 pages or 5 million, the methodology is the same: establish baselines, map everything, execute cleanly, and monitor obsessively. Tools like LogBeast make the log analysis steps dramatically faster, but every technique here can be executed with standard command-line tools.
Pre-Migration SEO Audit
Before you touch a single URL, you need to document exactly what you have. This baseline becomes your reference point for validating that nothing was lost during migration.
Crawl Baseline
Run a full crawl of the current site and record every URL, its status code, title tag, meta description, canonical tag, and internal link count. This is your source of truth for what exists today.
- Total indexable URLs: How many pages return 200 status codes and are not blocked by robots.txt or noindex
- URL structure patterns: Document every URL pattern (e.g., /product/{slug}/, /category/{name}/page/{n}/)
- Canonical tag mapping: Which pages have self-referencing canonicals vs. cross-domain canonicals
- Hreflang configuration: If you have international versions, map every hreflang relationship
- Structured data: Document all schema markup types and which page templates use them
💡 Pro Tip: Use CrawlBeast to run a complete pre-migration crawl. Export the full URL list with status codes, canonicals, and meta data. This becomes your redirect mapping source file and your post-migration validation checklist.
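If you want a lightweight scripted snapshot instead, a minimal sketch looks like this (assuming urls.txt is a hypothetical one-URL-per-line export of your crawl; the canonical grep assumes the common rel="canonical" href="..." attribute order):

```shell
# Snapshot status code and canonical tag for every URL in urls.txt
while read -r url; do
  body=$(mktemp)
  code=$(curl -s -o "$body" -w "%{http_code}" "$url")
  canonical=$(grep -oP '(?<=rel="canonical" href=")[^"]+' "$body" | head -1)
  echo "$url,$code,$canonical"
  rm -f "$body"
done < urls.txt > crawl_baseline.csv
```

The resulting CSV doubles as the validation checklist you diff against after migration.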
Log Baseline
Crawl data tells you what exists. Log data tells you what matters. A page that Googlebot has not visited in 90 days is far less critical than a page it crawls every day.
# Extract all unique URLs that Googlebot has crawled in the last 30 days
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print $7}' | sort -u > googlebot_crawled_urls.txt
# Count Googlebot crawl frequency per URL
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print $7}' | sort | uniq -c | sort -rn > googlebot_frequency.txt
# Identify your most-crawled pages (these are your highest priority for redirects)
head -50 googlebot_frequency.txt
Indexation Snapshot
Before migration, capture your current indexation state from Google Search Console:
| Metric | Where to Find It | Why It Matters |
|---|---|---|
| Total indexed pages | GSC > Pages > Indexed | Baseline to compare post-migration |
| Top performing pages | GSC > Performance > Pages | These pages must have perfect redirects |
| Sitemaps status | GSC > Sitemaps | Verify all sitemaps are submitted and processed |
| Crawl stats | GSC > Settings > Crawl stats | Baseline crawl rate for comparison |
| Core Web Vitals | GSC > Core Web Vitals | Ensure new site does not regress on performance |
| Manual actions | GSC > Manual actions | Clear any existing issues before migration |
⚠️ Warning: Do NOT rely solely on site: operator counts. They are estimates and fluctuate wildly. Use GSC's indexation report for accurate numbers, and cross-reference with your server log data for the most complete picture.
URL Mapping and Redirect Strategy
The redirect map is the single most critical deliverable of any migration. Every old URL must map to the most relevant new URL. There are no shortcuts here -- incomplete redirect maps are the number one cause of migration traffic loss.
301 vs 302 Redirects
For migrations, always use 301 (permanent) redirects. This tells search engines the move is permanent and that link equity should transfer to the new URL.
| Redirect Type | When to Use | Equity Transfer | Migration Use |
|---|---|---|---|
| 301 Permanent | URL has permanently moved | Yes (full) | Primary choice for all migration redirects |
| 302 Temporary | URL will return to original location | No (Google holds equity at old URL) | Never use for migrations |
| 308 Permanent | Same as 301, preserves HTTP method | Yes (full) | Use for API endpoints that must preserve POST/PUT |
| Meta refresh | When server redirects are not possible | Partial | Last resort only |
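To spot-check which redirect type a given URL actually returns (example.com and the path are placeholders), curl can print the status and target without following the redirect:

```shell
# Print the status code and redirect target for one URL, without following it
curl -sI -o /dev/null -w "%{http_code} -> %{redirect_url}\n" \
  "https://example.com/old-page/"
# You want a 301 here; a 302 means the server rule needs fixing
```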
Building the Redirect Map
Combine your crawl data and log data to build a comprehensive redirect map. Prioritize by traffic and crawl frequency:
# Step 1: Merge crawled URLs with Googlebot crawl frequency
# crawled_urls.csv has comma-separated columns: url, status, title
# googlebot_frequency.txt has space-separated columns: count, url
awk 'NR==FNR {freq[$2]=$1; next} {split($0, f, ","); print $0 "," ((f[1] in freq) ? freq[f[1]] : 0)}' \
googlebot_frequency.txt crawled_urls.csv > urls_with_priority.csv
# Step 2: Sort by priority (Googlebot frequency)
sort -t',' -k4 -rn urls_with_priority.csv > urls_prioritized.csv
# Step 3: Generate redirect map template
awk -F',' '{print $1 "," "NEW_URL_HERE" "," $4}' urls_prioritized.csv > redirect_map.csv
Regex Redirects for Pattern-Based Migrations
When URL structures change systematically, regex redirects handle thousands of URLs with a few rules. Here are common patterns:
# Nginx: Redirect old product URLs to new structure
# Old: /products/widget-123.html
# New: /shop/widget-123/
location ~ ^/products/(.+)\.html$ {
return 301 /shop/$1/;
}
# Nginx: Redirect old blog date-based URLs to slug-only
# Old: /blog/2024/03/my-post-title/
# New: /blog/my-post-title/
location ~ ^/blog/\d{4}/\d{2}/(.+)$ {
return 301 /blog/$1;
}
# Nginx: Domain migration (old domain to new domain)
server {
server_name olddomain.com www.olddomain.com;
return 301 https://newdomain.com$request_uri;
}
# Apache: Equivalent regex redirects in .htaccess
RewriteEngine On
# Product URL restructure
RewriteRule ^products/(.+)\.html$ /shop/$1/ [R=301,L]
# Blog date removal
RewriteRule ^blog/\d{4}/\d{2}/(.+)$ /blog/$1 [R=301,L]
# Domain migration
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.com$ [NC]
RewriteRule ^(.*)$ https://newdomain.com/$1 [R=301,L]
🔑 Key Insight: Test every regex redirect rule against your full URL list before going live. A single misplaced capture group can redirect thousands of pages to the wrong destination. Write a script that applies each rule to your URL list and verify the output matches your redirect map.
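That test can be as simple as re-expressing each rule as a sed substitution and diffing the simulated output against your map (a sketch; urls.txt holds one old path per line, redirect_map.csv has old,new columns, and the two patterns mirror the nginx rules above):

```shell
# Apply the same patterns as the server rules to every old path
sed -E -e 's#^/products/(.+)\.html$#/shop/\1/#' \
       -e 's#^/blog/[0-9]{4}/[0-9]{2}/(.+)$#/blog/\1#' \
       urls.txt > simulated.txt
# Pair each old path with its simulated destination, then diff against the map
paste -d',' urls.txt simulated.txt | sort > simulated_map.csv
sort redirect_map.csv | diff - simulated_map.csv | head -20
```

Any diff output is a URL where the regex rules and the hand-built map disagree; resolve every one before launch.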
Server Log Baseline Before Migration
In the 2-4 weeks before migration, capture detailed log baselines. These numbers become your "before" snapshot for detecting problems after launch.
What to Capture
# Daily Googlebot request volume
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print substr($4, 2, 11)}' | sort | uniq -c
# Googlebot status code distribution
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print $9}' | sort | uniq -c | sort -rn
# Googlebot crawl rate per hour (for detecting crawl rate changes)
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print substr($4, 2, 14)}' | sort | uniq -c
# Top 100 most-crawled URLs by Googlebot
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print $7}' | sort | uniq -c | sort -rn | head -100
# Current 404 rate (baseline for comparison)
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 404 {print $7}' | sort | uniq -c | sort -rn | head -50
Googlebot Crawl Patterns
Understanding Googlebot's pre-migration crawl behavior helps you set expectations for post-migration recovery:
| Pattern | What to Measure | Healthy Range |
|---|---|---|
| Daily crawl volume | Total Googlebot requests per day | Varies by site size; note your average |
| Crawl frequency per URL | How often Googlebot revisits key pages | Homepage: daily; key pages: weekly |
| 200 response ratio | % of Googlebot requests returning 200 | > 85% |
| Crawl distribution | Which sections Googlebot prioritizes | Should align with your important content |
| Response time | Average server response time to Googlebot | < 500ms for HTML pages |
💡 Pro Tip: LogBeast generates all of these baseline metrics automatically from your log files. Export the pre-migration report and keep it as your reference document. After migration, run the same analysis on new logs and compare side by side.
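To compare side by side from the command line, the same breakdown can be run over an archived pre-migration log and the current one (pre.log and post.log are placeholder names for your archived and live access logs):

```shell
# Googlebot status-code mix, before vs after migration
for f in pre.log post.log; do
  echo "== $f =="
  grep "Googlebot" "$f" | awk '{print $9}' | sort | uniq -c | sort -rn
done
```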
The Migration Day Checklist
Migration day should be boring. If you have done the preparation correctly, it is a mechanical execution of a well-rehearsed plan. Here is the step-by-step checklist:
| Step | Action | Verification | Rollback Trigger |
|---|---|---|---|
| 1 | Deploy new site to staging and run full crawl | All URLs return 200; no broken internal links | Any critical page missing or broken |
| 2 | Implement all 301 redirects on old URLs | Test 100% of redirect map with curl | Redirect coverage below 95% |
| 3 | Update DNS / deploy to production | New site is live and accessible | DNS propagation failures |
| 4 | Verify robots.txt on new site | No accidental disallow rules blocking content | Robots.txt blocks Googlebot |
| 5 | Submit updated XML sitemaps | Sitemaps reference new URLs only | Sitemaps contain old URLs |
| 6 | Verify canonical tags point to new URLs | No canonical tags pointing to old domain | Canonicals referencing old URLs |
| 7 | Update internal links to new URL structure | Crawl finds no internal links to old URLs | More than 5% broken internal links |
| 8 | Verify hreflang tags (if applicable) | All hreflang URLs resolve and are reciprocal | Broken hreflang relationships |
| 9 | Start server log monitoring | Googlebot is receiving 301s and crawling new URLs | Googlebot getting 404s or 500s |
| 10 | Add new domain to Google Search Console | Ownership verified; change of address submitted | N/A (do this regardless) |
⚠️ Warning: Never migrate on a Friday. If something goes wrong, you need business days to respond. Tuesday and Wednesday are the safest migration days because you have the rest of the week to monitor and fix issues.
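Step 2's verification ("Test 100% of redirect map with curl") can be scripted; a minimal sketch, assuming redirect_map.csv holds old_path,new_path pairs and example.com stands in for your domain:

```shell
#!/bin/bash
# Verify every row of the redirect map returns a 301 to the expected target
while IFS=',' read -r old new; do
  result=$(curl -sI -o /dev/null -w "%{http_code} %{redirect_url}" \
    "https://example.com${old}")
  code=${result%% *}
  dest=${result#* }
  if [ "$code" != "301" ] || [ "$dest" != "https://example.com${new}" ]; then
    echo "FAIL: $old returned $code -> $dest (expected 301 -> https://example.com${new})"
  fi
done < redirect_map.csv
```

Run it against staging before DNS cutover and again against production immediately after; zero FAIL lines is the go/no-go signal.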
Post-Migration Log Monitoring
The first 48-72 hours after migration are critical. Googlebot will begin hitting old URLs and following redirects. Your server logs will tell you immediately whether the migration is working or failing.
What to Watch: 404 Spikes
A spike in 404 responses from Googlebot means redirects are missing. This is the most common and most damaging migration failure.
# Monitor Googlebot 404s in real-time (--line-buffered avoids pipe buffering delays)
tail -f /var/log/nginx/access.log | grep --line-buffered "Googlebot" | awk '$9 == 404 {print $7}'
# Count Googlebot 404s per hour (compare to baseline)
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 404 {print substr($4, 2, 14)}' | sort | uniq -c
# List the most common 404 URLs hit by Googlebot
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 404 {print $7}' | sort | uniq -c | sort -rn | head -50
What to Watch: Crawl Rate Drops
If Googlebot's crawl rate drops significantly after migration, it may indicate server performance issues, robots.txt blocks, or loss of trust.
# Compare daily Googlebot request volume (pre vs post migration)
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print substr($4, 2, 11)}' | sort | uniq -c
# Monitor response times to Googlebot (slow responses reduce crawl rate)
# Assumes $request_time is logged as the last field of your log format
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print $NF}' | sort -n | \
awk '{a[NR]=$1} END {print "Median:", a[int(NR/2)], "P95:", a[int(NR*0.95)], "P99:", a[int(NR*0.99)]}'
What to Watch: Redirect Chains
Redirect chains (A -> B -> C) waste crawl budget and dilute link equity. They commonly appear during migration when old redirects stack on top of new ones.
# Find redirect chains by checking where Googlebot 301s lead
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 301 {print $7}' | sort | uniq -c | sort -rn | head -30
# Test for redirect chains using curl
while read url; do
chain=$(curl -sIL -o /dev/null -w "%{num_redirects}" "https://example.com${url}")
if [ "$chain" -gt 1 ]; then
echo "CHAIN ($chain hops): $url"
fi
done < redirect_urls.txt
🔑 Key Insight: Set up automated alerts for these three signals. A simple cron job that checks Googlebot 404 count and crawl volume every hour and sends an email alert if either deviates more than 30% from baseline can save your migration. For detailed monitoring, check our crawl budget optimization guide.
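A minimal version of that hourly check might look like this (the baseline value, log path, and alert address are placeholders for your own setup):

```shell
#!/bin/bash
# Hourly Googlebot 404 check against a hand-set baseline (sketch)
BASELINE_404=25   # your pre-migration average Googlebot 404s per hour
LOG=/var/log/nginx/access.log
hour=$(date '+%d/%b/%Y:%H')
current=$(grep "Googlebot" "$LOG" | \
  awk -v h="$hour" '$9 == 404 && index($4, h) {n++} END {print n+0}')
# Alert if the current hour exceeds baseline by more than 30%
if [ "$current" -gt $((BASELINE_404 * 130 / 100)) ]; then
  echo "Googlebot 404s this hour: $current (baseline: $BASELINE_404)" | \
    mail -s "Migration alert: Googlebot 404 spike" seo-team@example.com
fi
```

Drop it into cron with `0 * * * *` and mirror the same pattern for crawl volume by counting all Googlebot requests instead of 404s.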
Detecting Lost Pages with Log Analysis
After migration, some pages inevitably fall through the cracks. They had redirects in the map but something went wrong in implementation, or they were missed entirely. Log analysis catches these before Google deindexes them.
Finding Orphaned URLs
Orphaned URLs are old pages that Googlebot is still trying to crawl but are returning 404s instead of 301 redirects:
# Find all unique URLs returning 404 to Googlebot post-migration
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 404 {print $7}' | sort -u > orphaned_urls.txt
# Cross-reference with your redirect map to find what was missed
# (comm requires sorted input; orphaned_urls.txt is already sorted)
comm -23 orphaned_urls.txt <(sort -u redirect_map_urls.txt) > missing_redirects.txt
# Count how many times each orphaned URL was requested (priority indicator)
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 404 {print $7}' | sort | uniq -c | sort -rn > orphaned_priority.txt
# Find orphaned URLs that had high traffic in the pre-migration period
# (These are the most critical to fix)
while read count url; do
pre_count=$(awk -v u="$url" '$2 == u {print $1}' pre_migration_googlebot_frequency.txt)
if [ -n "$pre_count" ] && [ "$pre_count" -gt 10 ]; then
echo "HIGH PRIORITY: $url (pre-migration: $pre_count crawls, now: 404)"
fi
done < orphaned_priority.txt
Detecting Soft 404s
Soft 404s are pages that return a 200 status code but display an error message or empty content. These are invisible to simple status code monitoring:
# Find suspiciously small responses (potential soft 404s)
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 200 && $10 < 1000 {print $7, $10}' | sort -k2 -n | head -50
# Compare response sizes pre vs post migration for the same URLs
# If a page went from 15KB to 500 bytes, it is likely a soft 404
while read url; do
new_size=$(awk -v u="$url" '/Googlebot/ && $7 == u {print $10}' /var/log/nginx/access.log | tail -1)
old_size=$(awk -v u="$url" '/Googlebot/ && $7 == u {print $10}' /var/log/nginx/access.log.old | tail -1)
if [ -n "$old_size" ] && [ "$old_size" -gt 0 ] && [ -n "$new_size" ]; then
ratio=$((new_size * 100 / old_size))
if [ "$ratio" -lt 20 ]; then
echo "SOFT 404 SUSPECT: $url (was ${old_size}B, now ${new_size}B)"
fi
fi
done < top_pages.txt
⚠️ Warning: Soft 404s are more dangerous than real 404s because they are harder to detect. Google's crawler can identify many soft 404s and will eventually deindex those pages, but the process is slower and more unpredictable than a clean 301 redirect.
Fixing Redirect Chains and Loops
Redirect chains occur when one redirect leads to another, creating a series of hops. Redirect loops occur when URL A redirects to URL B, which redirects back to URL A. Both are common during migrations, especially when old redirect rules were not cleaned up before adding new ones.
Detecting Chains and Loops
#!/bin/bash
# detect_redirect_issues.sh - Find chains and loops in your redirects
# Usage: ./detect_redirect_issues.sh urls.txt
while read url; do
result=$(curl -sIL -o /dev/null -w "%{http_code} %{num_redirects} %{url_effective}" \
--max-redirs 10 "$url" 2>/dev/null)
code=$(echo "$result" | awk '{print $1}')
hops=$(echo "$result" | awk '{print $2}')
final=$(echo "$result" | awk '{print $3}')
if [ "$hops" -gt 1 ]; then
echo "CHAIN: $url -> $final ($hops hops)"
fi
if [ "$code" -eq 0 ] || [ "$hops" -ge 10 ]; then
echo "LOOP: $url (max redirects reached)"
fi
if [ "$code" -eq 404 ]; then
echo "BROKEN: $url -> $final (ends in 404)"
fi
done < "$1"
# Run a full chain trace for a specific URL (-i because HTTP/2 lowercases header names)
curl -sIL "https://example.com/old-page/" 2>&1 | grep -iE "^(HTTP/|location:)"
Nginx: Fixing Redirect Chains
# BAD: This creates a chain (old -> intermediate -> final)
location /old-page/ {
return 301 /intermediate-page/;
}
location /intermediate-page/ {
return 301 /final-page/;
}
# GOOD: Point directly to the final destination
location /old-page/ {
return 301 /final-page/;
}
location /intermediate-page/ {
return 301 /final-page/;
}
# Use a map block for large-scale redirect cleanup (goes in the http context)
map $request_uri $redirect_target {
/old-page-1/ /new-page-1/;
/old-page-2/ /new-page-2/;
/products/old/ /shop/new/;
# Add all redirects here -- flat, no chains
}
server {
if ($redirect_target) {
return 301 $redirect_target;
}
}
Apache: Fixing Redirect Chains
# Ensure redirect rules are ordered correctly in .htaccess
# Process the most specific rules first
RewriteEngine On
# Direct redirects (no chains)
RewriteRule ^old-page-1/?$ /new-page-1/ [R=301,L]
RewriteRule ^old-page-2/?$ /new-page-2/ [R=301,L]
# Pattern-based redirects (catch remaining old URLs)
RewriteRule ^products/(.+)\.html$ /shop/$1/ [R=301,L]
# The [L] flag is critical -- it stops processing after the first match
# Without [L], Apache may apply multiple rules creating chains
💡 Pro Tip: After fixing redirect chains, verify the fix by re-crawling the affected URLs with CrawlBeast. Set the crawler to follow redirects and report the full chain. Any URL with more than one redirect hop still needs attention. See our redirect chains guide for more details.
Traffic Recovery Timeline
Even a perfectly executed migration will see some temporary fluctuation in organic traffic. Understanding the normal recovery timeline helps you distinguish between expected behavior and actual problems.
| Timeframe | What to Expect | Action if Not Recovering |
|---|---|---|
| Week 1 | 10-30% traffic fluctuation; Googlebot discovering redirects; crawl rate may spike as Google follows 301s | Check for 404 spikes and missing redirects in logs |
| Week 2 | Traffic stabilizing; Google starting to index new URLs; old URLs being removed from index | Verify new URLs are appearing in GSC index report |
| Week 3-4 | Traffic returning to 80-95% of pre-migration levels; most new URLs indexed | Audit pages with traffic drops; check canonical and redirect issues |
| Month 2 | Traffic at or above pre-migration levels; ranking positions stabilizing | Deep-dive into remaining underperforming pages |
| Month 3-6 | Full recovery; link equity fully transferred; rankings stable | If still down, investigate link equity loss and content parity issues |
🔑 Key Insight: The "Google dance" during weeks 1-2 is normal and expected. Do not panic and start making changes during this period unless you see clear errors in your server logs (such as mass 404s or redirect loops). Unnecessary changes during the settling period can make things worse.
Monitoring Recovery with Logs
# Track daily Googlebot crawl volume trend (should return to baseline within 2-4 weeks)
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print substr($4, 2, 11)}' | sort | uniq -c | \
awk '{print $2, $1}' > crawl_trend.tsv
# Track the ratio of 200s vs 301s vs 404s from Googlebot over time
# (the arrays-of-arrays syntax below requires gawk 4.0+)
grep "Googlebot" /var/log/nginx/access.log | \
awk '{date=substr($4, 2, 11); code=$9; counts[date][code]++}
END {for (d in counts) {
total=0; for (c in counts[d]) total+=counts[d][c];
printf "%s\t200: %d (%.0f%%)\t301: %d (%.0f%%)\t404: %d (%.0f%%)\n",
d, counts[d][200], counts[d][200]/total*100,
counts[d][301], counts[d][301]/total*100,
counts[d][404], counts[d][404]/total*100
}}' | sort
Common Migration Disasters and How to Fix Them
Even well-planned migrations can go wrong. Here are the most common disasters, how to detect them in your logs, and how to fix them fast.
Disaster 1: Robots.txt Blocking Googlebot
A new robots.txt accidentally blocks Googlebot from critical sections. This happens more often than you would think, especially when staging robots.txt rules get deployed to production.
# Detect: Googlebot stops crawling entire sections
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print $7}' | awk -F'/' '{print "/" $2 "/"}' | sort | uniq -c | sort -rn
# If a previously-active section shows zero requests, check robots.txt immediately
curl -s https://example.com/robots.txt
# Fix: Update robots.txt and request re-crawl
# Also submit the updated robots.txt via GSC
Disaster 2: Mass 302s Instead of 301s
Using 302 (temporary) redirects instead of 301 (permanent) redirects. Google will not transfer link equity for 302s, and your rankings will tank.
# Detect: Check redirect status codes in logs
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 302 {print $7}' | wc -l
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 301 {print $7}' | wc -l
# If 302 count is high and 301 count is low, you have a problem
# Fix: Change all 302s to 301s in your server configuration
Disaster 3: Canonical Tags Pointing to Old Domain
New site pages have canonical tags still referencing the old domain or old URL structure. This tells Google to ignore the new pages and keep indexing the old ones (which are now redirecting).
# Detect: Crawl the new site and extract canonical tags
curl -s https://newdomain.com/ | grep -i "canonical"
# At scale, use CrawlBeast or a script
while read url; do
canonical=$(curl -sL "$url" | grep -oP '(?<=rel="canonical" href=")[^"]+')
if echo "$canonical" | grep -q "olddomain"; then
echo "BAD CANONICAL: $url -> $canonical"
fi
done < new_site_urls.txt
Disaster 4: Internal Links Still Pointing to Old URLs
The new site's navigation, footer, or content links still reference old URLs. This creates unnecessary redirect hops for both users and crawlers and wastes crawl budget.
# Detect: Look for 301s from internal page loads (not initial Googlebot discovery)
# Internal redirect chains show up as high-volume 301 URLs
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 301 {print $7}' | sort | uniq -c | sort -rn | head -20
# If the same old URLs are being 301'd repeatedly, internal links are the cause
# Fix: Update all templates, navigation, and content to use new URLs
Disaster 5: Sitemap Still Lists Old URLs
The XML sitemap submitted to Google still contains old URLs or includes URLs that 301 redirect. This confuses Google about which URLs are the canonical versions.
# Detect: Download and check your sitemap
curl -s https://example.com/sitemap.xml | grep -oP '(?<=<loc>)[^<]+' | head -20
# Check for old domain references in sitemap
curl -s https://example.com/sitemap.xml | grep -c "olddomain"
# Verify all sitemap URLs return 200
curl -s https://example.com/sitemap.xml | grep -oP '(?<=<loc>)[^<]+' | \
while read url; do
code=$(curl -sI -o /dev/null -w "%{http_code}" "$url")
if [ "$code" != "200" ]; then
echo "$code $url"
fi
done
⚠️ Warning: If you discover any of these disasters, fix them immediately. Every hour that Googlebot crawls broken redirects or hits 404s is an hour of lost indexation signals. Use LogBeast to set up real-time alerts so you catch these issues within minutes, not days.
Conclusion
Site migrations do not have to be terrifying. The difference between a traffic-preserving migration and a traffic-destroying one comes down to preparation, execution, and monitoring. Server logs are your best friend through all three phases.
The key takeaways from this guide:
- Build baselines before you migrate. Crawl data, log data, and indexation data give you a reference point for measuring success
- The redirect map is everything. Every old URL must map to the most relevant new URL with a 301 redirect. No exceptions
- Monitor logs obsessively post-migration. 404 spikes, crawl rate drops, and redirect chains are all visible in server logs within hours
- Expect a recovery timeline. A 2-4 week fluctuation is normal. Panic-driven changes during the settling period make things worse
- Automate your monitoring. Use tools like LogBeast to continuously analyze logs and alert you to problems before they impact rankings
Start your migration preparation today by capturing your log baselines. Run the commands in this guide against your server logs to understand Googlebot's current crawl behavior, and build your redirect map from real data rather than assumptions.
🎯 Next Steps: Read our guide on reducing 404 errors with log analysis for more on finding and fixing broken URLs, and check out crawl budget optimization to ensure Googlebot spends its crawl budget on your most important pages after migration.