📑 Table of Contents
- Introduction: Why Migrations Fail
- Pre-Migration SEO Audit
- URL Mapping and Redirect Strategy
- Server Log Baseline Before Migration
- The Migration Day Checklist
- Post-Migration Log Monitoring
- Detecting Lost Pages with Log Analysis
- Fixing Redirect Chains and Loops
- Traffic Recovery Timeline
- Common Migration Disasters and How to Fix Them
- Conclusion
Introduction: Why Migrations Fail
Site migrations are the single highest-risk event in an SEO program. Whether you are changing domains, switching CMS platforms, restructuring URLs, or moving to HTTPS, every migration carries the potential for catastrophic traffic loss. Industry data shows that over 60% of site migrations result in measurable organic traffic drops, and many never fully recover.
The failures are almost always preventable. They happen because teams treat migration as a development project rather than an SEO project. Redirects get missed. Canonical tags point to old URLs. Internal links break. And critically, nobody monitors server logs to catch the problems before Google notices them.
🔑 Key Insight: Server logs are the single most important tool for migration success. They tell you exactly what Googlebot is crawling before, during, and after migration. Without log data, you are flying blind and will not know something is broken until traffic has already dropped.
This guide walks through a complete, battle-tested migration checklist built around server log analysis. Whether you are moving 500 pages or 5 million, the methodology is the same: establish baselines, map everything, execute cleanly, and monitor obsessively. Tools like LogBeast make the log analysis steps dramatically faster, but every technique here can be executed with standard command-line tools.
Pre-Migration SEO Audit
Before you touch a single URL, you need to document exactly what you have. This baseline becomes your reference point for validating that nothing was lost during migration.
Crawl Baseline
Run a full crawl of the current site and record every URL, its status code, title tag, meta description, canonical tag, and internal link count. This is your source of truth for what exists today.
- Total indexable URLs: How many pages return 200 status codes and are not blocked by robots.txt or noindex
- URL structure patterns: Document every URL pattern (e.g., /product/{slug}/, /category/{name}/page/{n}/)
- Canonical tag mapping: Which pages have self-referencing canonicals vs. cross-domain canonicals
- Hreflang configuration: If you have international versions, map every hreflang relationship
- Structured data: Document all schema markup types and which page templates use them
💡 Pro Tip: Use CrawlBeast to run a complete pre-migration crawl. Export the full URL list with status codes, canonicals, and meta data. This becomes your redirect mapping source file and your post-migration validation checklist.
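If you want a lightweight scripted snapshot instead, a minimal sketch looks like this (assuming urls.txt is a hypothetical one-URL-per-line export of your crawl; the canonical grep assumes the common rel="canonical" href="..." attribute order):

```shell
# Snapshot status code and canonical tag for every URL in urls.txt
while read -r url; do
  body=$(mktemp)
  code=$(curl -s -o "$body" -w "%{http_code}" "$url")
  canonical=$(grep -oP '(?<=rel="canonical" href=")[^"]+' "$body" | head -1)
  echo "$url,$code,$canonical"
  rm -f "$body"
done < urls.txt > crawl_baseline.csv
```

The resulting CSV doubles as the validation checklist you diff against after migration.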
Log Baseline
Crawl data tells you what exists. Log data tells you what matters. A page that Googlebot has not visited in 90 days is far less critical than a page it crawls every day.
# Extract all unique URLs that Googlebot has crawled in the last 30 days
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print $7}' | sort -u > googlebot_crawled_urls.txt
# Count Googlebot crawl frequency per URL
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print $7}' | sort | uniq -c | sort -rn > googlebot_frequency.txt
# Identify your most-crawled pages (these are your highest priority for redirects)
head -50 googlebot_frequency.txt
Indexation Snapshot
Before migration, capture your current indexation state from Google Search Console:
| Metric | Where to Find It | Why It Matters |
|---|---|---|
| Total indexed pages | GSC > Pages > Indexed | Baseline to compare post-migration |
| Top performing pages | GSC > Performance > Pages | These pages must have perfect redirects |
| Sitemaps status | GSC > Sitemaps | Verify all sitemaps are submitted and processed |
| Crawl stats | GSC > Settings > Crawl stats | Baseline crawl rate for comparison |
| Core Web Vitals | GSC > Core Web Vitals | Ensure new site does not regress on performance |
| Manual actions | GSC > Manual actions | Clear any existing issues before migration |
⚠️ Warning: Do NOT rely solely on site: operator counts. They are estimates and fluctuate wildly. Use GSC's indexation report for accurate numbers, and cross-reference with your server log data for the most complete picture.
URL Mapping and Redirect Strategy
The redirect map is the single most critical deliverable of any migration. Every old URL must map to the most relevant new URL. There are no shortcuts here -- incomplete redirect maps are the number one cause of migration traffic loss.
301 vs 302 Redirects
For migrations, always use 301 (permanent) redirects. This tells search engines the move is permanent and that link equity should transfer to the new URL.
| Redirect Type | When to Use | Equity Transfer | Migration Use |
|---|---|---|---|
| 301 Permanent | URL has permanently moved | Yes (full) | Primary choice for all migration redirects |
| 302 Temporary | URL will return to original location | No (Google holds equity at old URL) | Never use for migrations |
| 308 Permanent | Same as 301, preserves HTTP method | Yes (full) | Use for API endpoints that must preserve POST/PUT |
| Meta refresh | When server redirects are not possible | Partial | Last resort only |
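To spot-check which redirect type a given URL actually returns (example.com and the path are placeholders), curl can print the status and target without following the redirect:

```shell
# Print the status code and redirect target for one URL, without following it
curl -sI -o /dev/null -w "%{http_code} -> %{redirect_url}\n" \
  "https://example.com/old-page/"
# You want a 301 here; a 302 means the server rule needs fixing
```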
Building the Redirect Map
Combine your crawl data and log data to build a comprehensive redirect map. Prioritize by traffic and crawl frequency:
# Step 1: Merge crawled URLs with Googlebot crawl frequency
# crawled_urls.csv has comma-separated columns: url, status, title
# googlebot_frequency.txt has space-separated columns: count, url
awk 'NR==FNR {freq[$2]=$1; next} {split($0, f, ","); print $0 "," ((f[1] in freq) ? freq[f[1]] : 0)}' \
googlebot_frequency.txt crawled_urls.csv > urls_with_priority.csv
# Step 2: Sort by priority (Googlebot frequency)
sort -t',' -k4 -rn urls_with_priority.csv > urls_prioritized.csv
# Step 3: Generate redirect map template
awk -F',' '{print $1 "," "NEW_URL_HERE" "," $4}' urls_prioritized.csv > redirect_map.csv
Regex Redirects for Pattern-Based Migrations
When URL structures change systematically, regex redirects handle thousands of URLs with a few rules. Here are common patterns:
# Nginx: Redirect old product URLs to new structure
# Old: /products/widget-123.html
# New: /shop/widget-123/
location ~ ^/products/(.+)\.html$ {
return 301 /shop/$1/;
}
# Nginx: Redirect old blog date-based URLs to slug-only
# Old: /blog/2024/03/my-post-title/
# New: /blog/my-post-title/
location ~ ^/blog/\d{4}/\d{2}/(.+)$ {
return 301 /blog/$1;
}
# Nginx: Domain migration (old domain to new domain)
server {
server_name olddomain.com www.olddomain.com;
return 301 https://newdomain.com$request_uri;
}
# Apache: Equivalent regex redirects in .htaccess
RewriteEngine On
# Product URL restructure
RewriteRule ^products/(.+)\.html$ /shop/$1/ [R=301,L]
# Blog date removal
RewriteRule ^blog/\d{4}/\d{2}/(.+)$ /blog/$1 [R=301,L]
# Domain migration
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.com$ [NC]
RewriteRule ^(.*)$ https://newdomain.com/$1 [R=301,L]
🔑 Key Insight: Test every regex redirect rule against your full URL list before going live. A single misplaced capture group can redirect thousands of pages to the wrong destination. Write a script that applies each rule to your URL list and verify the output matches your redirect map.
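That test can be as simple as re-expressing each rule as a sed substitution and diffing the simulated output against your map (a sketch; urls.txt holds one old path per line, redirect_map.csv has old,new columns, and the two patterns mirror the nginx rules above):

```shell
# Apply the same patterns as the server rules to every old path
sed -E -e 's#^/products/(.+)\.html$#/shop/\1/#' \
       -e 's#^/blog/[0-9]{4}/[0-9]{2}/(.+)$#/blog/\1#' \
       urls.txt > simulated.txt
# Pair each old path with its simulated destination, then diff against the map
paste -d',' urls.txt simulated.txt | sort > simulated_map.csv
sort redirect_map.csv | diff - simulated_map.csv | head -20
```

Any diff output is a URL where the regex rules and the hand-built map disagree; resolve every one before launch.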
Server Log Baseline Before Migration
In the 2-4 weeks before migration, capture detailed log baselines. These numbers become your "before" snapshot for detecting problems after launch.
What to Capture
# Daily Googlebot request volume
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print substr($4, 2, 11)}' | sort | uniq -c
# Googlebot status code distribution
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print $9}' | sort | uniq -c | sort -rn
# Googlebot crawl rate per hour (for detecting crawl rate changes)
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print substr($4, 2, 14)}' | sort | uniq -c
# Top 100 most-crawled URLs by Googlebot
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print $7}' | sort | uniq -c | sort -rn | head -100
# Current 404 rate (baseline for comparison)
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 404 {print $7}' | sort | uniq -c | sort -rn | head -50
Googlebot Crawl Patterns
Understanding Googlebot's pre-migration crawl behavior helps you set expectations for post-migration recovery:
| Pattern | What to Measure | Healthy Range |
|---|---|---|
| Daily crawl volume | Total Googlebot requests per day | Varies by site size; note your average |
| Crawl frequency per URL | How often Googlebot revisits key pages | Homepage: daily; key pages: weekly |
| 200 response ratio | % of Googlebot requests returning 200 | > 85% |
| Crawl distribution | Which sections Googlebot prioritizes | Should align with your important content |
| Response time | Average server response time to Googlebot | < 500ms for HTML pages |
💡 Pro Tip: LogBeast generates all of these baseline metrics automatically from your log files. Export the pre-migration report and keep it as your reference document. After migration, run the same analysis on new logs and compare side by side.
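To compare side by side from the command line, the same breakdown can be run over an archived pre-migration log and the current one (pre.log and post.log are placeholder names for your archived and live access logs):

```shell
# Googlebot status-code mix, before vs after migration
for f in pre.log post.log; do
  echo "== $f =="
  grep "Googlebot" "$f" | awk '{print $9}' | sort | uniq -c | sort -rn
done
```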
The Migration Day Checklist
Migration day should be boring. If you have done the preparation correctly, it is a mechanical execution of a well-rehearsed plan. Here is the step-by-step checklist:
| Step | Action | Verification | Rollback Trigger |
|---|---|---|---|
| 1 | Deploy new site to staging and run full crawl | All URLs return 200; no broken internal links | Any critical page missing or broken |
| 2 | Implement all 301 redirects on old URLs | Test 100% of redirect map with curl | Redirect coverage below 95% |
| 3 | Update DNS / deploy to production | New site is live and accessible | DNS propagation failures |
| 4 | Verify robots.txt on new site | No accidental disallow rules blocking content | Robots.txt blocks Googlebot |
| 5 | Submit updated XML sitemaps | Sitemaps reference new URLs only | Sitemaps contain old URLs |
| 6 | Verify canonical tags point to new URLs | No canonical tags pointing to old domain | Canonicals referencing old URLs |
| 7 | Update internal links to new URL structure | Crawl finds no internal links to old URLs | More than 5% broken internal links |
| 8 | Verify hreflang tags (if applicable) | All hreflang URLs resolve and are reciprocal | Broken hreflang relationships |
| 9 | Start server log monitoring | Googlebot is receiving 301s and crawling new URLs | Googlebot getting 404s or 500s |
| 10 | Add new domain to Google Search Console | Ownership verified; change of address submitted | N/A (do this regardless) |
⚠️ Warning: Never migrate on a Friday. If something goes wrong, you need business days to respond. Tuesday and Wednesday are the safest migration days because you have the rest of the week to monitor and fix issues.
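Step 2's verification ("Test 100% of redirect map with curl") can be scripted; a minimal sketch, assuming redirect_map.csv holds old_path,new_path pairs and example.com stands in for your domain:

```shell
#!/bin/bash
# Verify every row of the redirect map returns a 301 to the expected target
while IFS=',' read -r old new; do
  result=$(curl -sI -o /dev/null -w "%{http_code} %{redirect_url}" \
    "https://example.com${old}")
  code=${result%% *}
  dest=${result#* }
  if [ "$code" != "301" ] || [ "$dest" != "https://example.com${new}" ]; then
    echo "FAIL: $old returned $code -> $dest (expected 301 -> https://example.com${new})"
  fi
done < redirect_map.csv
```

Run it against staging before DNS cutover and again against production immediately after; zero FAIL lines is the go/no-go signal.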
Post-Migration Log Monitoring
The first 48-72 hours after migration are critical. Googlebot will begin hitting old URLs and following redirects. Your server logs will tell you immediately whether the migration is working or failing.
What to Watch: 404 Spikes
A spike in 404 responses from Googlebot means redirects are missing. This is the most common and most damaging migration failure.
# Monitor Googlebot 404s in real-time (--line-buffered avoids pipe buffering delays)
tail -f /var/log/nginx/access.log | grep --line-buffered "Googlebot" | awk '$9 == 404 {print $7}'
# Count Googlebot 404s per hour (compare to baseline)
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 404 {print substr($4, 2, 14)}' | sort | uniq -c
# List the most common 404 URLs hit by Googlebot
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 404 {print $7}' | sort | uniq -c | sort -rn | head -50
What to Watch: Crawl Rate Drops
If Googlebot's crawl rate drops significantly after migration, it may indicate server performance issues, robots.txt blocks, or loss of trust.
# Compare daily Googlebot request volume (pre vs post migration)
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print substr($4, 2, 11)}' | sort | uniq -c
# Monitor response times to Googlebot (slow responses reduce crawl rate)
# Assumes $request_time is logged as the last field of your log format
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print $NF}' | sort -n | \
awk '{a[NR]=$1} END {print "Median:", a[int(NR/2)], "P95:", a[int(NR*0.95)], "P99:", a[int(NR*0.99)]}'
What to Watch: Redirect Chains
Redirect chains (A -> B -> C) waste crawl budget and dilute link equity. They commonly appear during migration when old redirects stack on top of new ones.
# Find redirect chains by checking where Googlebot 301s lead
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 301 {print $7}' | sort | uniq -c | sort -rn | head -30
# Test for redirect chains using curl
while read url; do
chain=$(curl -sIL -o /dev/null -w "%{num_redirects}" "https://example.com${url}")
if [ "$chain" -gt 1 ]; then
echo "CHAIN ($chain hops): $url"
fi
done < redirect_urls.txt
🔑 Key Insight: Set up automated alerts for these three signals. A simple cron job that checks Googlebot 404 count and crawl volume every hour and sends an email alert if either deviates more than 30% from baseline can save your migration. For detailed monitoring, check our crawl budget optimization guide.
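A minimal version of that hourly check might look like this (the baseline value, log path, and alert address are placeholders for your own setup):

```shell
#!/bin/bash
# Hourly Googlebot 404 check against a hand-set baseline (sketch)
BASELINE_404=25   # your pre-migration average Googlebot 404s per hour
LOG=/var/log/nginx/access.log
hour=$(date '+%d/%b/%Y:%H')
current=$(grep "Googlebot" "$LOG" | \
  awk -v h="$hour" '$9 == 404 && index($4, h) {n++} END {print n+0}')
# Alert if the current hour exceeds baseline by more than 30%
if [ "$current" -gt $((BASELINE_404 * 130 / 100)) ]; then
  echo "Googlebot 404s this hour: $current (baseline: $BASELINE_404)" | \
    mail -s "Migration alert: Googlebot 404 spike" seo-team@example.com
fi
```

Drop it into cron with `0 * * * *` and mirror the same pattern for crawl volume by counting all Googlebot requests instead of 404s.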
Detecting Lost Pages with Log Analysis
After migration, some pages inevitably fall through the cracks. They had redirects in the map but something went wrong in implementation, or they were missed entirely. Log analysis catches these before Google deindexes them.
Finding Orphaned URLs
Orphaned URLs are old pages that Googlebot is still trying to crawl but are returning 404s instead of 301 redirects:
# Find all unique URLs returning 404 to Googlebot post-migration
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 404 {print $7}' | sort -u > orphaned_urls.txt
# Cross-reference with your redirect map to find what was missed
# (comm requires sorted input; orphaned_urls.txt is already sorted)
comm -23 orphaned_urls.txt <(sort -u redirect_map_urls.txt) > missing_redirects.txt
# Count how many times each orphaned URL was requested (priority indicator)
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 404 {print $7}' | sort | uniq -c | sort -rn > orphaned_priority.txt
# Find orphaned URLs that had high traffic in the pre-migration period
# (These are the most critical to fix)
while read count url; do
pre_count=$(awk -v u="$url" '$2 == u {print $1}' pre_migration_googlebot_frequency.txt)
if [ -n "$pre_count" ] && [ "$pre_count" -gt 10 ]; then
echo "HIGH PRIORITY: $url (pre-migration: $pre_count crawls, now: 404)"
fi
done < orphaned_priority.txt
Detecting Soft 404s
Soft 404s are pages that return a 200 status code but display an error message or empty content. These are invisible to simple status code monitoring:
# Find suspiciously small responses (potential soft 404s)
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 200 && $10 < 1000 {print $7, $10}' | sort -k2 -n | head -50
# Compare response sizes pre vs post migration for the same URLs
# If a page went from 15KB to 500 bytes, it is likely a soft 404
while read url; do
new_size=$(awk -v u="$url" '/Googlebot/ && $7 == u {print $10}' /var/log/nginx/access.log | tail -1)
old_size=$(awk -v u="$url" '/Googlebot/ && $7 == u {print $10}' /var/log/nginx/access.log.old | tail -1)
if [ -n "$old_size" ] && [ "$old_size" -gt 0 ] && [ -n "$new_size" ]; then
ratio=$((new_size * 100 / old_size))
if [ "$ratio" -lt 20 ]; then
echo "SOFT 404 SUSPECT: $url (was ${old_size}B, now ${new_size}B)"
fi
fi
done < top_pages.txt
⚠️ Warning: Soft 404s are more dangerous than real 404s because they are harder to detect. Google's crawler can identify many soft 404s and will eventually deindex those pages, but the process is slower and more unpredictable than a clean 301 redirect.
Fixing Redirect Chains and Loops
Redirect chains occur when one redirect leads to another, creating a series of hops. Redirect loops occur when URL A redirects to URL B, which redirects back to URL A. Both are common during migrations, especially when old redirect rules were not cleaned up before adding new ones.
Detecting Chains and Loops
#!/bin/bash
# detect_redirect_issues.sh - Find chains and loops in your redirects
# Usage: ./detect_redirect_issues.sh urls.txt
while read url; do
result=$(curl -sIL -o /dev/null -w "%{http_code} %{num_redirects} %{url_effective}" \
--max-redirs 10 "$url" 2>/dev/null)
code=$(echo "$result" | awk '{print $1}')
hops=$(echo "$result" | awk '{print $2}')
final=$(echo "$result" | awk '{print $3}')
if [ "$hops" -gt 1 ]; then
echo "CHAIN: $url -> $final ($hops hops)"
fi
if [ "$code" -eq 0 ] || [ "$hops" -ge 10 ]; then
echo "LOOP: $url (max redirects reached)"
fi
if [ "$code" -eq 404 ]; then
echo "BROKEN: $url -> $final (ends in 404)"
fi
done < "$1"
# Run a full chain trace for a specific URL (-i because HTTP/2 lowercases header names)
curl -sIL "https://example.com/old-page/" 2>&1 | grep -iE "^(HTTP/|location:)"
Nginx: Fixing Redirect Chains
# BAD: This creates a chain (old -> intermediate -> final)
location /old-page/ {
return 301 /intermediate-page/;
}
location /intermediate-page/ {
return 301 /final-page/;
}
# GOOD: Point directly to the final destination
location /old-page/ {
return 301 /final-page/;
}
location /intermediate-page/ {
return 301 /final-page/;
}
# Use a map block for large-scale redirect cleanup (goes in the http context)
map $request_uri $redirect_target {
/old-page-1/ /new-page-1/;
/old-page-2/ /new-page-2/;
/products/old/ /shop/new/;
# Add all redirects here -- flat, no chains
}
server {
if ($redirect_target) {
return 301 $redirect_target;
}
}
Apache: Fixing Redirect Chains
# Ensure redirect rules are ordered correctly in .htaccess
# Process the most specific rules first
RewriteEngine On
# Direct redirects (no chains)
RewriteRule ^old-page-1/?$ /new-page-1/ [R=301,L]
RewriteRule ^old-page-2/?$ /new-page-2/ [R=301,L]
# Pattern-based redirects (catch remaining old URLs)
RewriteRule ^products/(.+)\.html$ /shop/$1/ [R=301,L]
# The [L] flag is critical -- it stops processing after the first match
# Without [L], Apache may apply multiple rules creating chains
💡 Pro Tip: After fixing redirect chains, verify the fix by re-crawling the affected URLs with CrawlBeast. Set the crawler to follow redirects and report the full chain. Any URL with more than one redirect hop still needs attention. See our redirect chains guide for more details.
Traffic Recovery Timeline
Even a perfectly executed migration will see some temporary fluctuation in organic traffic. Understanding the normal recovery timeline helps you distinguish between expected behavior and actual problems.
| Timeframe | What to Expect | Action if Not Recovering |
|---|---|---|
| Week 1 | 10-30% traffic fluctuation; Googlebot discovering redirects; crawl rate may spike as Google follows 301s | Check for 404 spikes and missing redirects in logs |
| Week 2 | Traffic stabilizing; Google starting to index new URLs; old URLs being removed from index | Verify new URLs are appearing in GSC index report |
| Week 3-4 | Traffic returning to 80-95% of pre-migration levels; most new URLs indexed | Audit pages with traffic drops; check canonical and redirect issues |
| Month 2 | Traffic at or above pre-migration levels; ranking positions stabilizing | Deep-dive into remaining underperforming pages |
| Month 3-6 | Full recovery; link equity fully transferred; rankings stable | If still down, investigate link equity loss and content parity issues |
🔑 Key Insight: The "Google dance" during weeks 1-2 is normal and expected. Do not panic and start making changes during this period unless you see clear errors in your server logs (such as mass 404s or redirect loops). Unnecessary changes during the settling period can make things worse.
Monitoring Recovery with Logs
# Track daily Googlebot crawl volume trend (should return to baseline within 2-4 weeks)
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print substr($4, 2, 11)}' | sort | uniq -c | \
awk '{print $2, $1}' > crawl_trend.tsv
# Track the ratio of 200s vs 301s vs 404s from Googlebot over time
# (the arrays-of-arrays syntax below requires gawk 4.0+)
grep "Googlebot" /var/log/nginx/access.log | \
awk '{date=substr($4, 2, 11); code=$9; counts[date][code]++}
END {for (d in counts) {
total=0; for (c in counts[d]) total+=counts[d][c];
printf "%s\t200: %d (%.0f%%)\t301: %d (%.0f%%)\t404: %d (%.0f%%)\n",
d, counts[d][200], counts[d][200]/total*100,
counts[d][301], counts[d][301]/total*100,
counts[d][404], counts[d][404]/total*100
}}' | sort
Common Migration Disasters and How to Fix Them
Even well-planned migrations can go wrong. Here are the most common disasters, how to detect them in your logs, and how to fix them fast.
Disaster 1: Robots.txt Blocking Googlebot
A new robots.txt accidentally blocks Googlebot from critical sections. This happens more often than you would think, especially when staging robots.txt rules get deployed to production.
# Detect: Googlebot stops crawling entire sections
grep "Googlebot" /var/log/nginx/access.log | \
awk '{print $7}' | awk -F'/' '{print "/" $2 "/"}' | sort | uniq -c | sort -rn
# If a previously-active section shows zero requests, check robots.txt immediately
curl -s https://example.com/robots.txt
# Fix: Update robots.txt and request re-crawl
# Also submit the updated robots.txt via GSC
Disaster 2: Mass 302s Instead of 301s
Using 302 (temporary) redirects instead of 301 (permanent) redirects. Google will not transfer link equity for 302s, and your rankings will tank.
# Detect: Check redirect status codes in logs
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 302 {print $7}' | wc -l
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 301 {print $7}' | wc -l
# If 302 count is high and 301 count is low, you have a problem
# Fix: Change all 302s to 301s in your server configuration
Disaster 3: Canonical Tags Pointing to Old Domain
New site pages have canonical tags still referencing the old domain or old URL structure. This tells Google to ignore the new pages and keep indexing the old ones (which are now redirecting).
# Detect: Crawl the new site and extract canonical tags
curl -s https://newdomain.com/ | grep -i "canonical"
# At scale, use CrawlBeast or a script
while read url; do
canonical=$(curl -sL "$url" | grep -oP '(?<=rel="canonical" href=")[^"]+')
if echo "$canonical" | grep -q "olddomain"; then
echo "BAD CANONICAL: $url -> $canonical"
fi
done < new_site_urls.txt
Disaster 4: Internal Links Still Pointing to Old URLs
The new site's navigation, footer, or content links still reference old URLs. This creates unnecessary redirect hops for both users and crawlers and wastes crawl budget.
# Detect: Look for 301s from internal page loads (not initial Googlebot discovery)
# Internal redirect chains show up as high-volume 301 URLs
grep "Googlebot" /var/log/nginx/access.log | \
awk '$9 == 301 {print $7}' | sort | uniq -c | sort -rn | head -20
# If the same old URLs are being 301'd repeatedly, internal links are the cause
# Fix: Update all templates, navigation, and content to use new URLs
Disaster 5: Sitemap Still Lists Old URLs
The XML sitemap submitted to Google still contains old URLs or includes URLs that 301 redirect. This confuses Google about which URLs are the canonical versions.
# Detect: Download and check your sitemap
curl -s https://example.com/sitemap.xml | grep -oP '(?<=<loc>)[^<]+' | head -20
# Check for old domain references in sitemap
curl -s https://example.com/sitemap.xml | grep -c "olddomain"
# Verify all sitemap URLs return 200
curl -s https://example.com/sitemap.xml | grep -oP '(?<=<loc>)[^<]+' | \
while read url; do
code=$(curl -sI -o /dev/null -w "%{http_code}" "$url")
if [ "$code" != "200" ]; then
echo "$code $url"
fi
done
⚠️ Warning: If you discover any of these disasters, fix them immediately. Every hour that Googlebot crawls broken redirects or hits 404s is an hour of lost indexation signals. Use LogBeast to set up real-time alerts so you catch these issues within minutes, not days.
Conclusion
Site migrations do not have to be terrifying. The difference between a traffic-preserving migration and a traffic-destroying one comes down to preparation, execution, and monitoring. Server logs are your best friend through all three phases.
The key takeaways from this guide:
- Build baselines before you migrate. Crawl data, log data, and indexation data give you a reference point for measuring success
- The redirect map is everything. Every old URL must map to the most relevant new URL with a 301 redirect. No exceptions
- Monitor logs obsessively post-migration. 404 spikes, crawl rate drops, and redirect chains are all visible in server logs within hours
- Expect a recovery timeline. A 2-4 week fluctuation is normal. Panic-driven changes during the settling period make things worse
- Automate your monitoring. Use tools like LogBeast to continuously analyze logs and alert you to problems before they impact rankings
Start your migration preparation today by capturing your log baselines. Run the commands in this guide against your server logs to understand Googlebot's current crawl behavior, and build your redirect map from real data rather than assumptions.
🎯 Next Steps: Read our guide on reducing 404 errors with log analysis for more on finding and fixing broken URLs, and check out crawl budget optimization to ensure Googlebot spends its crawl budget on your most important pages after migration.