🕷 Crawl Engine
High-speed, configurable crawl engine that handles everything from small blogs to enterprise sites.
⚡ Multi-threaded crawling
Configurable thread count (1–50 threads). Crawl 500,000+ URLs in a single session without memory issues.
🏃 Crawl speed controls
Set a per-request crawl delay, honor robots.txt Crawl-delay directives, and apply custom rate limits to avoid overloading servers.
🔒 Authentication support
Crawl password-protected sites with HTTP Basic Auth, form-based login, or custom cookie injection.
🌐 Proxy support
Route crawl traffic through HTTP/HTTPS or SOCKS5 proxies. Rotate proxies per request for large-scale audits.
🏬️ URL filtering
Include/exclude URLs by pattern (regex or wildcard). Filter by path, query parameters, file extension, or depth.
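A minimal sketch of how include/exclude pattern filtering can work — a pure-Python illustration of the technique, not CrawlBeast's actual implementation:

```python
import re
from fnmatch import fnmatch

def url_allowed(url, include=None, exclude=None, use_regex=False):
    """Return True if url passes the include/exclude filters.

    Patterns are regexes when use_regex=True, shell-style
    wildcards (fnmatch) otherwise. Exclude wins over include.
    """
    match = (lambda p, u: re.search(p, u) is not None) if use_regex \
        else (lambda p, u: fnmatch(u, p))
    if exclude and any(match(p, url) for p in exclude):
        return False
    if include:
        return any(match(p, url) for p in include)
    return True

# Wildcard: keep only /blog/ pages, drop PDFs
print(url_allowed("https://example.com/blog/post-1",
                  include=["*/blog/*"], exclude=["*.pdf"]))      # True
# Regex: exclude paginated archives
print(url_allowed("https://example.com/blog/page/2",
                  exclude=[r"/page/\d+"], use_regex=True))       # False
```

Exclude-before-include is the usual precedence in crawler configs, since it lets a broad include rule coexist with narrow carve-outs.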
🤖 Custom User-Agent
Set any User-Agent string. Presets for Googlebot, Bingbot, mobile crawlers, or fully custom strings.
📄 robots.txt compliance
Fully respects robots.txt directives. An override mode lets you ignore robots.txt for internal audits.
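The directives involved are easy to see with Python's standard-library robots.txt parser (shown here for illustration; CrawlBeast's own parser is not exposed):

```python
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks each URL before fetching it
print(rp.can_fetch("MyCrawler", "https://example.com/admin/users"))  # False
print(rp.can_fetch("MyCrawler", "https://example.com/blog/"))        # True
# ...and waits at least this many seconds between requests
print(rp.crawl_delay("MyCrawler"))                                   # 2
```

An "override mode" simply skips the `can_fetch` check — safe for your own sites, discourteous anywhere else.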
↩️ Crawl resumption
Pause and resume any crawl. Crawl state saved to disk — continue exactly where you left off after a restart.
🔗 Link Analysis
Comprehensive internal and external link auditing to find broken links and optimize link equity.
🔴 Broken link detection
Identifies all 4xx and 5xx response codes for both internal and external links crawled from your site.
↔️ Redirect chain analysis
Visualizes full redirect chains from source URL to final destination. Identifies loops, double-hops, and HTTP→HTTPS issues.
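Chain tracing and loop detection reduce to following a URL→target map while remembering what you have seen. A sketch (simulated redirect map rather than live HTTP, so the logic stands alone):

```python
def trace_redirects(start, redirect_map, max_hops=10):
    """Follow a redirect chain, returning (chain, status).

    redirect_map maps a URL to its redirect target; URLs absent
    from the map are final pages. status is 'ok', 'loop', or 'too_long'.
    """
    chain, seen = [start], {start}
    url = start
    while redirect_map.get(url):
        url = redirect_map[url]
        if url in seen:
            return chain + [url], "loop"   # revisited a URL: redirect loop
        chain.append(url)
        seen.add(url)
        if len(chain) - 1 > max_hops:
            return chain, "too_long"
    return chain, "ok"

# A double-hop: HTTP -> HTTPS -> trailing slash
hops = {
    "http://example.com/a":  "https://example.com/a",
    "https://example.com/a": "https://example.com/a/",
}
chain, status = trace_redirects("http://example.com/a", hops)
print(len(chain) - 1, status)   # 2 ok  (two hops where one would do)
```

Chains longer than one hop waste crawl budget and dilute link equity, which is why double-hops are worth flattening.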
🌐 External link audit
All outbound links checked for status. Group by domain, filter nofollow/sponsored/ugc attributes, identify risky domains.
🔤 Anchor text analysis
Distribution of anchor text across all internal links pointing at each page. Identify over-optimized exact-match anchors and pages lacking descriptive, keyword-relevant anchor text.
📋 Orphan page detection
Pages in your sitemap or in GSC that have zero inbound internal links. Critical for crawlability and PageRank flow.
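At its core, orphan detection is a set difference between the pages you declare and the pages your site actually links to — a minimal sketch:

```python
def find_orphans(sitemap_urls, internal_links):
    """Pages listed in the sitemap that no crawled page links to.

    internal_links: iterable of (source_url, target_url) pairs
    discovered during the crawl.
    """
    linked = {target for _, target in internal_links}
    return sorted(set(sitemap_urls) - linked)

sitemap = ["/", "/about", "/old-landing-page"]
links = [("/", "/about"), ("/about", "/")]
print(find_orphans(sitemap, links))   # ['/old-landing-page']
```

The same difference against GSC's indexed-page list surfaces pages Google knows about that your internal linking has abandoned.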
📈 Link depth report
Number of clicks from homepage to each page. Pages deeper than 3–4 clicks are harder for search engines to crawl.
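Click depth is a breadth-first search over the internal link graph starting at the homepage — a sketch of the computation:

```python
from collections import deque

def link_depths(homepage, links):
    """BFS click-depth from the homepage over an internal link graph.

    links: iterable of (source, target) pairs. Pages unreachable
    from the homepage get no entry (they are effectively orphans).
    """
    graph = {}
    for src, dst in links:
        graph.setdefault(src, []).append(dst)
    depth = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, []):
            if nxt not in depth:            # first visit = shortest path
                depth[nxt] = depth[page] + 1
                queue.append(nxt)
    return depth

links = [("/", "/blog"), ("/blog", "/blog/post-1"),
         ("/blog/post-1", "/blog/post-1/comments")]
print(link_depths("/", links))
# {'/': 0, '/blog': 1, '/blog/post-1': 2, '/blog/post-1/comments': 3}
```

Pages at depth 4+ in this output are the ones to pull closer to the homepage with new internal links.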
📝 On-Page SEO
Every on-page SEO element audited across your entire site at once.
📖 Title tag audit
Check every page title for length (pixel and character), duplication, and missing values.
- Missing title tags
- Duplicate title tags
- Title over 60 chars / 600px
- Title under 30 chars (too short)
💬 Meta description audit
Analyze all meta descriptions for length, uniqueness, and presence.
- Missing meta descriptions
- Duplicate descriptions
- Over 160 chars (truncated in SERPs)
- Under 70 chars (too short)
🔠 Heading structure
Full H1–H6 hierarchy analysis for every page.
- Missing H1
- Multiple H1s
- Skipped heading levels
- Empty headings
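The four checks above can be expressed over an ordered list of extracted headings — a self-contained sketch, assuming headings arrive as `(level, text)` pairs:

```python
def heading_issues(headings):
    """Flag structural problems in an ordered list of (level, text) headings."""
    issues = []
    h1_count = sum(1 for level, _ in headings if level == 1)
    if h1_count == 0:
        issues.append("missing H1")
    elif h1_count > 1:
        issues.append("multiple H1s")
    prev = 0
    for level, text in headings:
        if prev and level > prev + 1:       # e.g. H1 followed by H3
            issues.append(f"skipped level: H{prev} -> H{level}")
        if not text.strip():
            issues.append(f"empty H{level}")
        prev = level
    return issues

print(heading_issues([(1, "Title"), (3, "Details"), (4, "")]))
# ['skipped level: H1 -> H3', 'empty H4']
```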
🔗 Canonical tag audit
Verify canonical tags are correctly implemented and not creating conflicts with other directives.
- Missing canonicals
- Self-referencing canonicals
- Canonical to redirect
- Canonical / noindex conflict
🤖 Robots meta analysis
Check noindex, nofollow, noarchive, and other robots directives. Alert on pages accidentally blocked.
🌎 hreflang audit
Validate hreflang tag implementation for multilingual sites. Detect missing return tags, wrong language codes, and broken links.
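The trickiest part of hreflang is the reciprocity requirement: if page A lists B as an alternate, B must list A back. A minimal sketch of return-tag validation, assuming each page's hreflang annotations are collected into a dict:

```python
def missing_return_tags(hreflang_map):
    """Find hreflang pairs whose return tag is missing.

    hreflang_map: {page_url: {lang_code: alternate_url, ...}, ...}
    Returns (page, alternate) pairs where the alternate does not
    link back to the page.
    """
    missing = []
    for page, alternates in hreflang_map.items():
        for alt_url in alternates.values():
            if alt_url == page:
                continue                      # self-reference is fine
            back = hreflang_map.get(alt_url, {})
            if page not in back.values():
                missing.append((page, alt_url))
    return missing

pages = {
    "/en/": {"en": "/en/", "de": "/de/"},
    "/de/": {"de": "/de/"},                   # forgot the en return tag
}
print(missing_return_tags(pages))             # [('/en/', '/de/')]
```

A full validator would also check language codes against ISO 639-1 / BCP 47 and fetch each alternate URL to catch broken links.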
📄 Content Analysis
Find duplicate content, thin pages, and content issues that hurt SEO performance.
🔐 Duplicate content detection
Near-duplicate detection using content hashing. Identifies pages with the same or similar body content.
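The exact-duplicate half of this is a hash over normalized body text; a sketch (near-duplicates additionally need shingling or SimHash, which this simplified version omits):

```python
import hashlib
import re

def content_fingerprint(body_text):
    """Hash of normalized body text: lowercased, whitespace collapsed."""
    normalized = re.sub(r"\s+", " ", body_text.lower()).strip()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def duplicate_groups(pages):
    """Group URLs whose body content hashes identically.

    pages: {url: body_text}. Returns only groups with 2+ members.
    """
    by_hash = {}
    for url, body in pages.items():
        by_hash.setdefault(content_fingerprint(body), []).append(url)
    return [urls for urls in by_hash.values() if len(urls) > 1]

pages = {
    "/a": "Widgets on sale  now!",
    "/b": "widgets on sale now!",    # same content, different case/spacing
    "/c": "Entirely different page",
}
print(duplicate_groups(pages))       # [['/a', '/b']]
```

Hashing scales linearly with page count, which is what makes duplicate detection feasible on 500,000-URL crawls.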
📋 Thin content pages
Pages with low word count flagged as thin content. Set custom minimum word count threshold.
📷 Image audit
All images checked for missing alt text, oversized files, broken URLs, and next-gen format opportunities.
- Missing alt attributes
- Alt text too long (>125 chars)
- Images >100KB threshold
- No WebP / AVIF equivalent
🔢 Open Graph and Twitter Card
Validate og:title, og:description, og:image, and Twitter card tags on every page. Find missing or broken social preview tags.
⚡ Performance Metrics
Identify slow pages and server-side bottlenecks during crawl.
⏰️ Response time tracking
TTFB and total load time recorded per URL. Sort by slowest pages. Set threshold alerts above 1s / 2s / 3s.
📊 Page size analysis
HTML, JS, CSS, image weights per page. Identify pages over 1MB threshold that hurt Core Web Vitals.
🔇 Compression detection
Checks whether gzip or Brotli compression is enabled. Flags uncompressed responses above 10KB.
🔒 HTTPS audit
HTTP resources loaded on HTTPS pages flagged as mixed content. HTTP-only pages flagged for HTTPS migration.
🤖 JavaScript Rendering
Built-in Chromium engine renders JS before analysis — essential for React, Vue, and Angular sites.
🛠️ Chromium headless rendering
Full Chromium headless browser built-in. Renders every page as Googlebot sees it. No external setup required.
🔭 Rendered vs raw diff
Compare raw HTML (without JS) against fully rendered DOM. Spot content only visible after JS execution — or hidden from crawlers.
⌛ Custom wait conditions
Wait for specific DOM elements, network idle, or custom timeouts before capturing page content.
📌 Lazy load support
Scrolls pages to trigger lazy-loaded content. Images and links loaded only on scroll are crawled and audited.
🏷️ Structured Data and Schema
Validate Schema.org markup and identify rich result opportunities across your site.
📋 Schema.org detection
Finds and validates all JSON-LD, Microdata, and RDFa markup. Highlights missing required fields per schema type.
🔍 Validation errors
Flags required properties missing from schema types. Integrated with Google’s Rich Results eligibility rules.
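Required-property checking boils down to comparing a parsed JSON-LD block against a per-type requirements table. A sketch — the table below is an illustrative subset, not Google's full eligibility rules:

```python
import json

# Illustrative required properties per schema type (assumption,
# not the complete Rich Results requirements)
REQUIRED = {
    "Product": ["name", "image", "offers"],
    "Article": ["headline", "datePublished", "author"],
}

def schema_errors(json_ld):
    """Return required properties missing from a JSON-LD block."""
    data = json.loads(json_ld)
    required = REQUIRED.get(data.get("@type"), [])
    return [prop for prop in required if prop not in data]

block = """{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Product",
  "offers": {"@type": "Offer", "price": "99.00"}
}"""
print(schema_errors(block))   # ['image']
```

Microdata and RDFa need an extraction step first, but feed the same per-type check once reduced to property maps.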
📊 Reports and Export
Share results with clients and teams in any format they prefer.
📊 CSV export
Export any view to CSV. Custom column selection. Export all issues, or filter by type / severity.
📄 Excel export
Multi-sheet Excel reports with a separate tab per issue category. Conditional formatting applied out of the box.
📋 PDF audit report
White-label branded PDF reports. Include executive summary, issue breakdown, and top fixes per category. Add your logo.
🗺️ XML sitemap export
Generate sitemap.xml from crawl results. Configure priority, changefreq, lastmod. Validate against Google guidelines.
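The sitemap.xml structure being generated is small enough to sketch with the standard library (an illustration of the output format, not CrawlBeast's exporter):

```python
import xml.etree.ElementTree as ET

def build_sitemap(entries):
    """Serialize (loc, lastmod, changefreq, priority) tuples to sitemap XML."""
    urlset = ET.Element("urlset",
                        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for loc, lastmod, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod       # W3C date, e.g. 2024-05-01
        ET.SubElement(url, "changefreq").text = changefreq  # always..never
        ET.SubElement(url, "priority").text = priority      # 0.0..1.0
    return ET.tostring(urlset, encoding="unicode", xml_declaration=True)

xml = build_sitemap([
    ("https://example.com/", "2024-05-01", "weekly", "1.0"),
])
print(xml)
```

Sitemaps over 50,000 URLs or 50 MB must be split and referenced from a sitemap index file, which a generator for large crawls has to handle.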
🔗 Google Sheets export
Push crawl results directly to Google Sheets. Re-crawl and refresh automatically. Share with clients via link.
🔌 JSON / REST API
Full crawl results available as JSON. Trigger crawls and retrieve results via the REST API. Webhook fired on crawl completion.
🔄 Scheduling and Automation
Set-and-forget monitoring for always-on SEO auditing.
📆 Scheduled crawls
Run crawls daily, weekly, or monthly automatically. Alerts sent when new issues are detected.
🔔 Change detection
Compare crawl results over time. See when new 404s appeared, titles changed, or redirects were added.
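Crawl-over-crawl comparison is a diff of two URL-keyed snapshots. A sketch, assuming each snapshot maps a URL to `(status_code, title)`:

```python
def crawl_diff(previous, current):
    """Compare two crawl snapshots mapping url -> (status_code, title)."""
    changes = {"new_404s": [], "title_changes": [], "new_pages": []}
    for url, (status, title) in current.items():
        if url not in previous:
            changes["new_pages"].append(url)
            continue
        old_status, old_title = previous[url]
        if status == 404 and old_status != 404:
            changes["new_404s"].append(url)
        if title != old_title:
            changes["title_changes"].append((url, old_title, title))
    return changes

before = {"/a": (200, "Widgets"), "/b": (200, "About")}
after  = {"/a": (404, "Not Found"), "/b": (200, "About Us"), "/c": (200, "New")}
print(crawl_diff(before, after))
```

Real snapshots carry more fields (redirect target, canonical, meta robots), but each extra field is just another comparison in the same loop.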
🤖 Headless / CLI mode
Run CrawlBeast from command line. Integrate into CI/CD pipelines. Export results as JSON or CSV on completion.
📡 Webhook alerts
POST to any URL when crawl completes or when critical issues are found. Integrates with Slack, PagerDuty, Zapier, n8n.
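A sketch of the kind of JSON body a webhook receiver might get — the field names here are assumptions for illustration, not CrawlBeast's documented schema:

```python
import json

def build_webhook_payload(crawl_id, site, issues):
    """Assemble an illustrative crawl-complete webhook body.

    Field names ('event', 'crawl_id', ...) are hypothetical.
    """
    critical = [i for i in issues if i["severity"] == "critical"]
    return json.dumps({
        "event": "crawl.completed",
        "crawl_id": crawl_id,
        "site": site,
        "issue_count": len(issues),
        "critical": critical,
    })

payload = build_webhook_payload(
    "c-42", "https://example.com",
    [{"type": "broken_link", "severity": "critical", "url": "/dead"}],
)
# The payload would be POSTed to the configured webhook URL with
# Content-Type: application/json; Slack/Zapier/n8n parse it from there.
print(json.loads(payload)["issue_count"])   # 1
```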
🔌 Integrations
Connect CrawlBeast to the tools already in your SEO stack.
🌎 Google Search Console
Import impressions, clicks, and coverage data from GSC. Cross-reference with crawl results to prioritize fixes by traffic impact.
📊 Google Analytics
Overlay GA4 traffic data on crawl results. Filter issues by high-traffic pages first.
🔗 Ahrefs / Semrush import
Import backlink data to enrich internal link analysis. See which crawled pages have the most external backlinks.
⚙️ Zapier / n8n / Make
Trigger workflows on crawl events. Send issue reports to project management tools (Jira, Trello, Asana, Linear).