🕷 Crawl Engine
High-speed, configurable crawl engine that handles everything from small blogs to enterprise sites.
⚡ Multi-threaded crawling
Configurable thread count (1–50 threads). Crawl 500,000+ URLs in a single session without memory issues.
🏃 Crawl speed controls
Set a per-request crawl delay, honor robots.txt Crawl-delay directives, and apply custom rate limits to avoid overloading servers.
🔒 Authentication support
Crawl password-protected sites with HTTP Basic Auth, form-based login, or custom cookie injection.
🌐 Proxy support
Route crawl traffic through HTTP/HTTPS or SOCKS5 proxies. Rotate proxies per request for large-scale audits.
🏬️ URL filtering
Include/exclude URLs by pattern (regex or wildcard). Filter by path, query parameters, file extension, or depth.
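A minimal sketch of how include/exclude pattern filtering can work — a pure-Python illustration of the technique, not CrawlBeast's actual implementation:

```python
import re
from fnmatch import fnmatch

def url_allowed(url, include=None, exclude=None, use_regex=False):
    """Return True if url passes the include/exclude filters.

    Patterns are regexes when use_regex=True, shell-style
    wildcards (fnmatch) otherwise. Exclude wins over include.
    """
    match = (lambda p, u: re.search(p, u) is not None) if use_regex \
        else (lambda p, u: fnmatch(u, p))
    if exclude and any(match(p, url) for p in exclude):
        return False
    if include:
        return any(match(p, url) for p in include)
    return True

# Wildcard: keep only /blog/ pages, drop PDFs
print(url_allowed("https://example.com/blog/post-1",
                  include=["*/blog/*"], exclude=["*.pdf"]))      # True
# Regex: exclude paginated archives
print(url_allowed("https://example.com/blog/page/2",
                  exclude=[r"/page/\d+"], use_regex=True))       # False
```

Exclude-before-include is the usual precedence in crawler configs, since it lets a broad include rule coexist with narrow carve-outs.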
🤖 Custom User-Agent
Set any User-Agent string. Presets for Googlebot, Bingbot, mobile crawlers, or fully custom strings.
📄 robots.txt compliance
Fully respects robots.txt directives. An override mode lets you ignore robots.txt for internal audits.
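The directives involved are easy to see with Python's standard-library robots.txt parser (shown here for illustration; CrawlBeast's own parser is not exposed):

```python
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks each URL before fetching it
print(rp.can_fetch("MyCrawler", "https://example.com/admin/users"))  # False
print(rp.can_fetch("MyCrawler", "https://example.com/blog/"))        # True
# ...and waits at least this many seconds between requests
print(rp.crawl_delay("MyCrawler"))                                   # 2
```

An "override mode" simply skips the `can_fetch` check — safe for your own sites, discourteous anywhere else.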
↩️ Crawl resumption
Pause and resume any crawl. Crawl state saved to disk — continue exactly where you left off after a restart.
🔗 Link Analysis
Comprehensive internal and external link auditing to find broken links and optimize link equity.
🔴 Broken link detection
Identifies all 4xx and 5xx response codes for both internal and external links crawled from your site.
↔️ Redirect chain analysis
Visualizes full redirect chains from source URL to final destination. Identifies loops, double-hops, and HTTP→HTTPS issues.
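Chain tracing and loop detection reduce to following a URL→target map while remembering what you have seen. A sketch (simulated redirect map rather than live HTTP, so the logic stands alone):

```python
def trace_redirects(start, redirect_map, max_hops=10):
    """Follow a redirect chain, returning (chain, status).

    redirect_map maps a URL to its redirect target; URLs absent
    from the map are final pages. status is 'ok', 'loop', or 'too_long'.
    """
    chain, seen = [start], {start}
    url = start
    while redirect_map.get(url):
        url = redirect_map[url]
        if url in seen:
            return chain + [url], "loop"   # revisited a URL: redirect loop
        chain.append(url)
        seen.add(url)
        if len(chain) - 1 > max_hops:
            return chain, "too_long"
    return chain, "ok"

# A double-hop: HTTP -> HTTPS -> trailing slash
hops = {
    "http://example.com/a":  "https://example.com/a",
    "https://example.com/a": "https://example.com/a/",
}
chain, status = trace_redirects("http://example.com/a", hops)
print(len(chain) - 1, status)   # 2 ok  (two hops where one would do)
```

Chains longer than one hop waste crawl budget and dilute link equity, which is why double-hops are worth flattening.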
🌐 External link audit
All outbound links checked for status. Group by domain, filter nofollow/sponsored/ugc attributes, identify risky domains.
🔤 Anchor text analysis
Distribution of anchor text across all internal links pointing at each page. Identify over-optimized exact-match anchors and pages lacking descriptive, keyword-relevant anchor text.
📋 Orphan page detection
Pages in your sitemap or in GSC that have zero inbound internal links. Critical for crawlability and PageRank flow.
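At its core, orphan detection is a set difference between the pages you declare and the pages your site actually links to — a minimal sketch:

```python
def find_orphans(sitemap_urls, internal_links):
    """Pages listed in the sitemap that no crawled page links to.

    internal_links: iterable of (source_url, target_url) pairs
    discovered during the crawl.
    """
    linked = {target for _, target in internal_links}
    return sorted(set(sitemap_urls) - linked)

sitemap = ["/", "/about", "/old-landing-page"]
links = [("/", "/about"), ("/about", "/")]
print(find_orphans(sitemap, links))   # ['/old-landing-page']
```

The same difference against GSC's indexed-page list surfaces pages Google knows about that your internal linking has abandoned.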
📈 Link depth report
Number of clicks from homepage to each page. Pages deeper than 3–4 clicks are harder for search engines to crawl.
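Click depth is a breadth-first search over the internal link graph starting at the homepage — a sketch of the computation:

```python
from collections import deque

def link_depths(homepage, links):
    """BFS click-depth from the homepage over an internal link graph.

    links: iterable of (source, target) pairs. Pages unreachable
    from the homepage get no entry (they are effectively orphans).
    """
    graph = {}
    for src, dst in links:
        graph.setdefault(src, []).append(dst)
    depth = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, []):
            if nxt not in depth:            # first visit = shortest path
                depth[nxt] = depth[page] + 1
                queue.append(nxt)
    return depth

links = [("/", "/blog"), ("/blog", "/blog/post-1"),
         ("/blog/post-1", "/blog/post-1/comments")]
print(link_depths("/", links))
# {'/': 0, '/blog': 1, '/blog/post-1': 2, '/blog/post-1/comments': 3}
```

Pages at depth 4+ in this output are the ones to pull closer to the homepage with new internal links.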
📝 On-Page SEO
Every on-page SEO element audited across your entire site at once.
📖 Title tag audit
Check every page title for length (pixel and character), duplication, and missing values.
- Missing title tags
- Duplicate title tags
- Title over 60 chars / 600px
- Title under 30 chars (too short)
💬 Meta description audit
Analyze all meta descriptions for length, uniqueness, and presence.
- Missing meta descriptions
- Duplicate descriptions
- Over 160 chars (truncated in SERPs)
- Under 70 chars (too short)
🔠 Heading structure
Full H1–H6 hierarchy analysis for every page.
- Missing H1
- Multiple H1s
- Skipped heading levels
- Empty headings
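The four checks above can be expressed over an ordered list of extracted headings — a self-contained sketch, assuming headings arrive as `(level, text)` pairs:

```python
def heading_issues(headings):
    """Flag structural problems in an ordered list of (level, text) headings."""
    issues = []
    h1_count = sum(1 for level, _ in headings if level == 1)
    if h1_count == 0:
        issues.append("missing H1")
    elif h1_count > 1:
        issues.append("multiple H1s")
    prev = 0
    for level, text in headings:
        if prev and level > prev + 1:       # e.g. H1 followed by H3
            issues.append(f"skipped level: H{prev} -> H{level}")
        if not text.strip():
            issues.append(f"empty H{level}")
        prev = level
    return issues

print(heading_issues([(1, "Title"), (3, "Details"), (4, "")]))
# ['skipped level: H1 -> H3', 'empty H4']
```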
🔗 Canonical tag audit
Verify canonical tags are correctly implemented and not creating conflicts with other directives.
- Missing canonicals
- Self-referencing canonicals
- Canonical to redirect
- Canonical / noindex conflict
🤖 Robots meta analysis
Check noindex, nofollow, noarchive, and other robots directives. Alert on pages accidentally blocked.
🌎 hreflang audit
Validate hreflang tag implementation for multilingual sites. Detect missing return tags, wrong language codes, and broken links.
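The trickiest part of hreflang is the reciprocity requirement: if page A lists B as an alternate, B must list A back. A minimal sketch of return-tag validation, assuming each page's hreflang annotations are collected into a dict:

```python
def missing_return_tags(hreflang_map):
    """Find hreflang pairs whose return tag is missing.

    hreflang_map: {page_url: {lang_code: alternate_url, ...}, ...}
    Returns (page, alternate) pairs where the alternate does not
    link back to the page.
    """
    missing = []
    for page, alternates in hreflang_map.items():
        for alt_url in alternates.values():
            if alt_url == page:
                continue                      # self-reference is fine
            back = hreflang_map.get(alt_url, {})
            if page not in back.values():
                missing.append((page, alt_url))
    return missing

pages = {
    "/en/": {"en": "/en/", "de": "/de/"},
    "/de/": {"de": "/de/"},                   # forgot the en return tag
}
print(missing_return_tags(pages))             # [('/en/', '/de/')]
```

A full validator would also check language codes against ISO 639-1 / BCP 47 and fetch each alternate URL to catch broken links.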
📄 Content Analysis
Find duplicate content, thin pages, and content issues that hurt SEO performance.
🔐 Duplicate content detection
Near-duplicate detection using content hashing. Identifies pages with the same or similar body content.
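The exact-duplicate half of this is a hash over normalized body text; a sketch (near-duplicates additionally need shingling or SimHash, which this simplified version omits):

```python
import hashlib
import re

def content_fingerprint(body_text):
    """Hash of normalized body text: lowercased, whitespace collapsed."""
    normalized = re.sub(r"\s+", " ", body_text.lower()).strip()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def duplicate_groups(pages):
    """Group URLs whose body content hashes identically.

    pages: {url: body_text}. Returns only groups with 2+ members.
    """
    by_hash = {}
    for url, body in pages.items():
        by_hash.setdefault(content_fingerprint(body), []).append(url)
    return [urls for urls in by_hash.values() if len(urls) > 1]

pages = {
    "/a": "Widgets on sale  now!",
    "/b": "widgets on sale now!",    # same content, different case/spacing
    "/c": "Entirely different page",
}
print(duplicate_groups(pages))       # [['/a', '/b']]
```

Hashing scales linearly with page count, which is what makes duplicate detection feasible on 500,000-URL crawls.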
📋 Thin content pages
Pages with low word count flagged as thin content. Set custom minimum word count threshold.
📷 Image audit
All images checked for missing alt text, oversized files, broken URLs, and next-gen format opportunities.
- Missing alt attributes
- Alt text too long (>125 chars)
- Images >100KB threshold
- No WebP / AVIF equivalent
🔢 Open Graph and Twitter Card
Validate og:title, og:description, og:image, and Twitter card tags on every page. Find missing or broken social preview tags.
⚡ Performance Metrics
Identify slow pages and server-side bottlenecks during crawl.
⏰️ Response time tracking
TTFB and total load time recorded per URL. Sort by slowest pages. Set threshold alerts above 1s / 2s / 3s.
📊 Page size analysis
HTML, JS, CSS, image weights per page. Identify pages over 1MB threshold that hurt Core Web Vitals.
🔇 Compression detection
Checks whether gzip or Brotli compression is enabled. Flags uncompressed responses above 10KB.
🔒 HTTPS audit
HTTP resources loaded on HTTPS pages flagged as mixed content. HTTP-only pages flagged for HTTPS migration.
🤖 JavaScript Rendering
Built-in Chromium engine renders JS before analysis — essential for React, Vue, and Angular sites.
🛠️ Chromium headless rendering
Full Chromium headless browser built-in. Renders every page as Googlebot sees it. No external setup required.
🔭 Rendered vs raw diff
Compare raw HTML (without JS) against fully rendered DOM. Spot content only visible after JS execution — or hidden from crawlers.
⌛ Custom wait conditions
Wait for specific DOM elements, network idle, or custom timeouts before capturing page content.
📌 Lazy load support
Scrolls pages to trigger lazy-loaded content. Images and links loaded only on scroll are crawled and audited.
🏷️ Structured Data and Schema
Validate Schema.org markup and identify rich result opportunities across your site.
📋 Schema.org detection
Finds and validates all JSON-LD, Microdata, and RDFa markup. Highlights missing required fields per schema type.
🔍 Validation errors
Flags required properties missing from schema types. Integrated with Google’s Rich Results eligibility rules.
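Required-property checking boils down to comparing a parsed JSON-LD block against a per-type requirements table. A sketch — the table below is an illustrative subset, not Google's full eligibility rules:

```python
import json

# Illustrative required properties per schema type (assumption,
# not the complete Rich Results requirements)
REQUIRED = {
    "Product": ["name", "image", "offers"],
    "Article": ["headline", "datePublished", "author"],
}

def schema_errors(json_ld):
    """Return required properties missing from a JSON-LD block."""
    data = json.loads(json_ld)
    required = REQUIRED.get(data.get("@type"), [])
    return [prop for prop in required if prop not in data]

block = """{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Product",
  "offers": {"@type": "Offer", "price": "99.00"}
}"""
print(schema_errors(block))   # ['image']
```

Microdata and RDFa need an extraction step first, but feed the same per-type check once reduced to property maps.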
📊 Reports and Export
Share results with clients and teams in any format they prefer.
📊 CSV export
Export any view to CSV. Custom column selection. Export all issues, or filter by type / severity.
📄 Excel export
Multi-sheet Excel reports with a separate tab per issue category. Conditional formatting applied out of the box.
📋 PDF audit report
White-label branded PDF reports. Include executive summary, issue breakdown, and top fixes per category. Add your logo.
🗺️ XML sitemap export
Generate sitemap.xml from crawl results. Configure priority, changefreq, lastmod. Validate against Google guidelines.
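The sitemap.xml structure being generated is small enough to sketch with the standard library (an illustration of the output format, not CrawlBeast's exporter):

```python
import xml.etree.ElementTree as ET

def build_sitemap(entries):
    """Serialize (loc, lastmod, changefreq, priority) tuples to sitemap XML."""
    urlset = ET.Element("urlset",
                        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for loc, lastmod, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod       # W3C date, e.g. 2024-05-01
        ET.SubElement(url, "changefreq").text = changefreq  # always..never
        ET.SubElement(url, "priority").text = priority      # 0.0..1.0
    return ET.tostring(urlset, encoding="unicode", xml_declaration=True)

xml = build_sitemap([
    ("https://example.com/", "2024-05-01", "weekly", "1.0"),
])
print(xml)
```

Sitemaps over 50,000 URLs or 50 MB must be split and referenced from a sitemap index file, which a generator for large crawls has to handle.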
🔗 Google Sheets export
Push crawl results directly to Google Sheets. Re-crawl and refresh automatically. Share with clients via link.
🔌 JSON / REST API
Full crawl results available as JSON. Trigger crawls and retrieve results via the REST API. Webhook fired on crawl completion.
🔄 Scheduling and Automation
Set-and-forget monitoring for always-on SEO auditing.
📆 Scheduled crawls
Run crawls daily, weekly, or monthly automatically. Alerts sent when new issues are detected.
🔔 Change detection
Compare crawl results over time. See when new 404s appeared, titles changed, or redirects were added.
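Crawl-over-crawl comparison is a diff of two URL-keyed snapshots. A sketch, assuming each snapshot maps a URL to `(status_code, title)`:

```python
def crawl_diff(previous, current):
    """Compare two crawl snapshots mapping url -> (status_code, title)."""
    changes = {"new_404s": [], "title_changes": [], "new_pages": []}
    for url, (status, title) in current.items():
        if url not in previous:
            changes["new_pages"].append(url)
            continue
        old_status, old_title = previous[url]
        if status == 404 and old_status != 404:
            changes["new_404s"].append(url)
        if title != old_title:
            changes["title_changes"].append((url, old_title, title))
    return changes

before = {"/a": (200, "Widgets"), "/b": (200, "About")}
after  = {"/a": (404, "Not Found"), "/b": (200, "About Us"), "/c": (200, "New")}
print(crawl_diff(before, after))
```

Real snapshots carry more fields (redirect target, canonical, meta robots), but each extra field is just another comparison in the same loop.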
🤖 Headless / CLI mode
Run CrawlBeast from command line. Integrate into CI/CD pipelines. Export results as JSON or CSV on completion.
📡 Webhook alerts
POST to any URL when crawl completes or when critical issues are found. Integrates with Slack, PagerDuty, Zapier, n8n.
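A sketch of the kind of JSON body a webhook receiver might get — the field names here are assumptions for illustration, not CrawlBeast's documented schema:

```python
import json

def build_webhook_payload(crawl_id, site, issues):
    """Assemble an illustrative crawl-complete webhook body.

    Field names ('event', 'crawl_id', ...) are hypothetical.
    """
    critical = [i for i in issues if i["severity"] == "critical"]
    return json.dumps({
        "event": "crawl.completed",
        "crawl_id": crawl_id,
        "site": site,
        "issue_count": len(issues),
        "critical": critical,
    })

payload = build_webhook_payload(
    "c-42", "https://example.com",
    [{"type": "broken_link", "severity": "critical", "url": "/dead"}],
)
# The payload would be POSTed to the configured webhook URL with
# Content-Type: application/json; Slack/Zapier/n8n parse it from there.
print(json.loads(payload)["issue_count"])   # 1
```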
🔌 Integrations
Connect CrawlBeast to the tools already in your SEO stack.
🌎 Google Search Console
Import impressions, clicks, and coverage data from GSC. Cross-reference with crawl results to prioritize fixes by traffic impact.
📊 Google Analytics
Overlay GA4 traffic data on crawl results. Filter issues by high-traffic pages first.
🔗 Ahrefs / Semrush import
Import backlink data to enrich internal link analysis. See which crawled pages have the most external backlinks.
⚙️ Zapier / n8n / Make
Trigger workflows on crawl events. Send issue reports to project management tools (Jira, Trello, Asana, Linear).