85 terms covering server logs, SEO crawling, AI bots, technical SEO, web security, and analytics.
A 301 redirect is a permanent HTTP redirect that tells search engines a page has permanently moved to a new URL, transferring most of its link equity to the new location.
A 302 redirect is a temporary HTTP redirect that tells search engines a page has temporarily moved, keeping the original URL indexed and preserving its ranking signals.
A 404 error is an HTTP status code indicating that the requested page was not found on the server, which can harm SEO if important pages return it or broken links go unfixed.
An access log is a server-generated file that records every HTTP request made to a web server, including the client IP, timestamp, requested URL, status code, response size, referrer, and user agent.
An AI crawler is a web bot operated by an artificial intelligence company that systematically downloads web content to build training datasets or power AI search features.
AI Overview (formerly SGE) is Google's AI-generated summary that appears at the top of search results, synthesizing information from multiple sources and often reducing clicks to the underlying pages.
AI training data is the collection of text, images, and other content scraped from the web that AI companies use to train large language models.
Anchor text is the visible, clickable text of a hyperlink that provides context to search engines about the linked page's topic and relevance.
Bandwidth analysis is the process of examining server log data to understand how much data your server transfers, which bots and pages consume the most resources, and where savings are possible.
Bingbot is Microsoft's web crawler that discovers and indexes web pages for Bing Search, the second-largest search engine in most Western markets.
Bot detection is the process of identifying automated web traffic (bots, crawlers, scrapers) and distinguishing it from genuine human visitors.
Bounce rate is the percentage of visitors who leave a website after viewing only one page, often used as an indicator of content relevance and engagement.
A broken link is a hyperlink that points to a page or resource that no longer exists, returning a 404 error and creating a poor user experience while wasting link equity.
A canonical URL is the preferred version of a web page specified via the rel="canonical" link tag, telling search engines which URL to index when duplicates exist.
A CDN (Content Delivery Network) is a distributed network of servers that caches and delivers web content from locations close to users, improving load times and reducing origin load.
Click-through rate (CTR) is the percentage of users who click on your search result after seeing it, calculated by dividing clicks by impressions.
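The CTR formula is simple enough to sketch; the helper name `ctr` is illustrative, not from any analytics API:

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate as a fraction; 0.0 when there are no impressions."""
    if impressions == 0:
        return 0.0
    return clicks / impressions

# Example: 45 clicks out of 1,500 impressions is a 3% CTR.
rate = ctr(45, 1500)
```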
The Combined Log Format is an extension of the Common Log Format that adds the referrer URL and user-agent string to each log entry.
The Common Log Format (CLF) is a standardized text format for web server access logs that records the client IP, timestamp, request line, status code, and response size.
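A minimal sketch of parsing one Combined Log Format line with the standard library; the sample line and the field names in the regex are illustrative:

```python
import re

# %h %l %u [%t] "%r" %>s %b "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line: str) -> dict:
    """Turn one Combined Log Format line into a dict of named fields."""
    m = COMBINED.match(line)
    if not m:
        raise ValueError(f"unparseable log line: {line!r}")
    return m.groupdict()

sample = ('203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] '
          '"GET /index.html HTTP/1.1" 200 2326 '
          '"https://example.com/" "Mozilla/5.0"')
entry = parse_line(sample)
```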
Content pruning is the strategic process of removing, consolidating, or improving low-quality, outdated, or underperforming pages to raise overall site quality.
Core Web Vitals are a set of three Google metrics (LCP, CLS, and INP) that measure real-world user experience for loading, visual stability, and interactivity.
Crawl budget is the number of pages a search engine crawler will fetch from your site within a given time period, determined by your site's authority, health, and server capacity.
Crawl depth is the number of clicks or links a crawler must follow from the homepage to reach a specific page, with deeper pages crawled and indexed less often.
Crawl rate is the speed at which a search engine crawler requests pages from your server, typically measured in requests per second.
A crawl trap is a URL structure that causes crawlers to get stuck in an infinite or near-infinite loop of pages, wasting crawl budget on worthless URLs.
Crawl-delay is a robots.txt directive that tells crawlers to wait a specified number of seconds between consecutive requests, honored by some bots but ignored by Googlebot.
Crawler management is the practice of controlling which bots can access your website, how fast they crawl, and which content they may retrieve.
Cumulative Layout Shift (CLS) is a Core Web Vital that measures the total amount of unexpected visual movement of page content during loading.
A DDoS (Distributed Denial of Service) attack is a malicious attempt to overwhelm a server with traffic from multiple sources, making a site unavailable to legitimate users.
Deindexation is the removal of a previously indexed page from a search engine's index, either intentionally (via noindex or a removal request) or as a penalty.
Duplicate content occurs when identical or substantially similar content appears at multiple URLs, causing search engines to split ranking signals or index the wrong version.
Dynamic rendering serves pre-rendered static HTML to search engine crawlers while serving the normal JavaScript-powered version to human visitors.
Edge computing processes data and runs application logic at network edge locations close to users, reducing latency and offloading work from origin servers.
An error log records server-side errors, warnings, and diagnostic messages generated by the web server or application while handling requests.
Faceted navigation is a filtering system that lets users refine listings by multiple attributes, often creating crawlable URL combinations that explode into near-duplicate pages.
A featured snippet is a special search result box that appears at the top of Google's results (position zero), displaying a direct answer extracted from a ranking page.
Googlebot is Google's web crawler that discovers and indexes web pages for Google Search, making it the most important bot to monitor in server logs.
GPTBot is OpenAI's official web crawler that collects content from websites to train and improve GPT models, identifiable by the "GPTBot" token in its user-agent string.
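Spotting AI crawlers in access logs usually starts with user-agent substring checks. A minimal sketch; the token list is partial and illustrative, and because user-agent strings can be spoofed, production checks should also verify the crawler's published IP ranges:

```python
from typing import Optional

# Partial, illustrative list of AI crawler user-agent tokens.
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Google-Extended", "PerplexityBot")

def ai_bot_token(user_agent: str) -> Optional[str]:
    """Return the matching AI crawler token, or None for other clients."""
    lowered = user_agent.lower()
    for token in AI_BOT_TOKENS:
        if token.lower() in lowered:
            return token
    return None

ua = "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot"
```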
A honeypot trap is a hidden link or page invisible to human users but discoverable by bots, used to identify and block automated scrapers that ignore robots.txt.
Hreflang is an HTML attribute that tells search engines which language and regional version of a page to serve to users in different locales.
HTTP status codes are three-digit response codes returned by a web server to indicate the result of an HTTP request, such as 200 (OK), 301 (Moved Permanently), 404 (Not Found), and 500 (Internal Server Error).
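The first digit determines the class, so log analysis can bucket codes mechanically; a minimal sketch with illustrative names:

```python
# Class names per the first digit of an HTTP status code.
CLASSES = {1: "informational", 2: "success", 3: "redirection",
           4: "client error", 5: "server error"}

def status_class(code: int) -> str:
    """Map a three-digit HTTP status code to its class name."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return CLASSES[code // 100]
```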
HTTP/2 is a major revision of the HTTP protocol that improves web performance through multiplexing, header compression, and server push over a single connection.
HTTP/3 is the latest version of the HTTP protocol, built on QUIC instead of TCP, offering faster connection establishment and better performance on lossy networks.
Indexation is the process by which search engines add web pages to their searchable database (index) after crawling and evaluating them.
Interaction to Next Paint (INP) is a Core Web Vital that measures the latency of all user interactions (clicks, taps, keyboard input) throughout a page's lifetime.
Internal linking is the practice of creating hyperlinks between pages on the same website, distributing link equity, establishing site architecture, and helping crawlers discover content.
IP geolocation is the process of determining the geographic location (country, city, region) of a client based on its IP address.
JavaScript rendering in SEO refers to search engine crawlers executing JavaScript to access dynamically generated content, which delays indexing and consumes extra resources.
A knowledge panel is an information box that appears on the right side of Google search results, displaying key facts about an entity from Google's Knowledge Graph.
Largest Contentful Paint (LCP) is a Core Web Vital that measures how long it takes for the largest visible content element to render in the viewport.
Link equity (link juice) is the SEO value passed from one page to another through hyperlinks, influencing the linked page's ability to rank.
LLM citation refers to the practice of large language models attributing and linking to source websites when generating answers, an emerging source of referral traffic.
Log aggregation is the process of collecting, centralizing, and combining log data from multiple servers, services, or systems into one searchable location.
Log parsing is the process of extracting structured data fields from raw log file lines, converting unstructured text into queryable records.
Log rotation is the automated process of archiving, compressing, and eventually deleting old log files to prevent them from filling the disk.
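Size-based rotation is built into Python's standard library; a minimal sketch using a temporary directory, with the byte limit and backup count chosen purely for demonstration:

```python
import logging
import logging.handlers
import os
import tempfile

# Rotate when the active file passes ~1 KB, keeping two numbered backups
# (app.log.1, app.log.2); anything older is deleted automatically.
logdir = tempfile.mkdtemp()
path = os.path.join(logdir, "app.log")

handler = logging.handlers.RotatingFileHandler(path, maxBytes=1024, backupCount=2)
logger = logging.getLogger("rotation-demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

for i in range(200):
    logger.info("request %d processed", i)

handler.close()
files = sorted(os.listdir(logdir))
```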
Meta robots is an HTML meta tag that provides page-level instructions to search engine crawlers about indexing, following links, and snippet display.
Nofollow is a link attribute or meta directive that tells search engines not to pass link equity (PageRank) through a link.
Noindex is a meta robots directive that instructs search engines not to include a specific page in their search index, preventing it from appearing in results.
Organic traffic is the visitors who arrive at your website through unpaid search engine results, representing the primary goal of most SEO work.
An orphan page is a page on your website that has no internal links pointing to it, making it effectively invisible to search engines and visitors alike.
Page speed is the measure of how quickly a web page loads and becomes interactive, encompassing server response time, resource size, and rendering efficiency.
Pagination SEO refers to the optimization of multi-page content sequences to ensure search engines crawl and index paginated content correctly.
Rate limiting is a server-side technique that restricts the number of requests a client can make within a given time period, protecting against abuse and aggressive crawlers.
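A common implementation is the token bucket: tokens refill at a steady rate up to a burst capacity, and each request spends one. A minimal sketch with an explicit timestamp argument so the behavior is deterministic; real limiters would read the clock themselves:

```python
class TokenBucket:
    """Minimal token-bucket rate limiter (one token per request)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens replenished per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = 0.0

    def allow(self, at: float) -> bool:
        """Record a request at time `at` (seconds); True if it may proceed."""
        self.tokens = min(self.capacity, self.tokens + (at - self.last) * self.rate)
        self.last = at
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# 1 request/second sustained, bursts of up to 2.
bucket = TokenBucket(rate=1.0, capacity=2.0)
```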
Real-time log monitoring is the continuous observation of server log data as it is generated, enabling immediate detection of attacks, errors, and traffic anomalies.
A redirect chain occurs when a URL redirects to another URL that also redirects, creating a series of multiple hops that slow page loads and dilute link equity.
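Auditing tools flatten chains by following each hop until a final URL is reached, flagging loops along the way. A minimal sketch where a plain dict stands in for the Location headers a crawler would actually observe:

```python
def resolve_chain(url: str, redirects: dict) -> list:
    """Follow source -> target redirects, returning the full hop list.

    Raises ValueError on a redirect loop (a URL seen twice).
    """
    chain = [url]
    seen = {url}
    while chain[-1] in redirects:
        nxt = redirects[chain[-1]]
        if nxt in seen:
            raise ValueError(f"redirect loop at {nxt}")
        chain.append(nxt)
        seen.add(nxt)
    return chain

# A two-hop chain that should ideally be collapsed to /old -> /new.
hops = resolve_chain("/old", {"/old": "/interim", "/interim": "/new"})
```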
Referrer analysis examines the HTTP Referer header in server logs to understand where your traffic originates, which external sites link to you, and how visitors navigate.
Render budget is the limited resources search engines allocate to executing JavaScript and rendering pages, which determines how quickly script-heavy pages get indexed.
A reverse proxy is a server that sits in front of web servers, forwarding client requests to the appropriate backend server while handling caching, load balancing, and TLS termination.
Rich snippets are enhanced search result listings that display additional information (star ratings, prices, images, FAQs) drawn from structured data markup.
Robots.txt is a text file placed at the root of a website that tells web crawlers which URLs they are allowed or disallowed to crawl.
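The standard library can evaluate robots.txt rules directly; a minimal sketch that parses hypothetical file content locally rather than fetching it over the network:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everyone may crawl except /private/,
# and GPTBot is blocked from the whole site.
rules = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

everyone_ok = rp.can_fetch("*", "https://example.com/blog/post")
private_ok = rp.can_fetch("*", "https://example.com/private/data")
gptbot_ok = rp.can_fetch("GPTBot", "https://example.com/blog/post")
```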
Schema markup is the specific code implementation of Schema.org vocabulary on web pages, providing search engines with structured data that can power rich results.
Google Search Console is a free tool that helps website owners monitor, maintain, and troubleshoot their site's presence in Google Search results.
Server response time (Time to First Byte) is the duration between when a server receives an HTTP request and when it sends the first byte of the response.
Session duration is the total time a user spends on your website during a single visit, measured from the first page view to the last recorded interaction.
Site migration is the process of making significant changes to a website's structure, domain, platform, or design that can substantially affect search visibility.
A soft 404 is a page that displays a 'not found' message to users but returns an HTTP 200 (OK) status code instead of a 404, confusing search engines about whether the page exists.
SSL/TLS (Secure Sockets Layer/Transport Layer Security) is the encryption protocol that enables HTTPS, securing data transmitted between browsers and servers.
Structured data is a standardized format (typically JSON-LD using Schema.org vocabulary) for providing explicit information about a page's content to search engines.
Thin content refers to web pages with little or no valuable, original content that fail to satisfy user intent, potentially triggering quality-related ranking demotions.
URL parameters are key-value pairs appended to a URL after a question mark that can cause duplicate content and crawl budget problems when each variation is treated as a separate page.
User-agent spoofing is the practice of a bot or client sending a fake user-agent string to disguise its identity, often to evade bot detection or blocking.
A user-agent string is an HTTP header value that identifies the client making a request, including the browser name, version, operating system, or bot identity.
The W3C Extended Log File Format is a customizable, header-defined log format used primarily by IIS and some CDNs, where a #Fields directive declares which fields each entry contains.
A WAF (Web Application Firewall) is a security system that filters, monitors, and blocks malicious HTTP traffic between a web application and the internet.
A web crawler (spider or bot) is an automated program that systematically browses the web by following links, downloading pages for indexing, archiving, or data collection.
An XML sitemap is a structured file that lists all important URLs on a website, helping search engine crawlers discover and prioritize content.
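Generating a minimal sitemap takes only the standard XML library; a sketch with hypothetical URLs, emitting just the required loc element per entry (real sitemaps may add lastmod, changefreq, and priority):

```python
import xml.etree.ElementTree as ET

def build_sitemap(urls: list) -> str:
    """Render a minimal XML sitemap for the given absolute URLs."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

xml_out = build_sitemap(["https://example.com/", "https://example.com/about"])
```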