
Overview
- Find URLs: Discover URLs from links on websites or from XML sitemaps.
- Add to Crawl Queue: Each discovered URL is queued as a crawl task to be executed.
- HTTP Request: The crawler makes an HTTP request to get the headers and acts according to the returned status code:
  - 200: it crawls and parses the HTML.
  - 30X: it follows the redirect.
  - 40X: it notes the error and does not load the HTML.
  - 50X: it may come back later to check whether the status code has changed.
- Render Queue: Rendering costs more resources than fetching the HTML, so your site might not be rendered.
- Ready to be indexed: If all criteria are met, the page is eligible to be indexed and shown in search results.
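
The crawl queue and status-code handling described above can be sketched roughly as follows. This is a minimal illustration, not how any real search engine is implemented: the names `Action`, `action_for_status`, and `process_queue` are hypothetical, `fetch_status` stands in for a real HTTP client, and treating the whole 300-399 range as "follow the redirect" is a simplifying assumption.

```python
from collections import deque
from enum import Enum


class Action(Enum):
    PARSE_HTML = "parse_html"            # 200: crawl and parse the HTML
    FOLLOW_REDIRECT = "follow_redirect"  # 30X: follow the redirect
    RECORD_ERROR = "record_error"        # 40X: note the error, skip the HTML
    RETRY_LATER = "retry_later"          # 50X: check again later
    IGNORE = "ignore"                    # anything else (assumption, not in the notes)


def action_for_status(status: int) -> Action:
    """Map an HTTP status code to the crawler behaviour listed in the overview."""
    if status == 200:
        return Action.PARSE_HTML
    if 300 <= status < 400:  # assumption: every 3xx is treated as a redirect
        return Action.FOLLOW_REDIRECT
    if 400 <= status < 500:
        return Action.RECORD_ERROR
    if 500 <= status < 600:
        return Action.RETRY_LATER
    return Action.IGNORE


def process_queue(urls, fetch_status):
    """Drain a FIFO crawl queue.

    fetch_status is a hypothetical callable: url -> (status, redirect_target or None).
    Returns the list of (url, action) decisions and the URLs to retry later.
    """
    queue = deque(urls)
    seen = set(urls)  # avoid queueing the same URL twice
    log = []
    retry = []
    while queue:
        url = queue.popleft()
        status, location = fetch_status(url)
        action = action_for_status(status)
        log.append((url, action))
        if action is Action.FOLLOW_REDIRECT and location and location not in seen:
            seen.add(location)
            queue.append(location)  # the redirect target becomes a new crawl task
        elif action is Action.RETRY_LATER:
            retry.append(url)
    return log, retry
```

In this sketch `fetch_status` can be backed by a plain dictionary of canned responses for experimentation. A real crawler would also need politeness delays, robots.txt handling, and a limit on redirect chains, none of which are modelled here.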
Readings
Reference