robots.txt

A robots.txt file tells search engine crawlers which pages or files they can and cannot request from your site. It follows the Robots Exclusion Protocol, a web standard that most well-behaved bots consult before requesting anything from a domain.
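
To see how a crawler consumes these rules, here is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt rules and the MyBot user agent are illustrative assumptions, not taken from any real site:

```python
from urllib import robotparser

# An illustrative robots.txt, parsed in-memory so the example needs no network.
rules = """\
User-agent: *
Disallow: /private/
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# can_fetch() answers the question a polite bot asks before every request.
print(rp.can_fetch("MyBot", "https://example.com/public/page"))   # True
print(rp.can_fetch("MyBot", "https://example.com/private/data"))  # False
```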

Web scraping frameworks like Scrapy respect robots.txt by default, but you can still change the setting to ignore it, as shown in the sketch below.
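
In Scrapy, this behavior is controlled by the ROBOTSTXT_OBEY setting, which the built-in RobotsTxtMiddleware consults before each request. A minimal sketch of the relevant lines in a project's settings.py:

```python
# settings.py of a Scrapy project (project layout assumed for illustration).

# Respect robots.txt; this is the default in projects generated
# by `scrapy startproject`.
ROBOTSTXT_OBEY = True

# Setting it to False makes Scrapy skip the robots.txt check entirely:
# ROBOTSTXT_OBEY = False
```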