A Detailed Guide to the robots.txt File

2025-03-27

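robots.txt is a plain-text file placed in the root directory of a website that tells search engine spiders (crawlers) how to crawl the site's pages. By specifying which paths may or may not be crawled, it controls how compliant crawlers access the site's content.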
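Because the file always sits at the root of the site, its address can be derived from any page URL on that site. The following is a minimal Python sketch of that derivation; the function name `robots_txt_url` and the example URL are illustrative assumptions, not part of any standard API.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/blog/post?id=1"))
# prints https://www.example.com/robots.txt
```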

Block a specific crawler from a specific directory

User-agent: Googlebot
Disallow: /private/

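Allow a specific crawler to access a specific directory while blocking all other crawlers. Because the Bingbot group contains no Disallow rule, Bingbot is not limited to /public/ here; add Disallow: / to that group if it should crawl only that directory.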

User-agent: Bingbot
Allow: /public/

User-agent: *
Disallow: /
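One way to sanity-check rules like these is to feed them to Python's standard-library `urllib.robotparser` and ask what each crawler may fetch. This is only a sketch: the crawler name `OtherBot`, the host `www.example.com`, and the page paths are made up for illustration, and real search engines may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example, reproduced as a string.
RULES = """\
User-agent: Bingbot
Allow: /public/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

base = "https://www.example.com"  # hypothetical site
bing_public = parser.can_fetch("Bingbot", base + "/public/page.html")
bing_other = parser.can_fetch("Bingbot", base + "/other/page.html")
other_public = parser.can_fetch("OtherBot", base + "/public/page.html")

print(bing_public, bing_other, other_public)
# prints True True False
```

The second result is worth noting: the Bingbot group has no Disallow rule, so this parser reports that Bingbot may also fetch paths outside /public/.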

Specify a sitemap

Sitemap: https://www.example.com/sitemap.xml
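As a rough illustration of how a client might read this directive, Python's `urllib.robotparser` exposes the listed sitemaps through `site_maps()` (available from Python 3.8). The snippet parses the line from a string instead of fetching it over the network.

```python
from urllib.robotparser import RobotFileParser

sitemap_parser = RobotFileParser()
sitemap_parser.parse(["Sitemap: https://www.example.com/sitemap.xml"])

# site_maps() returns a list of sitemap URLs, or None if none were declared.
sitemaps = sitemap_parser.site_maps()
print(sitemaps)
```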

An e-commerce site wants to block all crawlers from the /cart/ and /checkout/ directories while allowing everything else to be crawled.

User-agent: *
Disallow: /cart/
Disallow: /checkout/
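If the list of blocked directories is maintained in code, a file like this could be generated rather than written by hand. The helper below is a hypothetical sketch (`build_robots_txt` is not a library function) that emits a single group of Disallow rules.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Build one robots.txt group that blocks the given paths."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


robots_text = build_robots_txt(["/cart/", "/checkout/"])
print(robots_text, end="")
```

Paths that are not listed remain crawlable, which matches the intent of the example.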

A news site wants to let Googlebot and Bingbot crawl all content while blocking other crawlers from the /archives/ directory.

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /archives/
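The per-crawler outcome of this configuration can be tabulated with a short script. This is a sketch using Python's `urllib.robotparser`; the crawler name `ExampleBot`, the host `news.example.com`, and the page paths are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

NEWS_RULES = """\
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /archives/
"""

news_parser = RobotFileParser()
news_parser.parse(NEWS_RULES.splitlines())

base = "https://news.example.com"  # hypothetical site
paths = ["/archives/2020/", "/news/today"]
agents = ["Googlebot", "Bingbot", "ExampleBot"]

# Map (agent, path) to whether the parser says the fetch is allowed.
results = {
    (agent, path): news_parser.can_fetch(agent, base + path)
    for agent in agents
    for path in paths
}

for (agent, path), allowed in results.items():
    print(f"{agent:10} {path:18} {'allowed' if allowed else 'blocked'}")
```

Only the combination of `ExampleBot` and `/archives/2020/` should come out as blocked, since that crawler falls through to the `*` group.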

A personal blog wants to block all crawlers from the /admin/ and /private/ directories and provide a sitemap.

User-agent: *
Disallow: /admin/
Disallow: /private/

Sitemap: https://www.myblog.com/sitemap.xml

These examples show how to configure robots.txt for different needs in order to control how search engines crawl a site's content.