I am using the following IIS rewrite rule to block as many bots as possible.
<rule name="BotBlock" stopProcessing="true">
  <match url=".*" />
  <conditions>
    <add input="{HTTP_USER_AGENT}" pattern="^$|bot|crawl|spider" />
  </conditions>
  <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden" />
</rule>
This rule blocks every request whose User-Agent string is empty or contains bot, crawl, or spider. That works well, but it also blocks googlebot, which I don't want.
So how can I exclude the string googlebot from the pattern above, so that it can still reach the site?
I have tried:
^$|!googlebot|bot|crawl|spider
^$|(?!googlebot)|bot|crawl|spider
^(?!googlebot)$|bot|crawl|spider
^$|(!googlebot)|bot|crawl|spider
but they either block all user agents or still don't allow googlebot. Does anyone have a solution and know some regex?
So, thanks to The fourth bird, the solution becomes:
<add input="{HTTP_USER_AGENT}" pattern="^$|\b(?!.*googlebot.*\b)\w*(?:bot|crawl|spider)\w*" />
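A quick way to sanity-check the pattern outside of IIS is a small Python script. This is only a sketch: Python's `re` engine is close to, but not identical to, the .NET engine that URL Rewrite uses, and the `re.IGNORECASE` flag here approximates URL Rewrite's default case-insensitive matching. The sample User-Agent strings are made up for illustration.

```python
import re

# Pattern from the accepted answer. re.IGNORECASE approximates
# IIS URL Rewrite's default case-insensitive condition matching.
BLOCK = re.compile(r"^$|\b(?!.*googlebot.*\b)\w*(?:bot|crawl|spider)\w*",
                   re.IGNORECASE)

def is_blocked(user_agent: str) -> bool:
    """Return True if this User-Agent would hit the 403 rule."""
    return BLOCK.search(user_agent) is not None

# Empty UA and generic bots are blocked...
assert is_blocked("")
assert is_blocked("SomeBot/1.0")
assert is_blocked("AhrefsCrawler")

# ...but googlebot and normal browsers get through.
assert not is_blocked("Googlebot/2.1")
assert not is_blocked("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0")
```

The key piece is the negative lookahead `(?!.*googlebot.*\b)`: at each word boundary, the match of a bot/crawl/spider word is only allowed if googlebot does not appear anywhere after that position, which is why googlebot's own UA escapes the rule while other bots are still caught.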