January 27th, 2009
How To Detect A Search Engine Spider/Crawler With PHP
Today I was tasked with writing a small PHP script that detects whether a search engine spider is crawling a page of your site. There are a few ways to go about it. The challenging part is that there are so many spiders on the web. The script I am currently using checks only for the main spiders, but it lets you add as many crawlers as you want. Here is the code, with explanatory comments:
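if ( ! function_exists('check_if_spider'))
{
    function check_if_spider()
    {
        // Add as many spiders as you want to this array
        $spiders = array(
            'Googlebot', 'Yammybot', 'Openbot', 'Yahoo', 'Slurp', 'msnbot',
            'ia_archiver', 'Lycos', 'Scooter', 'AltaVista', 'Teoma', 'Gigabot',
            'Googlebot-Mobile'
        );
        // Some requests carry no User-Agent header at all
        $user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
        // Loop through each spider and check if it appears in
        // the User Agent
        foreach ($spiders as $spider)
        {
            // stripos() is a case-insensitive substring check; it stands in
            // for eregi(), which is deprecated as of PHP 5.3 and removed in PHP 7
            if (stripos($user_agent, $spider) !== FALSE)
            {
                return TRUE;
            }
        }
        return FALSE;
    }
}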
And there we have it. You can find a list of search engine crawlers here.
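The function above reads the user agent straight from $_SERVER, which makes it awkward to try out from the command line. Below is a minimal sketch of a variant that takes the user agent and the spider list as parameters, so it can be exercised with sample strings. The name is_spider_user_agent() and the two sample user agents are assumptions made for illustration; they are not part of the script above.

```php
<?php
// Hypothetical helper: same idea as check_if_spider(), but the user agent
// and the spider list are passed in so the check can be exercised directly.
function is_spider_user_agent($user_agent, array $spiders)
{
    foreach ($spiders as $spider)
    {
        // Case-insensitive substring match
        if (stripos($user_agent, $spider) !== FALSE)
        {
            return TRUE;
        }
    }
    return FALSE;
}

$spiders = array('Googlebot', 'Slurp', 'msnbot');

// Sample user agent strings, for illustration only
$bot     = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
$browser = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0';

var_dump(is_spider_user_agent($bot, $spiders));     // bool(true)
var_dump(is_spider_user_agent($browser, $spiders)); // bool(false)
```

Passing the user agent in also means the same check can be run over lines from an access log, not only against the current request.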
The other way of doing this check, without having to list the spiders in an array, is the get_browser() PHP function. It returns an object by default (or an array if you pass TRUE as the second argument) with very useful data about the user agent. One of the things it returns is [crawler] TRUE/FALSE. I didn’t use it because I have not done enough testing to see how efficient it is. You can read more about this function on the PHP.net website.
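For reference, here is a rough sketch of what the get_browser() route could look like. It assumes the "browscap" directive in php.ini points at a browscap.ini file, which get_browser() needs in order to work; the wrapper name is_crawler_via_browscap() and the NULL "don't know" return value are my own choices, not something the function provides.

```php
<?php
// Hypothetical wrapper around get_browser(). Returns TRUE/FALSE when
// browscap data is available, or NULL when the answer cannot be determined.
function is_crawler_via_browscap($user_agent)
{
    // get_browser() relies on the "browscap" directive in php.ini pointing
    // at a browscap.ini file; bail out early when it is not configured.
    if ( ! ini_get('browscap'))
    {
        return NULL;
    }
    // TRUE as the second argument asks for an array instead of an object
    $info = get_browser($user_agent, TRUE);
    if ( ! is_array($info) || ! isset($info['crawler']))
    {
        return NULL;
    }
    // The flag may come back as a string, so normalise it to a boolean
    return filter_var($info['crawler'], FILTER_VALIDATE_BOOLEAN);
}

// Sample user agent string, for illustration only
var_dump(is_crawler_via_browscap('Googlebot/2.1 (+http://www.google.com/bot.html)'));
```

How accurate the answer is depends entirely on how current the browscap.ini file is, so this is worth testing against your own traffic before relying on it.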

Neat, just what I needed for my site. Google and others are crawling all over it, messing up my visitor stats.
I am glad you found it useful.
What if the bot name changes? And how do we test it?
Just a note:
eregi() is deprecated and will not be supported by PHP in the future. Use preg_match() going forward. It requires delimiters, such as forward slashes, sandwiching your pattern, like so: '/pattern/'
That is correct, Gatorpower. I have replaced eregi() in all my own scripts, as it is deprecated from PHP 5.3 onwards and is removed entirely in PHP 7.0.
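As a rough sketch of the preg_match() suggestion from the comments: the spider names can be joined into one pattern, wrapped in forward-slash delimiters, and given the "i" modifier to keep the match case-insensitive the way eregi() was. The function name matches_spider_pattern() is an assumption made for illustration.

```php
<?php
// Hypothetical helper: build one case-insensitive pattern from the spider
// names and test the user agent against it with a single preg_match() call.
function matches_spider_pattern($user_agent, array $spiders)
{
    // An empty list would produce the pattern '//i', which matches anything
    if (count($spiders) === 0)
    {
        return FALSE;
    }
    // preg_quote() escapes regex metacharacters and the '/' delimiter
    $escaped = array();
    foreach ($spiders as $spider)
    {
        $escaped[] = preg_quote($spider, '/');
    }
    // Produces something like /Googlebot|Slurp|msnbot/i
    $pattern = '/' . implode('|', $escaped) . '/i';
    return preg_match($pattern, $user_agent) === 1;
}

$spiders = array('Googlebot', 'Slurp', 'msnbot');

// Sample user agent strings, for illustration only
var_dump(matches_spider_pattern('Mozilla/5.0 (compatible; googlebot/2.1)', $spiders)); // bool(true)
var_dump(matches_spider_pattern('Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0', $spiders)); // bool(false)
```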