How To Detect A Search Engine Spider/Crawler With PHP

I was tasked today with writing a small PHP script that detects whether a search engine spider is crawling a page of your site. There are a few ways to go about it. The challenge is that there are so many spiders on the web. The script I am currently using only checks for the main ones, but it lets you add as many crawlers as you want. Here is the code, with comments explaining each step:

if ( ! function_exists('check_if_spider'))
{
	function check_if_spider()
	{
		// Add as many spiders as you want to this array
		$spiders	= array(
						'Googlebot', 'Yammybot', 'Openbot', 'Yahoo', 'Slurp', 'msnbot',
						'ia_archiver', 'Lycos', 'Scooter', 'AltaVista', 'Teoma', 'Gigabot',
						'Googlebot-Mobile'
					);

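		// Some requests send no User-Agent header at all
		if (empty($_SERVER['HTTP_USER_AGENT']))
		{
			return FALSE;
		}

		// Loop through each spider and check if it appears in
		// the User Agent. stripos() is a case-insensitive substring
		// search; it replaces eregi(), which was deprecated in
		// PHP 5.3 and removed in PHP 7.0
		foreach ($spiders as $spider)
		{
			if (stripos($_SERVER['HTTP_USER_AGENT'], $spider) !== FALSE)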
			{
				return TRUE;
			}
		}
		return FALSE;
	}
}

And there we have it. You can find a list of search engine crawlers here.
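If you would rather make a single regular expression call than loop, the same check can be sketched with preg_match(). This is only a minimal sketch: the function name is_spider_user_agent(), its parameters and the short default list are my own placeholders, not part of the script above. The user agent is passed in as an argument so the function can be tried out without a real request.

```php
<?php
// Hypothetical helper (name and parameters are illustrative only).
// Returns TRUE when any of the given spider names appears in $user_agent.
function is_spider_user_agent($user_agent, array $spiders = array('Googlebot', 'Slurp', 'msnbot'))
{
	// Nothing to match against, or nothing to look for
	if ($user_agent === NULL || $user_agent === '' || empty($spiders))
	{
		return FALSE;
	}

	// Escape each name so characters such as "-" or "." are matched
	// literally, then join the names into one alternation,
	// e.g. /Googlebot|Slurp|msnbot/i
	$escaped = array();
	foreach ($spiders as $spider)
	{
		$escaped[] = preg_quote($spider, '/');
	}
	$pattern = '/' . implode('|', $escaped) . '/i';

	// The "i" modifier makes the match case-insensitive, like eregi() was
	return preg_match($pattern, $user_agent) === 1;
}

// Usage: fall back to an empty string when the header is missing
$user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (is_spider_user_agent($user_agent))
{
	// e.g. skip the visitor counter for this request
}
```

Passing the user agent in also gives you a simple way to test the list: call the function with a known crawler string and with an ordinary browser string and compare the results.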

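The other way of doing this check, without having to list the spiders in an array, is to use the get_browser() PHP function. It returns an object by default (or an array if you pass TRUE as the second argument) with very useful data, including a crawler entry that tells you whether the user agent is a known crawler. Note that get_browser() only works when the browscap setting in php.ini points to a browscap.ini file. I didn’t use it because I have not done enough testing to see how efficient it is. You can read more about this function on the PHP.net website.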
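For reference, a get_browser() check might look roughly like the sketch below. I have not benchmarked it, and the wrapper name is_crawler_by_browscap() is my own invention. It returns NULL when browscap is not configured, so you can fall back to the array-based check.

```php
<?php
// Hypothetical wrapper around get_browser() (the name is illustrative only).
// Returns TRUE or FALSE when browscap data is available, NULL otherwise.
function is_crawler_by_browscap($user_agent)
{
	// get_browser() needs the browscap directive in php.ini to point
	// at a browscap.ini file; without it there is nothing to look up
	if ( ! ini_get('browscap'))
	{
		return NULL;
	}

	// TRUE as the second argument asks for an array instead of an object
	$info = get_browser($user_agent, TRUE);
	if ( ! is_array($info) || ! isset($info['crawler']))
	{
		return NULL;
	}

	// The value may arrive as a boolean or as a string such as "1" or
	// "true", so normalise it to a real boolean
	return filter_var($info['crawler'], FILTER_VALIDATE_BOOLEAN);
}

// Usage: NULL means "no browscap answer", so another check can take over
$user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$is_crawler = is_crawler_by_browscap($user_agent);
```

How accurate the answer is depends entirely on how up to date your browscap.ini file is.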

5 Comments

  • Raeven, February 4, 2009:

    Neat, just what I needed for my site. Google and others are crawling all over it, messing up my visitor stats.

  • Matt, February 4, 2009:

    I am glad you found it useful :)

  • malaysia web designer, February 19, 2009:

    What if the bot name changes? And how do we test it?

  • gatorpower, November 19, 2009:

    Just a note:

    eregi() is deprecated and will not be supported by PHP in the future. Use preg_match() going forward. It requires forward slashes sandwiching your pattern, like so: '/pattern/'

  • Matt, November 24, 2009:

    That is correct Gatorpower, I have replaced eregi() in all my own scripts as from php 5.3 onwards, it is no longer supported and will even be removed in php 6.0.
