Using robots.txt to your advantage to reduce server load

A little while ago, my hosting company (1and1.com), which I actually like quite a bit, decided that I was using more than my fair share of a shared server and warned me that I was “abusing” the system. They told me that if I didn’t reduce my server load, I would have to move up to a (rather costly) dedicated server.

After an initial panic attack, I finally took a long-overdue look at caching and search engine crawlers. Here’s what I found in the process:

  • Using a good caching plugin (such as W3 Total Cache for WordPress, which I had already been using) is important for page load time and for cutting down the number of HTTP requests. It can reduce server load by adding browser cache rules and by combining and minifying files (such as CSS stylesheets and JavaScript files) so that fewer requests reach the server. While this approach was important for site responsiveness, I don’t think it did much to help my server load problem.
  • As it turns out, it doesn’t make sense to let search engine crawlers fetch content that either shouldn’t be indexed or that adds little to search engine placement. Large files (media files or large PDFs) and possibly images are good examples. Keeping crawlers away from them is easy with a well-crafted robots.txt file (the file that tells search engine crawlers which parts of your site they may fetch). I found a good write-up of this approach on Perishable Press.
  • It’s important to keep an eye not just on Google Analytics stats but also on the raw HTTP (GET/POST) request stats, especially when the hosting company frowns on heavy use in a shared server environment.
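
As a rough illustration of that last point, here is a minimal Python sketch that counts raw requests per user agent and estimates how many came from crawlers. It assumes the server writes its access log in the common “combined” format; the sample log lines, the `BOT_MARKERS` substrings, and the function name `summarize` are made up for this example and are not taken from my actual logs.

```python
import re
from collections import Counter

# Combined log format ends with:
#   "METHOD path PROTOCOL" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Assumption: crude substrings that commonly appear in crawler user agents.
BOT_MARKERS = ("bot", "crawl", "spider", "slurp")


def summarize(lines):
    """Return (total requests, crawler requests, Counter of user agents)."""
    by_agent = Counter()
    bot_hits = 0
    total = 0
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        total += 1
        agent = match.group("agent")
        by_agent[agent] += 1
        if any(marker in agent.lower() for marker in BOT_MARKERS):
            bot_hits += 1
    return total, bot_hits, by_agent


# Hypothetical sample data, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2012:10:00:00 +0000] "GET /feed/ HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '203.0.113.9 - - [01/Jan/2012:10:00:02 +0000] "GET /about/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '203.0.113.5 - - [01/Jan/2012:10:00:03 +0000] "GET /wp-content/uploads/big.pdf HTTP/1.1" 200 900000 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

total, bot_hits, by_agent = summarize(SAMPLE_LOG)
print(f"{bot_hits} of {total} requests came from crawlers")
for agent, count in by_agent.most_common(3):
    print(count, agent)
```

To run it against a real log, the list could be replaced by an open file handle, since `summarize` only needs an iterable of lines.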

I tried out the following robots.txt file on my server (which actually hosts a few other sites, too) and saw the drop in server requests that is shown in the image at the top of this post. Pretty impressive, isn’t it? And as you can see, the improvement happened overnight!

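# These rules apply to all crawlers.
# No blank lines inside the group: some parsers treat a blank line as the end of a record.
User-agent: *
# WordPress feeds, trackbacks, and internals
Disallow: /feed/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
# Catch-all for any other path starting with /wp- (this includes /wp-content/uploads/)
Disallow: /wp-
Disallow: /otherprivatefolder/
# Explicitly allow the downloads folder
Allow: /downloads/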
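
Before uploading a file like this, it can help to check which URLs it actually blocks. The following is a minimal sketch using Python’s standard `urllib.robotparser` module. The rules are the ones from this post, written as one contiguous group; the sample paths and the `ExampleBot` user agent are hypothetical and only serve as test input.

```python
from urllib.robotparser import RobotFileParser

# The rules from this post, kept as one group without blank lines
# (Python's parser treats a blank line as the end of a group).
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
Disallow: /wp-
Disallow: /otherprivatefolder/
Allow: /downloads/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, user_agent="ExampleBot"):
    """Return True if the given crawler may fetch the given path."""
    return parser.can_fetch(user_agent, path)


# Hypothetical paths, for illustration only.
SAMPLE_PATHS = [
    "/feed/",
    "/wp-content/uploads/big.pdf",
    "/downloads/file.zip",
    "/about/",
]

for path in SAMPLE_PATHS:
    verdict = "allowed" if is_allowed(path) else "blocked"
    print(f"{path}: {verdict}")
```

Python’s parser applies the first matching rule in file order, whereas the major search engines may resolve overlapping Allow and Disallow rules differently, so this is best treated as a sanity check rather than a guarantee of how any particular crawler behaves.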