As you may already know, Search Engine Optimization (SEO) is never-ending work for a blogger. There is always something that we can, and should, optimize so our blog gets more exposure in search engines. One important part of SEO is telling search engines what to do with our blog content when they crawl it. Search engines will still crawl the blog even if we give them no instructions, but the result will not be optimal. We can improve it by telling search engines exactly which parts of the blog they are allowed to crawl and which they are not. A robots.txt file is the way to do that.
How Does Robots.txt Work?
We can tell search engines clearly how to treat our blog content by using a robots.txt file: which content they should crawl and which content they should leave alone. In short, robots.txt tells search engine robots what is allowed and what is not. The robots.txt syntax is quite simple. There are only three directives that we use to optimize it.
Here are the three directives that we use in robots.txt:
User-agent: the robot the following rules apply to
Disallow: the URL that we want to block
Allow: the URL that we want to allow
Here is an example of how we can use the above directives in robots.txt, for better understanding:
# disable google image bot
User-agent: Googlebot-Image
Disallow: /

# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
As you can see, the first example tells the Google image bot that it is not allowed to crawl the site. The second example tells the Google AdSense bot that it may crawl the entire site.
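If you want to double-check rules like these before uploading them, here is a minimal sketch using Python's standard urllib.robotparser module. It is only an illustration, not part of my setup: the example.com URLs and the bot name SomeOtherBot are made up, and I wrote the AdSense group without the trailing asterisk and without the Allow: /* line because, as far as I know, this parser does plain prefix matching and does not understand wildcards.

```python
from urllib.robotparser import RobotFileParser

# The two example groups, simplified for the standard-library parser
# (assumption: no wildcards, since this parser matches plain prefixes).
ROBOTS_TXT = """\
# Block Google's image bot
User-agent: Googlebot-Image
Disallow: /

# Allow the AdSense bot on the entire site
User-agent: Mediapartners-Google
Disallow:
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical URLs, used only to exercise the rules.
image_allowed = parser.can_fetch("Googlebot-Image", "http://example.com/photo.jpg")
adsense_allowed = parser.can_fetch("Mediapartners-Google", "http://example.com/some-post/")
# A bot that no group mentions is allowed by default.
other_allowed = parser.can_fetch("SomeOtherBot", "http://example.com/some-post/")

print(image_allowed, adsense_allowed, other_allowed)  # prints: False True True
```

An empty Disallow: line means "nothing is blocked", which is why the AdSense bot comes back as allowed.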
From the example above, we should already have an idea of how robots.txt works. Please note that robots.txt is only needed if we want to keep search engines away from some of our content. If we want to allow everything, we do not need a robots.txt file at all.
Optimize Robots.txt for WordPress Blog
In my opinion, the most important reason to use robots.txt is to avoid duplicate content. Duplicate content is really bad for a blog's SEO; it can even get a blog penalized by search engines. That’s why we need to optimize the content of our blog's robots.txt file so there is no room for duplicate content. Another use of robots.txt is for security reasons. We can hide some blog directories from search results by disallowing search engines from crawling them.
Here is the robots.txt file I use on this blogging update blog:
# Disallow some paths for all search engine bots
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/upgrade
Disallow: /wp-content/backup-*
Disallow: /wp-content/themes
Disallow: /images/
Disallow: /feed
Disallow: /aff/
Disallow: /category/*/*
Disallow: */feed
Disallow: /search/*/*
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

# Allow Google Image in entire site
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Allow Google AdSense in entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

# Disable Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /

# Disable digg mirror
User-agent: duggmirror
Disallow: /

Sitemap: http://letupdate.com/sitemap.xml.gz
I use the rules above based on my own needs: to avoid duplicate content, to guard against Google hacking, and to help search engines understand this blog better. They may need some changes in the future for better results. And of course, you can customize them freely based on your own needs.
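To get a feel for what the wildcard lines above match, here is a rough Python sketch of the pattern style that Google documents for robots.txt, where * stands for any run of characters and a trailing $ anchors the pattern to the end of the URL. The function names and the sample paths are my own illustration. It only tests one pattern against one path, and it ignores how a real crawler chooses between competing Allow and Disallow lines.

```python
import re


def pattern_to_regex(pattern):
    """Translate one robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern, path):
    """Return True if the pattern matches starting at the beginning of path."""
    return pattern_to_regex(pattern).match(path) is not None


# Hypothetical paths, checked against Disallow patterns from the file above.
checks = [
    ("/*.css$", "/wp-content/themes/demo/style.css"),
    ("/*.css$", "/wp-content/themes/demo/style.css?ver=2"),
    ("/*?", "/my-post/?replytocom=2"),
    ("/category/*/*", "/category/seo/page/2"),
    ("/category/*/*", "/category/seo"),
    ("/wp-admin", "/wp-admin/options.php"),
]
results = [matches(pattern, path) for pattern, path in checks]

for (pattern, path), hit in zip(checks, results):
    print(f"{pattern:15} {path:45} {'blocked' if hit else 'not matched'}")
```

In this sketch the /*? line catches any URL that contains a query string, and /category/*/* matches deeper category URLs such as /category/seo/page/2 while leaving /category/seo itself alone.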
Conclusions
As bloggers, we should be aware of robots.txt and what we can do with it to optimize our blog for search engines (SEO). Duplicate content is one of the many things that must be avoided for SEO, and that is exactly what we address by optimizing the robots.txt file. There may be other ways to avoid duplicate content, but I especially suggest using a robots.txt file for it. Robots.txt is also useful for security purposes: we can use it to hide some of our blog directories from search results by disallowing search engines from crawling them. It also lets us shape how search engines index our blog based on our needs, and it can help guard against Google hacking by hiding some files from the eyes of search engines such as Google.
NOTE: I choose to optimize my blog's SEO with a robots.txt file even though I already use the “All in One SEO Pack” WordPress plugin, which already has tools (a virtual robots.txt) to restrict (no-index) category, archive, and tag archive pages to avoid duplicate content. I think it is better to hard-code the instructions than to rely on virtual instructions like those in the “All in One SEO Pack” WordPress plugin. A robots.txt file also gives more flexibility in what to restrict.
That’s all for this update. Wait for the next one!


Today I read this post and checked Google's robots.txt; many parameters have been added there. What about the sitemap?
It is good to learn from experienced people like you.
Thanks for sharing this useful information!
I totally agree with SEO GURU. If we do not want to restrict any folder or file of our website, then there is no need to use a robots.txt file.
Dana;
Thanks for this article, Google Webmaster is giving me fits trying to correct crawl errors, most of which are the result of making links out of internal site tags and categories. This should help to get rid of some of the worst offenders.
Thanks, Michael Brown
Yeah, we can surely play with robots.txt if we find that it is too hard to fix the crawl errors directly; we just tell Google not to crawl the bad parts.
Great article that clarifies some things…
I’ve got one question: I get duplicate content from my comments pages such as “……./?replytocom=2”. Can this be restricted by robots.txt?
Yes, it can be restricted by robots.txt (I forgot the details, though). However, you can restrict it easily by using AIO or Platinum SEO.
Very important to use the robots.txt file. Great post.
Thanks mate.
In general, we prefer that our web pages are indexed by the search engines. But there may be some content that we don’t want to be crawled & indexed. Like the personal images folder, website administration folder, customer’s test folder of a web developer, no search value folders like cgi-bin, and many more. The main idea is we don’t want them to be indexed.
Yes, you are right mate.
Using a robots.txt file is essential in order to avoid the duplicate content issue.
Yes, it is.
Don’t forget that you need to set your canonical links, otherwise your content is still accessible, like
mysite.com/my-post/0
mysite.com/my-post/1
mysite.com/my-post/2
etc. …
mysite.com/my-post/100000000000
All it takes is one enemy to generate a link farm pointing to a few thousand of these addresses, and your domain will never get good serps again, and will be permanently marked as spam.
Thanks for the suggestion.
As usual… you always make post for basic things, but the good thing is.. they hit.. good thing there’s someone bothering to do this..
thanks Dana!
-Nhoel of keywordspeak.com
It is because I myself am still at a basic stage of blogging.
Nice explanation of robots.txt. It always looked technical to me, so I never dared mess with it.
I will try to set it up now with the help of your post.
You should not be afraid of it, because you can always delete it if something goes wrong.
Yay! This is hard work. I learned about this on Google Webmaster Tools, but I still haven’t done it. You reminded me. Thanks.
You’re welcome, Marlene. I think knowing robots.txt is important for every blogger.
Nice post bro… its really useful to optimize your robots.txt! it is really useful when it comes to security!
Thanks. Robots.txt is surely worth optimizing for the sake of SEO and security.
I think u need to update the robots.txt for ur blog.. as u r using letupdate.com/title-name as the permalink there is no necessity of restricting “/category/*/*” what do u say??
But I still have category pages. Hmm… or maybe it can be handled by /category/ instead. What do you think?
Use /category/*/*
That way you index your category names (good seo) but don’t duplicate the posts they point to.
Whether using an SEO plugin or not, it’s still good to know what’s happening with the robots.txt file. Good info here. Thanks!
Yeah, I agree that robots.txt is worthwhile knowledge for us as webmasters.