Ever wondered how a spammer got your email address? Read on...
Web bots and spam: we all know what spam is, but what about web bots? And are the two related? YES!
What are web bots?
Basically, these are automated programs that scour the Internet gathering information, links and email addresses. Major search engines have been using them for over a decade. A search engine's bot will spider your site once it has your URL, either because you submitted it through an "add URL" page such as www.google.co.uk/addurl.html, or because it followed a link to your site from a page on another site.
Search engine web bots are generally well behaved: they gather and index your web site and use that information to provide links into your site when someone searches their index. Other web bots, now used by spammers to collect nothing but email addresses, aren't so useful. They can collect millions of email addresses from web pages in a relatively short time, and once they have your email address, you are on their spam lists.
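Part of the reason harvesting is so fast is that a bot doesn't need to understand a page. It only has to download the page source and pattern-match anything that looks like an address. The sketch below is a simplified illustration of that idea. The regular expression and the helper name findEmailAddresses are assumptions for illustration, not code taken from any real bot. It also shows why an address that never appears whole in the source slips past such a scan.

```javascript
// Simplified illustration of how an address harvester scans page source.
// EMAIL_PATTERN is an assumed, simplified pattern; real bots vary.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical helper: return each distinct address-like string in the text.
function findEmailAddresses(pageSource) {
  // match() with a global regex returns every hit, or null if there are none
  const hits = pageSource.match(EMAIL_PATTERN) || [];
  return [...new Set(hits)]; // de-duplicate addresses that appear twice
}

// A page with a plain mailto link exposes the address twice.
const plainPage = '<a href="mailto:info@example.com">info@example.com</a>';

// A page that only holds the address in separate parts exposes nothing.
const scriptPage =
  'var MailboxName = "info"; var DomainName = "example"; var DomainExtension = "com"; ' +
  'document.write(MailboxName + "@" + DomainName + "." + DomainExtension);';

console.log(findEmailAddresses(plainPage));  // [ 'info@example.com' ]
console.log(findEmailAddresses(scriptPage)); // []
```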
Below we show you how to avoid spam bots while making sure you don't frighten away the friendly spiders.
- Protecting your email addresses from spam web bots
This is relatively simple if you are used to using JavaScript or editing raw HTML.
JavaScript to hide your email address from web bots:
Within the script above, change the following variables to your own details:
DisplayName
MailboxName
DomainName
DomainExtension
Then copy the whole of the script above and paste it into the pages where you want the email address link to appear. And that's it: because the email address is assembled from multiple parts, spam web bots that simply scan your page source will not recognise it as a valid email address ;-)
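The same split-address idea can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the exact script the article refers to. It uses the four variable names listed above with example values only. The helper name buildMailtoLink is hypothetical, and the values are assumed to be plain text with no HTML special characters. The full address only exists once the script runs, so it never appears as one literal string in the page source.

```javascript
// Hypothetical helper: assemble a mailto link from separate parts so the
// complete address never appears as one string in the page source.
function buildMailtoLink(displayName, mailboxName, domainName, domainExtension) {
  const address = mailboxName + "@" + domainName + "." + domainExtension;
  return '<a href="mailto:' + address + '">' + displayName + "</a>";
}

// Replace these example values with your own details.
const DisplayName = "Contact us";
const MailboxName = "info";
const DomainName = "example";
const DomainExtension = "com";

const link = buildMailtoLink(DisplayName, MailboxName, DomainName, DomainExtension);
console.log(link); // <a href="mailto:info@example.com">Contact us</a>

// In a web page you would write the link out at this point instead, e.g.
// document.write(link);
```

Because the visible link text is DisplayName rather than the address, the address does not appear in the page text either. Bear in mind that a bot which actually executes JavaScript could still see the assembled address. This technique only defeats bots that scan the raw source.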
- Helping to ensure your site is friendly to search engine web bots
If you want the web bots to spider all of your site, make sure you have a text file called robots.txt in the root directory containing:
User-agent: *
Disallow: /cgi-bin
This tells the web bots to index all of your site except the cgi-bin directory.
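If you have a subdirectory that you do not want the web bots to index, add a Disallow line for it to the same robots.txt file in the root directory. Web bots only look for robots.txt in the root, so a robots.txt placed in a subdirectory is ignored. For example, for a hypothetical directory called private:
User-agent: *
Disallow: /cgi-bin
Disallow: /private/
This tells the web bots NOT to index pages in the /private/ subdirectory, as well as the cgi-bin directory.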
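To make the Disallow rules more concrete, here is a simplified sketch of the check a well-behaved web bot makes before fetching a page. It is only an approximation of the robots exclusion protocol, under these assumptions: it reads only the `User-agent: *` group, treats each Disallow value as a plain path prefix, and ignores Allow lines, wildcards and agent-specific groups. The function name isAllowed and the /private/ directory are hypothetical.

```javascript
// Simplified robots.txt check (assumption: only the "User-agent: *" group and
// plain prefix matching of Disallow values; no Allow lines or wildcards).
function isAllowed(robotsTxt, path) {
  const disallowed = [];
  let inStarGroup = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // drop comments and whitespace
    const colon = line.indexOf(":");
    if (colon === -1) continue; // not a "field: value" line
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      inStarGroup = value === "*";
    } else if (field === "disallow" && inStarGroup && value !== "") {
      // An empty Disallow means "nothing is disallowed", so it is skipped.
      disallowed.push(value);
    }
  }
  // A path is blocked if it begins with any disallowed prefix.
  return !disallowed.some((prefix) => path.startsWith(prefix));
}

const rules = "User-agent: *\nDisallow: /cgi-bin\nDisallow: /private/";
console.log(isAllowed(rules, "/index.html"));         // true
console.log(isAllowed(rules, "/cgi-bin/form.pl"));    // false
console.log(isAllowed(rules, "/private/notes.html")); // false
```

Only well-behaved bots make this check. Spam bots are free to ignore robots.txt, which is why the email protection described earlier is still needed.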
Related content:
- In addition to the above, don't forget to add the correct meta data to your pages; see Meta tags revisited at: www.seiretto.com/newsletters/
- Receiving unwanted submissions from spam bots via your mail forms? Then try this:
phpFormMailer
- Need a boost to get indexed on search engines like Google? Why not get listed on our review pages?
We are particularly interested in reviews from customers using our relatively new Starter hosting accounts and Fully managed dedicated servers. For more details please see: account reviews
Need to know more about robots.txt? Try: http://www.robotstxt.org/wc/faq.html