*Note: The article presented here is written by authors not affiliated with hashemian.com.
This site is not responsible for any errors, omissions, or objectionable content.
Exercise care before engaging in business with any companies mentioned in this article.

Go to: /articles/2005/12/02/ for other articles.

WebmasterWorld Dropped by Google as Forum Bans Bots

By Mike Banks Valentine. Copyright © December, 2005

The top internet forum and best-known discussion site for website owners, WebmasterWorld, has been dropped entirely from Google! A site with over a million pages and over 2 million page views a month just disappeared from search engines! How often have you searched for the answer to an issue affecting your web site and found a WebmasterWorld forum thread among the top search results?

You won't see WebmasterWorld in search results again until this bot ban is reversed.

The following URL actually takes you into the middle of the FOO forum discussion, which runs over 40 pages at the time of this writing. A nice recap at the top of that page summarizes the issues raised in the previous 23 or so pages of discussion.

www.webmasterworld.com/forum9/9618-1-10.htm

Site owner Brett Tabke is being grilled, toasted and roasted by forum members for requiring logins (and assigning cookies) for all visitors and effectively locking out all search engine spiders. One big issue is the lack of effective site search, now that you can't use a "site:WebmasterWorld.com" query in Google to find WebmasterWorld info on specific issues. Tabke is being slammed for not having an effective site search function in place before getting the site dropped.

WebmasterWorld has been entirely removed from Google after Tabke decided to use robots.txt to block all crawlers with a universal disallow rule:

User-agent: *
Disallow: /
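
As a rough illustration of what that rule means to a crawler that honors robots.txt, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed is hypothetical, and the user-agent names are just examples of well-known crawler tokens; this is not how any particular search engine implements its checks.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule set quoted above: every user agent, every path.
BLOCK_ALL = [
    "User-agent: *",
    "Disallow: /",
]


def is_allowed(robots_lines, user_agent, url):
    """Return True if a compliant crawler may fetch `url` under these rules.

    `is_allowed` is a hypothetical helper written for this illustration.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.webmasterworld.com/forum9/9618-1-10.htm"
    for agent in ("Googlebot", "Slurp", "msnbot"):
        # Each compliant crawler gets the same answer: not allowed.
        print(agent, is_allowed(BLOCK_ALL, agent, page))
```

Because the wildcard matches every user agent and "/" is a prefix of every path, a compliant crawler is told that nothing on the site may be fetched.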

He has stated that this is due to rogue bots clogging the site and slowing its performance, scraping and re-using content, and searching forum comments for web reputation information on individual companies. I have a similar problem at my site on a much smaller scale. Crawlers can request pages at excessive rates that slow site performance for visitors. I've instituted a "Crawl-delay" for Yahoo and MSN, but rogue bots don't follow robots.txt instructions. (Google is more polite and requests pages at a more leisurely rate.)
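
For readers unfamiliar with Crawl-delay, the sketch below shows roughly what such a robots.txt might look like and how a compliant crawler could read it. The delay values are assumptions chosen for illustration, not my actual settings; "Slurp" and "msnbot" are the user-agent tokens of the Yahoo and MSN crawlers, and delay_for is a hypothetical helper. Crawl-delay is a non-standard extension, so only crawlers that choose to support it will slow down.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The 10-second delays are illustrative values only.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""


def delay_for(user_agent, robots_text=ROBOTS_TXT):
    """Return the Crawl-delay (in seconds) that applies to `user_agent`,
    or None when no delay is specified for it."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    for agent in ("Slurp", "msnbot", "Googlebot"):
        print(agent, delay_for(agent))
```

The empty "Disallow:" line in the last group leaves the whole site open to every other crawler, with no delay requested.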

I can't say I completely understand the WebmasterWorld action to ban all bots, or whether it will achieve what Tabke is after, but it sure is creating a buzz in search engine circles. Lots of new links to WebmasterWorld will be generated by this extreme action, and when he turns search engine spider access back on in his robots.txt file to get re-indexed, he'll be able to see thousands of new links! Many have suggested that this was the plan behind the ban, but somehow I doubt it. Tabke claims it was done in a moment of frustration.

Barry Schwartz of Search Engine Roundtable interviewed Tabke after his dramatic decision to ban all bots. That interview clears up much of the confusion, but still doesn't fully justify a move that effectively drops over one million pages from Google.

www.seroundtable.com/archives/002863.html

Web reputation crawlers are partially at play here as well. Corporations looking for online commentary about their company, both positive and negative, use web reputation services, which crawl the web with reputation bots (mostly crawling blogs and news stories) looking for comments that may harm or help their clients. This may be of value to those corporations, but it needlessly slows site performance, with no advantage for webmasters. A site owner who has trashed a company on their blog certainly doesn't want the "Web Reputation Police" crawling their content in order to sue them for libel.

Rogue bots are a serious problem, but they simply can't be controlled with robots.txt, because rogue bots don't follow robots.txt instructions. Tabke said himself that even the cookies and login are useless against serious scraper bots: the bot owner need only manually walk the bot through the login, which assigns it a cookie, then let it loose within the forums to continue scraping automatically once past the gate.
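
The reason robots.txt is powerless here is that the crawler itself decides whether to consult it. The following is a minimal sketch of that point, assuming a site that blocks all bots; the function pages_requested, the honors_robots flag and the example.com URLs are all hypothetical names invented for this illustration.

```python
from urllib.robotparser import RobotFileParser

BLOCK_ALL = ["User-agent: *", "Disallow: /"]


def pages_requested(urls, honors_robots, user_agent="ExampleBot"):
    """Return the URLs a crawler would request from a site that blocks all bots.

    `honors_robots` models the crawler's own choice; nothing on the
    server side enforces it.
    """
    parser = RobotFileParser()
    parser.parse(BLOCK_ALL)
    return [
        url for url in urls
        if not honors_robots or parser.can_fetch(user_agent, url)
    ]


if __name__ == "__main__":
    urls = ["http://example.com/forum/1", "http://example.com/forum/2"]
    print("polite:", pages_requested(urls, honors_robots=True))
    print("rogue: ", pages_requested(urls, honors_robots=False))
```

The polite crawler requests nothing, while the rogue one requests every page, which is why a blanket ban mostly affects the well-behaved search engine spiders.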

I've often wondered why anyone would go to such lengths to steal content and re-use it elsewhere, when it is unlikely to help them in any substantial way. Everyone knows that content is freely available at several article marketing archives, but the rogue bot programmers seek out content that ranks highly first - and fail to realize that there are multiple reasons for those high rankings, including off-page factors like quality, relevant, inbound, one-way links from highly ranked blogs and industry news sites. The bad boys out there stealing content won't get those inbound links - OR the high rankings - on the sites where they've posted that scraped content.

Article archives experience scraper bots too. Bot programmers would rather write a bot program that collects content for them (and automatically dumps it into another site) than carefully choose relevant work to post in sensible hierarchies of useful content. Automated scrape-and-dump laziness. What other reasons would you have for scraping free articles?

The other reason for scraping content would be to plaster it across AdSense and Yahoo Publisher Network (YPN) sites as content to attract advertisements, in the hope of clickthroughs from visitors searching on valuable keyword phrases that generate ads worth more to those webmasters. This convoluted thinking results in sites that don't end up ranking very well and don't generate much income for the lazy, bot-programming nerds who create those types of sites.

There are several software and cloaking packages available to lazy webmasters that claim to gather keyword-phrase-based content from across the web via bots and scrapers, then publish that content to "mini-webs" automatically, with no work required on your part. Those pages are cloaked automatically, against search engine best practices, and then AdSense and YPN ads are plastered over those automatically created pages - yes, you guessed it - automatically. Serious search engine spam, cloaked so search engines don't know.

One last reason for content scrapers is to find content for the latest craze: filling fake blogs (also known as Spam Blogs or Splogs) with content, then pinging the blog search services to notify them of new posts. Scraped content is constantly added to the blogs, and the pinging suggests that the blog is prolific and should be highly ranked. This is closely related to, and promoted by, the above-mentioned article scrapers. It is the latest type of spam being combated by search engines. It seems that search engine spam is just as serious as emailed spam.

Good luck to WebmasterWorld in its effort to ban those rogue bots and scrapers!

Mike Banks Valentine operates WebSite101.com, a free web small business ecommerce tutorial, and provides SEO content aggregation, press release optimization and custom web content search optimization: seoptimism.com/SEO_Contact.htm

Web content article distribution by Publish101.com
