WebmasterWorld Dropped by Google as Forum Bans Bots
Mike Banks Valentine Copyright © December, 2005
The top internet forum and best-known discussion site for
website owners, WebmasterWorld, has been dropped entirely from
Google! A site with over a million pages seeing over 2 million
page views a month just disappeared from search engines! How
often have you been searching for the answer to issues
affecting your web site when you found a thread in
WebmasterWorld forums in the top search results?
Never again will you see WebmasterWorld in search results
until this bot ban is reversed.
The following URL drops you into the middle of the FOO forum
discussion, which runs over 40 pages at the time of this
writing. A nice recap at the top of that page summarizes much
of the previous 23 or so pages of discussion.
www.webmasterworld.com/forum9/9618-1-10.htm
Site owner Brett Tabke is being grilled, toasted and roasted
by forum members for requiring logins (and assigning cookies)
for all visitors and effectively locking out all search engine
spiders. One big issue is lack of effective site search now
that you can't use a "site:WebmasterWorld.com" query to
find WebmasterWorld info on specific issues with a Google
search. Tabke is being slammed for not having an effective
site search function in place before getting the site dropped.
WebmasterWorld has been entirely removed from Google
after Tabke decided to use robots.txt to block all crawlers
with a universal disallow rule:
User-agent: *
Disallow: /
He has stated that this is due to rogue bots clogging and
slowing site performance, scraping and re-using content and
searching forum comments for web reputation information on
individual companies. I've a similar problem at my site on a
much smaller scale. Crawlers can request pages at excessive
rates that slow site performance for visitors. I've instituted
a "Crawl-delay" for Yahoo and MSN, but rogue bots don't follow
robots.txt instructions. (Google is more polite and requests
pages at a more leisurely rate.)
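To illustrate the "Crawl-delay" approach, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content is an assumption made for illustration: the user-agent names Slurp (Yahoo) and msnbot (MSN) and the 10 second delay are hypothetical values, not the actual settings used on my site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the 10 second delay and the agent names
# are illustrative assumptions, not values taken from a real site.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""


def crawl_delay_for(agent, robots_txt=ROBOTS_TXT):
    """Return the Crawl-delay (in seconds) that applies to `agent`,
    or None when no delay is declared for that crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


if __name__ == "__main__":
    for agent in ("Slurp", "msnbot", "Googlebot"):
        print(agent, crawl_delay_for(agent))
```

The delay is only a request. A crawler has to read the file and choose to wait between page requests, which is exactly why it does nothing against rogue bots.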
Can't say I completely understand the WebmasterWorld action to
ban all bots, or if it will achieve what Tabke is after, but
it sure is creating a buzz in search engine circles. Lots of
new links to WebmasterWorld will be generated by this extreme
action, and when he re-opens access to search engine spiders
in his robots.txt file to get re-indexed, he'll be able to
see thousands of new links! Many have suggested that
this was the plan upon implementing the ban, but somehow I
doubt it. Tabke claims it was done in a moment of frustration.
Barry Schwartz of Search Engine Roundtable interviewed Tabke
after his dramatic decision to ban all bots. That interview
clears up much of the confusion, but still doesn't fully
justify the dramatic move that effectively drops over one
million pages from Google.
www.seroundtable.com/archives/002863.html
Web reputation crawlers are partially at play here as well.
Corporations looking for online commentary about their company,
both positive and negative, use web reputation services which
crawl the web with reputation bots (crawling mostly blogs and
news stories) looking for comments about their clients that
may harm or help them. This may be of value to those
corporations, but it needlessly slows site performance to no
advantage for webmasters. If a site owner has trashed a
company on their blog, they certainly don't want the "Web
Reputation Police" crawling their content in order to sue them
for libel.
Rogue bots are a serious problem, but they simply can't be
controlled with robots.txt. Tabke said himself that even the
cookies and login are useless against serious scraper bots, as
the bot owner need only manually walk a bot through the
login, which assigns a cookie to it, then let it loose within
the forums to automatically continue to scrape away once past
the gate. Rogue bots don't follow robots.txt instructions.
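The point that robots.txt is purely advisory can be sketched in a few lines of Python with the standard urllib.robotparser module. The function names below are hypothetical and exist only for this illustration. The sketch assumes the universal block quoted earlier (User-agent: * with Disallow: /) and shows that a well-behaved crawler filters its own request list, while nothing stops a rogue bot from skipping the check.

```python
from urllib.robotparser import RobotFileParser

# The universal block: every user-agent is disallowed from every path.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]


def is_allowed(agent, url, robots_lines=BLOCK_ALL):
    """Ask robots.txt whether `agent` may request `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


def polite_fetch_plan(agent, urls):
    """A compliant crawler keeps only the URLs robots.txt permits."""
    return [u for u in urls if is_allowed(agent, u)]


def rogue_fetch_plan(agent, urls):
    """A rogue bot never consults robots.txt, so nothing is filtered."""
    return list(urls)


if __name__ == "__main__":
    urls = ["http://www.webmasterworld.com/forum9/9618-1-10.htm"]
    print(polite_fetch_plan("Googlebot", urls))  # compliant: skips everything
    print(rogue_fetch_plan("AnyBot", urls))      # rogue: requests it anyway
```

The filtering happens entirely on the crawler's side. The server never enforces the rule, so the ban removes the polite search engine spiders while the scrapers carry on.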
I've often wondered why anyone would go to such lengths to
steal content and re-use it elsewhere, when it is unlikely to
help them in any substantial way. Everyone knows that content
is freely available at several article marketing archives,
but the rogue bot programmers seek out content that ranks
highly first - and fail to realize that there are multiple
reasons for those high rankings, including off-page factors
like quality, relevant, inbound, one-way links from highly
ranked blogs and industry news sites. The bad boys out there
stealing content won't get those inbound links - OR the high
rankings on the sites where they've posted that scraped
content.
Article archives experience scraper bots too. Bot programmers
would rather write a bot program that collects content for them
(and automatically dumps it into another site) than carefully
choose relevant work to post in sensible hierarchies of useful
content. Automated scrape and dump laziness. What other
reasons would you have for scraping free articles?
The other reason for scraping content would be to plaster it
across AdSense and Yahoo Publisher Network (YPN) sites as
content to attract advertisements, in the hope of clickthroughs
from visitors searching on valuable keyword phrases that
generate ads worth more to those webmasters. This convoluted
thinking results in sites that don't end up ranking very well
and don't generate much income for the lazy, bot-programming
nerds that create those types of sites.
There are several software and cloaking packages available to
lazy webmasters that claim to gather keyword-phrase-based
content from across the web via bots and scrapers, then
publish that content to "mini-webs" automatically, with no
work on your part required. Those pages are cloaked
automatically, against search engine best practices, and then
AdSense and YPN ads are plastered over those automatically
created pages, yes, you guessed it - automatically. Serious
search engine spam, cloaked so search engines don't know.
One last reason for content scraping is the latest craze:
filling fake blogs (also known as Spam Blogs or Splogs) with
scraped content, then pinging the blog search services to
notify them of new posts. Newly scraped content is constantly
added to the blogs, and the pinging suggests that the blog is
prolific and should be highly ranked. This practice is closely
related to, and promoted by, the above-mentioned article
scrapers. It is the latest type of spam being combated by
search engines. It seems that search engine spam is just as
serious as emailed spam.
Good luck to WebmasterWorld in its effort to ban those rogue
bots and scrapers!
Mike Banks Valentine operates WebSite101.com, a free web
small business ecommerce tutorial, and provides SEO content
aggregation, press release optimization and custom web content
search optimization: seoptimism.com/SEO_Contact.htm