Dot .info Domains as Search Engine Spammer Honeypots?
Copyright © February 20, 2006 by Mike Banks Valentine

Dot info (.info) domains appear to be the new home of search
engine spammers, since Google apparently applies no aging
delay before listing and ranking them. They are indexed
relatively quickly after the first crawl by the search engines
and are ranking well for some competitive terms. The sleaze
monsters among search engine spammers are using software to
automate four separate areas: content gathering, article
creation, article distribution and blog posting. Some may be
using all four techniques in concert in an effort to blanket
hundreds of sites with article content in order to slap up
Google AdSense or Yahoo Publisher Network (YPN) ads.

Various types of thieving go on in this seamy underbelly of
automated search engine spam. Recently, a software developer
has been selling "pre-loaded" content sites with articles
built in, covering 150 topic areas for $100, or $10 for an
individual topic. Buyers can set up "Adsense Ready" article
sites containing keyword-focused content categories taken
from "free-to-use" article sites, against clearly posted
author terms of use.

The usage terms posted by authors and by article distribution
sites universally prohibit use of those "free-to-use" articles
in paid compilations, membership sites or any "for-profit"
collections. Some authors are expanding their terms of use to
exclude usage by specific networks. Previous slime merchants
have avoided copyright lawsuits by giving away those articles
with paid software purchases. I'd be surprised if authors
didn't find some way to band together to sue those who abuse
their terms of use in this way.

Authors have worried about "duplicate content penalties" when
their articles are distributed for use by other web sites.
It's extremely unlikely that this type of use will lead to
penalties for the author's own web site, which is linked from
the resource boxes of those original articles. Interestingly,
the likely targets of duplicate content penalties are the
clueless purchasers of "pre-loaded" sites, who all publish
precisely the same site structure, precisely the same articles
AND the same unvarying RSS feeds. Their sites are mirrors of
one another, and mirrored sites have been filtered for years.
Lazy buyers of "pre-loaded" article sites will be the only
ones to receive penalties from the search engines.
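
Why identical "pre-loaded" kits make such easy targets can be
shown with a small sketch. This is a hypothetical illustration
in Python, not a description of how any search engine actually
filters mirrors. It assumes each site's text has already been
collected, normalizes it, and hashes it, so that sites
publishing exactly the same content collapse into a single
fingerprint. The function names and sample sites are invented
for the example.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_mirrors(sites: Dict[str, str]) -> List[List[str]]:
    """Group site names whose collected text is exactly the same.

    `sites` maps a (hypothetical) site name to the text gathered from it.
    Only groups containing more than one site are returned.
    """
    groups = defaultdict(list)
    for name, text in sites.items():
        groups[fingerprint(text)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Invented sample data: two kit buyers and one unrelated author site.
sites = {
    "kit-buyer-one.example": "Ten tips for  growing roses.",
    "kit-buyer-two.example": "ten tips for growing roses.",
    "author-site.example": "My own notes on pruning roses in spring.",
}
mirrors = find_mirrors(sites)
```

An exact hash only catches straight copies. A site that adds
even a little content of its own gets a different fingerprint,
which is why the unmodified kits are the ones exposed.
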

In another slimy aspect of this odd netherworld of search
engine spam, article-gathering crawlers use IP spoofing to
imitate search engine IP addresses and hide within the routine
traffic of the sites they crawl, trolling the web for articles
to steal and use in splogs and pre-loaded web site kits. These
crawlers first hit pages slowly, seeking sitemaps or author
index pages, and grab URLs. They return later under different
IPs and pound away at 10 pages per second or more, grabbing
articles from major sites against the terms of use posted on
those sites. The crawlers usually belong to hosted services,
which then sell this stolen content to subscribers running
automated article sites, after running it through new
article-regurgitating software.
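
A site owner who wants to spot the "pound away" phase could
watch for bursts in the request log. The following is a
minimal sketch in Python under stated assumptions: the class
name, the threshold of 10 hits and the one-second window are
mine, chosen to mirror the rate described above, and the IP
addresses are documentation placeholders.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Flag clients that make `max_hits` requests within `window` seconds."""

    def __init__(self, max_hits: int = 10, window: float = 1.0):
        self.max_hits = max_hits
        self.window = window
        self.hits = defaultdict(deque)  # client IP -> recent timestamps

    def record(self, ip: str, timestamp: float) -> bool:
        """Record one request; return True if this client is now bursting."""
        recent = self.hits[ip]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while recent and timestamp - recent[0] >= self.window:
            recent.popleft()
        return len(recent) >= self.max_hits


detector = BurstDetector(max_hits=10, window=1.0)
# Ten requests inside one second from one address: the last one trips it.
fast = [detector.record("203.0.113.7", i / 10) for i in range(10)]
# One request every two seconds from another address: never trips it.
slow = [detector.record("203.0.113.8", i * 2.0) for i in range(10)]
```

Counting per address has an obvious limit here: a crawl spread
thinly across many different IPs would stay under the
threshold on each one, so this catches only the crude cases.
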

This sleazy article theft software takes already written,
copyrighted articles by other authors, re-orders paragraphs,
swaps out interchangeable verbs, rearranges sentences and
spits out a fairly readable, and sometimes passable, article
which may not be recognizable to its original author. These
stolen, regurgitated articles are then submitted to article
banks and distribution sites by splog creators, sometimes
using automated submission software or hosted services, so
that they are used across the web to create inbound links
leading to the search engine spam sites.
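
Re-ordering defeats a straight text comparison, but it does
not change which words an article uses. As a rough sketch of
how an author might check a suspect copy, the Python below
compares the sets of words in two texts (Jaccard similarity),
which ignores sentence and paragraph order entirely. The
function names and sample sentences are invented, and a score
is only a hint, not proof of copying.

```python
import re
from typing import Set


def word_set(text: str) -> Set[str]:
    """Return the distinct lowercase words in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two texts have in common (0.0 to 1.0)."""
    words_a, words_b = word_set(a), word_set(b)
    union = words_a | words_b
    if not union:
        return 0.0
    return len(words_a & words_b) / len(union)


original = "The crawler grabs the article. Then it sells the content."
# Sentences swapped and one verb replaced, as the software would do.
spun = "Then it sells the content. The crawler takes the article."
unrelated = "Roses need sun and regular pruning in spring."

spun_score = jaccard(original, spun)
unrelated_score = jaccard(original, unrelated)
```

In this example the re-ordered copy shares 7 of 9 distinct
words with the original, while the unrelated text shares none.
Heavy verb swapping lowers the score, so any cutoff would need
tuning on real articles.
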

Many of these .info domain owners are using sleazy spam blog
software to create what have become known as "splogs":
automatically updated blogs, spread across multiple blogging
platforms, with posts made regularly at randomized intervals.
They do this to appear to be active bloggers. Automation built
into the software creates keyword-focused posts from RSS feeds
of keyword-phrase news searches and then "pings" the blog
search engines with each new automated post. Depending on the
sophistication of the splog owner, you'll often see footer
links leading to other splogs they operate on separate topics.

Virtually all of the .info domains I've seen ranking in top
results for competitive phrases are entirely AdSense or YPN
sites, including splogs, full of autogenerated RSS news feeds
and title tags and H1 tags generated on the fly from the
search phrase used to find the site. Even the copyright
information in the footer of some of these sites is generated
on the fly to match search queries. This technique is also
being used by some search engine spamming .com sites
(registered more than a year ago, which gets them past the
aging delay), but it currently shows up on more .info domains.
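
One way to confirm that a title is being generated on the fly
is to visit the same page as if arriving from two unrelated,
nonsensical searches and see whether the title repeats each
phrase back. The Python sketch below shows only that
comparison logic. The `fetch_title` argument is a hypothetical
caller-supplied function, and the two sample "sites" are
stand-ins invented so the example runs without any network
access.

```python
from typing import Callable, Iterable


def echoes_query(
    fetch_title: Callable[[str, str], str],
    url: str,
    probes: Iterable[str],
) -> bool:
    """Return True if the page title repeats every probe phrase.

    `fetch_title(url, phrase)` is a caller-supplied (hypothetical) function
    that requests `url` as if the visitor had searched for `phrase` and
    returns the text of the page's title tag.
    """
    probes = list(probes)
    if not probes:
        return False
    return all(
        phrase.lower() in fetch_title(url, phrase).lower() for phrase in probes
    )


# Stand-ins for real HTTP requests, invented for this example.
def dynamic_site(url: str, phrase: str) -> str:
    return f"Best {phrase.title()} Resources and News"


def static_site(url: str, phrase: str) -> str:
    return "Rose Gardening Tips"


# Unrelated nonsense phrases that no honest page would be titled after.
probes = ["purple submarine insurance", "left-handed teapot loans"]
dynamic_flagged = echoes_query(dynamic_site, "http://example.info/", probes)
static_flagged = echoes_query(static_site, "http://example.com/", probes)
```

Using more than one probe matters: a page could legitimately
match a single phrase, but it should not match every unrelated
phrase thrown at it.
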

If Google is truly ranking sites based on clickstream data,
imagine the abuses these dynamic spam sites, full of nothing
but RSS feeds or stolen, regurgitated content, could spawn!
Soon they would rule the results pages because they reflect
EXACTLY the search terms used by the searchers, which leads
to higher click-through ratios, which in turn generate higher
rankings. I see a serious hole for abuse here and hope that
the PhDs at Google work out a filter for the technique fast.

This exact-match landing page idea is widely used in
pay-per-click (PPC) campaigns. Most savvy SEM specialists
highly recommend landing pages that exactly match the phrase
the user clicked on, because they lead to higher conversion
ratios. Perhaps a programmer who spends his days creating PPC
landing page scripts is spending his nights creating .info
domains with dynamic page titles and metadata for competitive
search phrases, to rule the organic search results?

Of course, many recent .info domain owners mask their whois
ownership information, since those domains were purchased
specifically for search engine spamming sites. When you look
up the whois information on highly ranking .info domains to
check creation (purchase) dates, you'll see a preponderance
of October through December 2005 creation dates, with a
smattering of January 2006 dates, for those well-ranked
splogs. This must be about the time that spammer forums
started noticing and discussing the lack of aging delay for
.info domains.
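
Anyone wanting to repeat this check across many domains could
tally the creation dates from saved whois output. The Python
below is a sketch under assumptions: it expects a line such as
"Created On:14-Oct-2005 18:02:11 UTC" (or "Creation Date:" in
the same date style), whereas real registries format this
field in several different ways. The sample records are
invented, not real lookups.

```python
import re
from collections import Counter
from datetime import date, datetime
from typing import Iterable, Optional

# Assumed record format; adjust the pattern for the registry in question.
CREATED = re.compile(
    r"^(?:Created On|Creation Date):\s*(\d{1,2}-[A-Za-z]{3}-\d{4})",
    re.MULTILINE,
)


def creation_date(whois_text: str) -> Optional[date]:
    """Extract the creation date from raw whois text, or None if absent."""
    match = CREATED.search(whois_text)
    if match is None:
        return None
    # %b expects English month abbreviations under the default locale.
    return datetime.strptime(match.group(1), "%d-%b-%Y").date()


def creation_months(records: Iterable[str]) -> Counter:
    """Count records per creation month, keyed as 'YYYY-MM'."""
    months = Counter()
    for text in records:
        created = creation_date(text)
        if created is not None:
            months[created.strftime("%Y-%m")] += 1
    return months


# Invented whois snippets; real records carry many more fields.
records = [
    "Domain Name:EXAMPLE-ONE.INFO\nCreated On:14-Oct-2005 18:02:11 UTC",
    "Domain Name:EXAMPLE-TWO.INFO\nCreated On:03-Dec-2005 09:15:40 UTC",
    "Domain Name:EXAMPLE-THREE.INFO\nCreated On:22-Jan-2006 21:44:05 UTC",
    "Domain Name:EXAMPLE-FOUR.INFO\nRegistrant Name:Privacy Service",
]
histogram = creation_months(records)
# Age of the first domain on this article's date, February 20, 2006.
age_days = (date(2006, 2, 20) - creation_date(records[0])).days
```

Masking hides the owner, but in the records I describe above
the creation date is still there to be read, which is all this
tally needs.
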

Whois information for dot com (.com) sites ranking well for
competitive searches shows that ALL are over a year old and
most are 3 to 5 years past their creation date.

All of this suggests that clear algorithmic aging filters are
applied to all domains *except* .info domains, and that this
apparent lack of .info filtering lets them bypass the
so-called "sandbox effect" which delays indexing and ranking
on other TLDs. My belief is that Google is using this lack of
aging delay and lack of filtering of .info domains as a
honeypot for search engine spam, gathering the bad boys all in
one otherwise rarely used TLD. It can then do wide sweeps,
tracing their tactics to further filter (forgive me for using
the term) Black Hat SEO techniques.

Mike Banks Valentine blogs on Search Engine developments from
RealitySEO.com and can be contacted for ethical SEO
work at: www.seoptimism.com/SEO_Contact.htm
He runs a web content distribution site at: Publish101.com