Hashemian Blog

Huh? What?

Sunday, April 10, 2005

Google toolbar, exposing hidden web pages? 

A few days ago I had a discussion with the managing editor of our company's Web site about how crawlers discover and index pages. He was convinced that search engines can somehow find hidden pages on a Web site even if there are no links to those pages. I, on the other hand, wouldn't be persuaded. How could a search engine crawl a page if it doesn't know the page's name and location, i.e. its path? Turns out we were both wrong, and both right, depending on how you look at it.

In order for search engines to crawl a Web page, they must first be directed to it. Page discovery generally happens through a hyperlink on another page that the crawler can follow. I'm not sure whether search engines also follow plain-text URLs, but it is a possibility. A site that wants to publicize a new page would normally link to it from other pages, or the page would appear in a directory index, which lists all files in a directory when accessed (though Web sites normally disable this option for security reasons). In the absence of a link to a page's URL, crawlers would have no idea that the page exists (such a page is referred to as hidden or orphaned). I suppose they could engage in name-guessing, but that's an expensive proposition that I suspect most search engines shun.
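To make the link-following idea concrete, here is a minimal sketch in Python. It is only an illustration, not how any real search engine works: the three-page site, its paths, and the `discover` helper are all made up for the example, and the "site" is just an in-memory dictionary rather than anything fetched over HTTP.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover(site, start):
    """Breadth-first walk over `site` (path -> HTML), following links from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        parser = LinkExtractor()
        parser.feed(site.get(path, ""))
        for link in parser.links:
            # Only follow links that point at pages the site actually has.
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


# Hypothetical site: /hidden.html exists but nothing links to it.
SITE = {
    "/index.html": '<a href="/tools.html">Tools</a>',
    "/tools.html": '<a href="/index.html">Home</a>',
    "/hidden.html": "<p>Nobody links here.</p>",
}

print(sorted(discover(SITE, "/index.html")))
# prints ['/index.html', '/tools.html']
```

The orphaned page never shows up, because a crawler that only follows links has no path that leads to it.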

Then a few days ago I ran into an anomaly that disproved my belief about hidden pages and crawler discovery. I was working on a fairly popular page (Browser Simulator/Emulator) on my personal site. Because of its nature, the page could become a tool in the hands of abusers, so I monitor it for abusive activity patterns. I began to notice that Googlebot was accessing the page excessively, and with specific parameters, as if a human were commandeering it. Out of respect for users' privacy, however, I only monitor general patterns on that page, so I didn't have detailed information about Googlebot's activity.

With my curiosity piqued, I constructed a similar but hidden page in the same folder and switched on full monitoring. Then I began hitting the page, entering various data in the form fields. Sure enough, Googlebot began accessing that page with the same data I had specified. How could Googlebot discover the hidden page so fast (if at all) and submit the same data as I did? A glance near the top of my Internet Explorer window revealed the culprit: the Google toolbar, the seemingly innocuous toolbar that most people have installed in their browsers while remaining oblivious to how it operates.
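For anyone who wants to repeat the experiment, the pattern to look for is a crawler request that repeats a URL, query string included, that a regular browser requested earlier. The following Python sketch shows one way to spot that. It rests on several assumptions of mine: the `Hit` record, the `/hidden.php` path, and the sample user-agent strings are illustrative stand-ins, not my actual monitoring code or real log data.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    time: int         # seconds since the start of the test (simplified)
    user_agent: str
    url: str          # path plus query string


def is_googlebot(user_agent: str) -> bool:
    # Substring match only. User-agent strings can be spoofed, so this
    # is a rough filter, not proof that the request came from Google.
    return "googlebot" in user_agent.lower()


def echoed_requests(hits):
    """Return crawler hits whose exact URL a non-crawler requested earlier."""
    first_human_visit = {}
    echoes = []
    for hit in sorted(hits, key=lambda h: h.time):
        if is_googlebot(hit.user_agent):
            if hit.url in first_human_visit:
                echoes.append(hit)
        else:
            first_human_visit.setdefault(hit.url, hit.time)
    return echoes


# Made-up sample data for illustration only.
HITS = [
    Hit(0, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", "/hidden.php?q=alpha"),
    Hit(40, "Googlebot/2.1 (+http://www.google.com/bot.html)", "/hidden.php?q=alpha"),
    Hit(90, "Googlebot/2.1 (+http://www.google.com/bot.html)", "/index.html"),
]

for hit in echoed_requests(HITS):
    print(hit.time, hit.url)
# prints: 40 /hidden.php?q=alpha
```

A match like this does not prove the toolbar is the source, but on a page with no inbound links it leaves few other explanations.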

I am certain the Google toolbar comes with a privacy disclosure detailing what it gleans from the user's activity and how. I never bothered to read it, and chances are most people ignore it as well. I am also not sure what Google does with the data. I suppose they use it for ranking purposes, but I am now certain that Google crawls the pages that toolbar users visit. I am, however, still unsure whether the crawled pages ever make it into Google's index to be displayed as search results. I am also unsure whether what the browser displays to the user is sent to Google along with the URLs (this could have potentially disastrous privacy repercussions).

There you have it. If you place hidden pages in your Web folders, don't be too confident about their secrecy, even if those pages are only accessed internally by you and a few trusted people. Anyone with a Google toolbar (or another toolbar such as Alexa or A9) could be unwittingly sending the URLs of those hidden pages to Googlebot (or other robots/spiders), potentially exposing the location of those pages to the world.

    © 2001-2008 Robert Vahid Hashemian