Hashemian Blog

Huh? What?

Sunday, April 10, 2005

Google toolbar, exposing hidden web pages? 

A few days ago I had a discussion with the managing editor of our company's Web site about how crawlers discover and index pages. He was convinced that search engines can somehow find hidden pages on a Web site even if nothing links to those pages. I, on the other hand, wasn't persuaded. How could a search engine crawl a page if it doesn't know the page's name and location, i.e. its path? Turns out we were both wrong, and both right, depending on how you look at it.

In order for search engines to crawl a Web page, they must first be directed to it. Page discovery generally happens through a hyperlink on another page that the crawler can follow. I'm not sure if search engines also follow plain-text URLs, but it is a possibility. A site that wants to publicize a new page would normally link to it from other pages, or the page would show up in a directory index, which lists all the files in a directory when it is accessed (Web sites normally disable this option for security reasons, though). In the absence of a link to a Web page's URL, crawlers would have no idea that the page (referred to as a hidden or orphaned page) exists. I suppose they could engage in name-guessing, but that's an expensive proposition I suspect most search engines shun.
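To make the link-following idea concrete, here is a minimal sketch in Python. It is not how any real search engine works; it just walks links breadth-first over a made-up, in-memory site. The page names, the `SITE` dictionary, and the `discover` function are all invented for illustration.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover(site, start):
    """Breadth-first walk over links, starting from one known page."""
    seen = {start}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        parser = LinkExtractor()
        parser.feed(site.get(path, ""))
        for link in parser.links:
            # Only follow links to pages that exist and are not yet visited.
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


# Hypothetical three-page site; /hidden.html exists but nothing links to it.
SITE = {
    "/index.html": '<a href="/tools.html">Tools</a>',
    "/tools.html": '<a href="/index.html">Home</a>',
    "/hidden.html": "<p>Nobody links here.</p>",
}

found = discover(SITE, "/index.html")
print(sorted(found))            # ['/index.html', '/tools.html']
print("/hidden.html" in found)  # False
```

A crawler that only follows links never reaches the orphaned page, which is exactly the assumption I was working from.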

Then a few days ago I ran into an anomaly that disproved my belief about hidden pages and crawler discovery. I was working on a fairly popular page (Browser Simulator/Emulator) on my personal site. Due to its nature, the page could become a tool in the hands of abusers, so it is monitored for abusive activity patterns. I began to notice that Googlebot was accessing the page excessively, and with specific parameters, as if a human were commandeering the page. Out of respect for users' privacy, however, I only monitor general patterns on that page, so I didn't have detailed information about Googlebot's activity.

With my curiosity piqued, I constructed a similar but hidden page in the same folder and switched on full monitoring. Then I began hitting the page, entering various data in the form fields. Sure enough, Googlebot began accessing that page with the same data I had specified. How could Googlebot discover the hidden page so fast (if at all) and supply the same data I had entered? A glance near the top of my Internet Explorer window revealed the culprit. It was the Google toolbar, the seemingly innocuous toolbar that most people have installed in their browsers while remaining oblivious to how it operates.
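For anyone who wants to try the same experiment, the check boils down to comparing what a human sent to the hidden page with what Googlebot requested afterwards. The sketch below is not the monitoring I actually used. It assumes an access log in the common "combined" format, and the page name `/hidden.php`, the parameters, and the two sample log lines are all made up. Matching on the user-agent string is also a simplification, since anyone can send a Googlebot-looking user agent.

```python
import re
from urllib.parse import parse_qs, urlsplit

# One entry of a "combined" format access log (an assumed log layout).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def hits_for(lines, path):
    """Yield (user agent, query parameters) for each request to path."""
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        url = urlsplit(match["target"])
        if url.path == path:
            yield match["agent"], parse_qs(url.query)


def replayed_queries(lines, path):
    """Return the Googlebot queries that repeat a query a human sent."""
    human, bot = [], []
    for agent, params in hits_for(lines, path):
        (bot if "Googlebot" in agent else human).append(params)
    return [query for query in bot if query in human]


# Made-up sample: a browser request, then the same request from Googlebot.
SAMPLE = [
    '192.0.2.10 - - [08/Apr/2005:10:00:00 -0400] '
    '"GET /hidden.php?url=example.com&ua=test HTTP/1.1" 200 512 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '203.0.113.5 - - [08/Apr/2005:10:00:07 -0400] '
    '"GET /hidden.php?url=example.com&ua=test HTTP/1.1" 200 512 "-" '
    '"Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

print(replayed_queries(SAMPLE, "/hidden.php"))
```

If the list that comes back is not empty, the bot is requesting the page with data that someone typed into the form, not data it could have found by following links.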

I am certain the Google toolbar comes with a privacy disclosure detailing what it gleans from the user's activity and how. I never bothered to read it, and chances are most people ignore it as well. I am also not sure what Google does with the data. I suppose they use it for ranking purposes, but I am now certain that Google crawls the pages its toolbar users visit. I am, however, still unsure whether the crawled pages ever make it into Google's index to be displayed as search results. I am also unsure whether what the browser displays to the user is sent to Google along with the URLs (this could have potentially disastrous privacy repercussions).

There you have it. If you place hidden pages in your Web folders, don't be too confident about their secrecy, even if those pages are only accessed internally by you and a few trusted people. Anyone with a Google toolbar (or any other toolbar, such as Alexa or A9) would be unwittingly sending the URLs of those hidden pages to Googlebot (or other robots/spiders), potentially exposing the location of those pages to the world.


    © 2001-2009 Robert Vahid Hashemian