

The Industry Standard: Guest Blog: Russell Shaw



Library Books Are OK, But Here's What Google Should Really Be Concentrating On

Google's p.r. apparatus, the professional librarian community, and many mainstream journalistic outlets are falling all over themselves hailing the announcement that within the next several months, the search engine giant will begin scanning in and indexing mostly non-copyrighted content from several leading university libraries.

I suppose if I were researching Chaucer, I would be more excited than I am. Graduate humanities and literature students, plus the term paper mills that serve at least a few of them, must be licking their chops at the near-term availability of this collected knowledge.

To me, the effort - while a laudable nod to the wordsmiths of the past - has two basic flaws. First, as Search Engine Watch's Danny Sullivan and others have commented, valuable search results gleaned from the enhanced database of library holdings will be included below citations from existing sources. Second, as eWeek's David Coursey writes, too much overtly commercial content already appears in Google's search results. Blame that on battalions of Search Engine Optimization experts who have learned to pirouette around Google's once-vaunted algos.

To me, though, there is a deeper problem. Much deeper. That's one of misapplied resources.

Click-to-Chaucer is just fine, but that's not where most of the world's vital digital information is. Such data is located in the hidden and the opaque Web. And Google has not yet shown the collective drive to tackle the problem.

"Hidden" and "opaque" are terms used to describe Web pages that are spawned on the fly from database queries, and/or are hidden behind the CGI'd, Session ID-URL'd moats of password-protected and fee-accessed Web sites.

BrightPlanet, a company that catalogs and provides some access to "Deep Web" sites, estimates that some 85 billion documents are either contained or can be created on the 60 largest of these sites. That's only 60 sites, but BrightPlanet's sister site, CompletePlanet, lists more than 70,000 such searchable databases.

Even on those 60 sites, 85 billion docs is more than 10 times Google's current public Web count of 8,058,044,651 Web pages.

What's more, those 60 sites contain some really vital info. With some pointing and clicking, you can obtain all manner of climate data, research a patent, locate a part for a compressor, find cancer-related drug trials, research a court case, read a newspaper article from 14 months ago, or pull down a financial analyst report.

All these resources, and so many more, are much more valuable to most of us than one-click digital access to dead poets. This is the data that makes our economy, indeed our entire infrastructure, run.

True, Google points to some of these sites. But pointing is a poor substitute for indexing. With pointing, you have to trust the content indexed in the search results to point you to a site where this information is available. Then, you actually have to visit each site that may or may not hold promise, and try them out one by one.

If you are doing real-world, not dead-poet, research, that can be a time-consuming bear. Take it from someone who knows.

So what am I proposing? Two approaches, which are not mutually exclusive.

First, Google should start an affiliate program with these Hidden Web sites. Let Google provide the interface to have its site visitors burrow in and actually get to the search boxes that hold the key to all this knowledge. Then, provide Google with the ability to serve up an abstract or a summary of the document. This could be done by crawling the first sentence or two of the document. Then, if money is required for access, Google could collect a kind of a finder's fee for taking searchers that far.

Perfect brand name: Google Database.
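
For the curious, the "first sentence or two" abstract idea could be sketched roughly like this. This is only an illustration under my own assumptions: the function name first_sentences and the naive rule that a sentence ends at a period, exclamation point, or question mark followed by whitespace are hypothetical, and nothing here reflects how Google or BrightPlanet actually summarize documents.

```python
import re


def first_sentences(text: str, count: int = 2) -> str:
    """Return the first `count` sentences of `text` as a short abstract.

    Hypothetical helper for illustration only; the sentence rule is naive
    and will mis-split abbreviations such as "p.r." or "Inc.".
    """
    # Collapse runs of whitespace so line breaks do not affect splitting.
    normalized = " ".join(text.split())
    # Split wherever '.', '!' or '?' is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", normalized)
    return " ".join(sentences[:count])


if __name__ == "__main__":
    document = (
        "Compressor part 4471 fits models built after 1998. "
        "It ships in two days. Full specifications require a subscription."
    )
    print(first_sentences(document))
```

A real version would need a sturdier sentence splitter, but even this much shows that the summary could be produced without exposing the full fee-accessed document.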

Second, Google should buy BrightPlanet. This VC-backed firm has seven patents pending. That tells me that the firm has some mean and lean Deep Web-burrowing brainpower. They couldn't be that expensive to swallow whole. Considering Google's market cap, that would be like a whale swallowing a cuttlefish.




Posted by Russell Shaw, December 17, 2004 03:52 AM
