Tuesday, 24 February 2009

Searching the 'Deep Web'

An article in the New York Times reviews the issue of the 'Deep Web' (sometimes known as the 'Invisible Web') and the difficulty search engines have in finding this information. Although Google now claims to index over one trillion web pages, this is still believed to represent just a fraction of the entire web. Much more content lies beyond the reach of search engines, such as database information, content behind login access, financial information, shopping catalogues, medical research, transport timetables and more.

The report focuses on a number of new search and indexing technologies, such as Kosmix and DeepPeep, that are trying to improve coverage of the web's hidden content. Kosmix, for example, has developed software that matches searches with the databases most likely to yield relevant information, then returns an overview of the topic drawn from multiple sources. If tools such as this do manage to delve deeper into the web's content, the quality and range of search results could expand considerably and, as the article claims, could ultimately reshape the way many companies do business online.
