The Google Goal Of Indexing 100 Billion Web Pages

Google’s Goal of Quality Search

In their paper 'The Anatomy of a Large-Scale Hypertextual Web Search Engine' it is very evident that Google’s goal has always been to be one of the best search engines there is in terms of the quality of the results it gives. Sergey Brin and Lawrence Page, however, knew that in order to do this, Google needed to store information efficiently and cost effectively and to have excellent crawling, indexing, and sorting techniques. Google aimed not only to give quality results but also to produce those results as fast as possible.

Google started as a high-quality search engine and continues to be the best search engine today. It has managed to stay true to its original intent to be a search engine that not only crawls and indexes the web efficiently but also produces more satisfying results than other existing search engines.

To stay true to their goal of providing the best search results, Google knew right from the start that it had to be designed so that the search engine could keep up with the web’s growth. According to Brin and Page, “In designing Google we have considered both the rate of growth of the Web and technological changes. Google is designed to scale well to extremely large data sets. It makes efficient use of storage space to store the index”. They knew that they needed a great deal of space to store an ever-growing index.

Google’s index size, which started out as 24 million web pages, was large for its time and has grown to around 25 billion web pages, still keeping Google ahead of its competitors. However, Google is a company that doesn’t settle for just beating the competitors. They truly aim to give their users the best service there is, and that means that as a search engine they want to give users access to all, or at least most, of the quality information that is available on the web.

Google’s New System for Indexing More Pages

As mentioned earlier, Google aims to give access to even more information and has been devoting much time and effort to realizing this goal. It seems that the new patent entitled 'Multiple Index Based Information Retrieval System', filed by Google employee Anna Patterson, might be the answer to the problem. The patent, published in May of 2006 and filed back in January of 2005, shows that Google might actually be aiming to expand their index size to as much as 100 billion web pages or even more.

According to the patent, conventional information retrieval systems, more commonly known as search engines, are able to index only a small part of the documents available on the Internet. According to estimates, the number of web pages on the Internet as of last year was around 200 billion; however, Patterson claimed that even the best search engine (that is, Google) was able to index only 6 to 8 billion web pages. The disparity between the number of indexed pages and existing pages clearly signaled a need for a new breed of information retrieval system. Conventional information retrieval systems just weren’t capable of doing the job and wouldn’t be able to index enough web pages to give users access to a large enough percentage of the information available on the web.

The Multiple Index Based Information Retrieval System, however, is up to the challenge and is Google’s answer to the problem. Two characteristics of the new system make it stand out compared to the conventional systems. One is that it has the “capability to index an extremely large number of documents, on the order of a hundred billion or more”. The other is its capability to “index multiple versions or instances of documents for archiving…enabling a user to search for documents within a specific range of dates, and allowing date or version related relevance information to be used in evaluating documents in response to a search query and in organizing search results.” With the new system developed by Patterson, Google now has the ability to expand its index size to unbelievable proportions as well as improve document analysis and processing, document annotation, and even the process of ranking according to contained and anchor phrases.

History of Google’s Index Size

Google started out with an index size of around 24 million web pages in 1996. By August of 200, Google had managed to quadruple their index size to approximately one billion web pages. In September of 2003, Google’s front page boasted an index of 3.3 billion web pages. Microdoc, however, revealed that the actual number of web pages Google had indexed at that time was already more than five billion. In their article 'Google Understates the Size of Its Database', they emphasized that Google specialized not only in simplicity but also in understating their power and complexity. Google was still managing to stay ahead of its competitors and continued to surprise everyone with what they had up their sleeves.

As Google’s index continued to grow, the number on their front page grew impressively large as well, before it plateaued at eight billion web pages. This was around the time that Patterson filed the new patent. Then in 2005, with controversies over index size growing, Google decided to stop counting in front of the public and simply claimed that their index was three times larger than the nearest competitor’s. Google also maintained that it was not just the number of indexed pages that was important but how relevant the returned results were.

Then in September of 2005, as part of Google’s 7th anniversary, Anna Patterson, the same software engineer who filed the patent on the Multiple Index Based Information Retrieval System, posted an entry on Google’s official blog claiming that the index size was now 1,000 times larger than the original index. This pegged their index size at around 24 billion web pages, about a fourth of Google’s goal of indexing 100 billion web pages. It seems, then, that Google must have started using the new system in mid-2005. With the new system in place, we can only wait and see how fast Google will reach the goal of 100 billion web pages in its index. It’s most likely, though, that when Google reaches that goal it will set an even higher one in order to keep providing quality service.
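
The index-size figures quoted in this article can be checked with a little arithmetic. The sketch below is only an illustration: the numbers are the ones cited above (24 million original pages, a 1,000-fold increase, a goal of 100 billion, an estimated 200 billion pages on the web, and 6 to 8 billion indexed), while the variable and function names are made up for this example.

```python
# Figures as quoted in the article; all names here are illustrative assumptions.
ORIGINAL_INDEX = 24_000_000            # pages in Google's original index
GROWTH_FACTOR = 1_000                  # "1,000 times larger" (September 2005 blog entry)
GOAL = 100_000_000_000                 # "on the order of a hundred billion or more"
WEB_ESTIMATE = 200_000_000_000         # estimated number of pages on the web
INDEXED_LOW = 6_000_000_000            # low end of the "6 to 8 billion" indexed pages
INDEXED_HIGH = 8_000_000_000           # high end of the same range


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# 24 million pages times 1,000 gives the implied 2005 index size.
index_2005 = ORIGINAL_INDEX * GROWTH_FACTOR

print(f"Implied 2005 index size: {index_2005:,} pages")
print(f"Share of the 100 billion goal: {share(index_2005, GOAL):.0f}%")
print(
    "Share of the web covered by 6 to 8 billion pages: "
    f"{share(INDEXED_LOW, WEB_ESTIMATE):.0f}% to {share(INDEXED_HIGH, WEB_ESTIMATE):.0f}%"
)
```

On these figures the 2005 index works out to 24 billion pages, or 24 percent of the goal, which matches the "about a fourth" mentioned above, and an index of 6 to 8 billion pages would have covered only 3 to 4 percent of the estimated web.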
About the Author:
http://www.theinternetone.net
Read more articles by: Danny Wirken
Article Source: www.iSnare.com