How And Where Search Engines See Duplicate Content
Introduction

Search engines have become the gateway to information on the Internet. They are so important that websites need to rank well in search engine results pages (SERPs) in order to get noticed. With numerous websites vying for a coveted position among the top 30 results listed in SERPs, more and more website owners are using search engine optimization (SEO) techniques to improve their rankings. People who use SEO know that certain factors can affect a ranking positively and, of course, negatively. Of the negative factors, one of the most well-known is duplicate content.

Search engines are biased against duplicate content. As a matter of fact, some sites do not get listed in SERPs because of this factor. This happens when crawlers do not index sites that they have previously determined to be duplicates of another site. The crawlers skip the duplicate site to be more efficient and save time. Crawlers also do this for another reason: to avoid listing duplicate pages in SERPs and thus pointing users to different sites containing just the same information. Search engines do not like that to happen because it would be irritating for users who expect to see different sites for the different links they click. For similar sites, search engines also usually list just one of the sites and relegate the others under a link that says "See related pages". For duplicate sites that do manage to be listed in the SERPs, the page rank is still usually affected, and this affects the site's standing.

Where Search Engines See Duplicate Content

So where do crawlers see this duplicate content? And what kinds of content would they interpret as duplicate?
According to an article by William Slawski on Duplicate Content Issues and Search Engines, search engines see duplicate content in the following kinds of web pages:

1. Product descriptions from manufacturers, publishers, and producers reproduced by a number of different distributors in large ecommerce sites.

2. Alternative print pages. This happens when user-friendly website owners offer copies of the same documents in different formats for varied printing options. Although helpful to users, these copies might actually be indexed by crawlers as duplicate pages.

3. Pages that reproduce syndicated RSS feeds through a server-side script.

4. Canonicalization issues, where a search engine may see the same page as different pages with different URLs.

5. Pages that serve session IDs to search engines, so that they try to crawl and index the same page under different URLs.

6. Pages that serve multiple data variables through URLs, so that they crawl and index the same page under different URLs.

7. Pages that share too many common elements, or where those are very similar from one page to another, including title, meta descriptions, headings, navigation, and text that is shared globally. This is common for company websites that insist on having their logo, description, etc. put on every page of their website.

8. Copyright infringement. Plagiarism is of course a good reason for not being indexed. The problem is that crawlers cannot distinguish the original from the duplicate and might mistakenly filter out the original instead.
9. Use of the same or very similar pages on different subdomains or different country top level domains (TLDs).

10. Article syndication. Some writers allow their articles to be published on other websites as long as they are given credit for their work. The problem arises when the crawler sees the original article as the duplicate and opts to index the duplicate page, or at least gives it a higher rating.

11. Mirrored sites. Mirrored sites are used to handle the traffic of a very popular site. Mirror sites have a good chance of being ignored by web crawlers and so won't be indexed.

How Search Engines See Duplicate Content

There are many methods employed by different search engines to determine pages with duplicate content. The methods differ in many ways, from the concept, to the algorithms, and of course their effectiveness. Search engines are, however, all finding new ways to improve their methods for detecting duplicate content, as seen in the patents filed by different search engine companies like AltaVista, Microsoft Corporation, and Google, and by other bodies like the company Digital Equipment Corporation and even the Regents of the University of California.

The different patents include methods for detecting query-specific duplicate documents, detecting duplicate and near-duplicate files, clustering closely resembling data objects, identifying near-duplicate pages in a hyperlinked database, indexing duplicate database records using a full-record fingerprint, indexing duplicate records of information of a database, and utilizing information redundancy to improve text searches, as well as methods and apparatus for detecting and summarizing document similarity within large document sets, and for finding mirrored hosts by analyzing URLs.

Each method is unique and interesting in its approach. The methods vary greatly, from generating fingerprints for records to using query-relevant information to limit the portion of the documents to be compared. Discussing each method would be interesting and would shed light on how different search engines approach the problem. The new methods are all innovative, and if some of them were used in concert with each other, it would surely improve a search engine's ability to detect duplicate documents. However, since the patent holders are competing companies, it is unlikely that there would be collaboration between them.

Conclusion

As search engines further refine their methods for detecting duplicate content, it will be harder for plagiarists to get away with what they do. However, web pages containing duplicate content for a good reason could suffer as well. Furthermore, since none of the published patents tackled the issue of differentiating the original content from the duplicates, refinement in the search engines' methods might mean further trouble for the website owners of original content. Because of this, search engines ought to find ways and invent new methods for distinguishing original content from duplicates, as well as for recognizing valid duplicate content.
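To make the fingerprint idea from the section on how search engines see duplicate content more concrete, here is a minimal Python sketch of one generic approach: break each page into overlapping word sequences ("shingles"), hash them, and compare the resulting sets. This is only an illustration and not the method of any patent or search engine named in this article. The function names, the 3-word shingle size, and the use of SHA-1 are all assumptions made for the example.

```python
import hashlib
import re


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    # Lowercase and keep only letters/digits so case and punctuation are ignored.
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat it as a single shingle.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprint(text: str, k: int = 3) -> set[str]:
    """Hash each shingle so pages can be compared without keeping their text."""
    # Hypothetical choice: first 16 hex digits of SHA-1 as a compact shingle hash.
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16]
            for s in shingles(text, k)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two fingerprints (1.0 = same shingles, 0.0 = none shared)."""
    fa, fb = fingerprint(a, k), fingerprint(b, k)
    if not fa and not fb:
        return 1.0  # two empty pages are trivially identical
    return len(fa & fb) / len(fa | fb)
```

A crawler built along these lines could treat two pages as near-duplicates when their similarity exceeds some chosen cutoff. Note that a score like this only says that two pages resemble each other. It says nothing about which one is the original, which is exactly the gap described in the conclusion.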
About the Author:
http://www.theinternetone.net
Read more articles by: Danny Wirken
Article Source: www.iSnare.com