
How And Where Search Engines See Duplicate Content



Introduction

Search engines have become the gateway to information on the Internet. They are so important that websites need to rank well in search engine results pages (SERPs) in order to get noticed. With numerous websites vying for a coveted position among the top 30 results listed in SERPs, more and more website owners are using search engine optimization (SEO) techniques to improve their rankings. People who use SEO know that certain factors can affect a ranking positively and, of course, negatively. Of the negative factors, one of the best known is duplicate content.

Search engines are biased against duplicate content. As a matter of fact, some sites do not get listed in SERPs because of it. This happens when crawlers decline to index a site that they have previously determined to be a duplicate of another site. The crawlers skip the duplicate site to be more efficient and save time. Crawlers also do this for another reason: to avoid listing duplicate pages in SERPs and thus pointing users to different sites containing the same information. Search engines do not want that to happen because it would irritate users, who expect to see different sites behind the different links they click. For similar sites, search engines also usually list just one of the sites and relegate the others under a link that says "See related pages". For duplicates that do manage to be listed in the SERPs, the page rank is still usually affected, and so is the site's standing.

Where Search Engines See Duplicate Content

So where do crawlers see this duplicate content? And what kinds of content would they interpret as duplicate?



According to an article by William Slawski, Duplicate Content Issues and Search Engines, search engines see duplicate content in the following kinds of web pages:

1. Product descriptions from manufacturers, publishers, and producers reproduced by a number of different distributors in large ecommerce sites.

2. Alternative print pages – This happens when user-friendly website owners offer copies of the same documents in different formats for varied printing options. Although helpful to users, the copies might actually be indexed by crawlers as duplicate pages.

3. Pages that reproduce syndicated RSS feeds through a server side script.

4. Canonicalization issues, where a search engine may see the same page as different pages with different URLs.

5. Pages that serve session IDs to search engines, so that they try to crawl and index the same page under different URLs.

6. Pages that serve multiple data variables through URLs, so that they crawl and index the same page under different URLs.

7. Pages that share too many common elements, or where those are very similar from one page to another, including title, meta descriptions, headings, navigation, and text that is shared globally – This is common for company websites that insist on putting their logo, description, etc. on every page of their website.

8. Copyright infringement – Plagiarism is of course a good reason for not being indexed. The problem is that crawlers cannot distinguish the original from the duplicate and might mistakenly filter out the original instead.



9. Use of the same or very similar pages on different subdomains or different country top level domains (TLDs).

10. Article syndication – Some writers allow their articles to be published on other websites as long as they are given credit for their work. The problem arises when the crawler sees the original article as the duplicate and opts to index the duplicate page instead, or at least gives it a higher rating.

11. Mirrored sites – Mirrored sites are used to handle the traffic of a very popular site. Mirror sites have a good chance of being ignored by web crawlers and so won’t be indexed.

How Search Engines See Duplicate Content

There are many methods employed by different search engines to determine pages with duplicate content. The methods differ in many ways, from the concept, to the algorithms, and of course their effectiveness. Search engines are, however, all finding new ways to improve their methods for detecting duplicate content, as seen in the patents filed by different search engine companies like AltaVista, Microsoft Corporation, and Google, and by other bodies like Digital Equipment Corporation and even the Regents of the University of California.

The different patents include methods for detecting query-specific duplicate documents, detecting duplicate and near-duplicate files, clustering closely resembling data objects, identifying near-duplicate pages in a hyperlinked database, indexing duplicate database records using a full-record fingerprint, indexing duplicate records of information of a database, utilizing information redundancy to improve text searches, methods and apparatus for detecting and summarizing document similarity within large document sets, and finding mirrored hosts by analyzing URLs.

Each method is unique and interesting in its approach. The methods vary greatly, from generating fingerprints for records to using query-relevant information to limit the portion of the documents to be compared. Discussing each method would be interesting and would shed light on how different search engines approach the problem. The new methods are all innovative, and if some of them were used in concert with each other, that would surely improve a search engine's ability to detect duplicate documents. However, since the patent holders are competing companies, it is unlikely that there would be collaboration between them.

Conclusion

As search engines further refine their methods for detecting duplicate content, it will be harder for plagiarists to get away with what they do. However, web pages containing duplicate content for a good reason could suffer as well. Furthermore, since none of the published patents tackled the issue of differentiating the original content from the duplicates, refinement in the search engines' methods might mean further trouble for the owners of original content. Because of this, search engines ought to find ways and invent new methods for distinguishing original content from duplicates, as well as for recognizing valid duplicate content.
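
For website owners, the canonicalization, session ID, and URL variable cases in the list of duplicate content sources (items 4 to 6) all come down to one page being reachable under several URLs. The following is a minimal Python sketch of how such URLs could be collapsed into one form before comparison. It is an illustration only, not how any search engine actually does it. The function name normalize_url, the parameter names in IGNORED_PARAMS, and the choice to treat www and non-www hosts as the same site are all assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names treated as session noise. Real sites use
# many different names, so this set is an assumption for the example.
IGNORED_PARAMS = {"sid", "sessionid", "phpsessid"}


def normalize_url(url: str) -> str:
    """Collapse URL variants that would usually serve the same page."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Assumption: the www and non-www hosts serve identical content.
    if host.startswith("www."):
        host = host[4:]
    # Drop session-like parameters, then sort the rest so that
    # parameter order does not produce a "different" URL.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    # The fragment (#...) is dropped because it never reaches the server.
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


if __name__ == "__main__":
    print(normalize_url("http://WWW.Example.com/item?b=2&sid=abc123&a=1"))
    print(normalize_url("http://example.com/item?a=1&b=2#top"))
```

Both calls in the usage example reduce to the same string, which is the point: two addresses that look different to a crawler can be recognized as one page once the noise is removed.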

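
The idea of comparing documents to find duplicates and near-duplicates, discussed under How Search Engines See Duplicate Content, can be illustrated with a small Python sketch. It uses word shingling with Jaccard similarity, a well-known general technique, and is not taken from any of the patents mentioned in the article. The function names, the shingle size of 3 words, and the 0.8 threshold are assumptions chosen for the example.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping k-word windows found in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles the two sets have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a: str, doc_b: str, threshold: float = 0.8) -> bool:
    """Flag two documents as near-duplicates above an assumed threshold."""
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold


if __name__ == "__main__":
    original = "Search engines are biased against duplicate content."
    copy = "Search engines are biased against DUPLICATE content!"
    print(is_near_duplicate(original, copy))
```

Note that a score like this only says two pages are alike. It says nothing about which one is the original, which is exactly the gap the Conclusion points out.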





About the Author:

http://www.theinternetone.net


Read more articles by: Danny Wirken

Article Source: www.iSnare.com



    Copyright (c) 2008 Isnare.com. All rights reserved.
