TrustRank is an attempt to counter the web spamming activities that
threaten to deceive search engines' ranking algorithms. It propagates
trust among web pages in the same manner that PageRank propagates
authority. However, tests show that a combination of trust and
distrust values has a greater ability to demote spam sites than trust
values alone.

The Assumption

A link between two pages carries an implied conveyance of trust from
the source page to the target page. Linking to a page is a vote of
confidence from the source that the target is able to provide content
that will be of value to the user. The idea revolves around the ideal
set-up in which good sites only point to similarly good sites and will
not knowingly refer people to spam sites. These good sites hold the
trust of people, which is then used in propagating trust through the
link structure of the web.

TrustRank uses a set of highly trusted seed sites to help in demoting
web spam. The approach assigns a non-zero initial trust score to these
seed sites while assigning initial values of zero to all other sites.
A biased PageRank algorithm is used to propagate these initial trust
scores along outgoing links, so that good sites are expected to get a
decent trust score while spam sites are likely to get lower trust
scores after convergence.

The possibility of a page pointing to a spam page increases as the
number of links increases. It has been proposed that the trust score
of a parent page be equally split among its child pages. This raises
the question of how to combine the different trust scores a child page
receives when it has multiple parent pages. TrustRank provides a
solution by simple summation, which has not been quite effective in
curtailing spam sites' efforts to raise their ranking.

The conveyance of distrust emerged as a natural extension of the
conveyance of trust between links. Distrust may indicate a lack of
confidence in a source page due to its linkage to an untrustworthy
page. Thus, when a link to a known spam page is established, the trust
judgment of the source page cannot be considered valid.

TrustRank, as it was originally conceived, proposed that trust should
be reduced as we move further away from the seed set of trusted pages.
However, the limited number of seed pages makes it impossible for the
whole web to be touched by propagation. A well performing algorithm is
needed to produce trust judgments for at least a larger fraction of
web pages.

The seed sets used may not be able to sufficiently represent the
different topics of the web. TrustRank tends to show a bias towards
larger communities, which can be remedied by using topical information
to divide the seed set and calculate trust scores separately for each
topic. The use of pages listed in well-maintained topic directories
can help in resolving the coverage issue. Seed filtering may be done
to remove low quality pages or even spam pages that may have
inadvertently been included in the pool of seed pages.

Much work is being done to come up with methods that don't rely
heavily on human judgment for identification of spam-free pages. As it
is, searchers are highly challenged to locate pages that would serve
their needs rather than those that are intended for high ranking in
search engines. Sites that do not provide any value to users are just
too many to be ignored.

Semantic Cloaking on the Web

Semantics is the study or science of meaning in language that takes
words and compares them with other words or symbols and determines the
relevancy and relationship between them. Semantic cloaking is the
practice of supplying different versions of a web page to search
engines and to browsers. The purpose of the content provider is to
hide the real content of the page from the view of search engines. The
difference in meaning between the pages is supposed to deceive search
engines' ranking algorithms. Cloaking is one type of search engine
spamming technique that makes it possible for non-relevant pages to
occupy top ranking in searches.

Search engines are used by people when they need to find the most
relevant responses to their search. It is typical for users to view
just one page of results, so sites are hard put to compete for the top
rankings, particularly for popular queries. Increased traffic to a
commercial website is equivalent to more profit.

Reputable content providers work hard to come up with high quality web
pages to get their desired high ranking. Unfortunately, not all
content providers hold the same view. These are the people who would
try to reach high ranking through manipulation of the web page
features used by search engines as the basis for their ranking
algorithms.

Ranking algorithms assume that page content is real. This means that
the content seen by search engines is taken to be identical to that
seen by actual users with browsers. With the web spamming technique of
cloaking, different versions are successfully supplied, causing a
great deal of confusion and disappointment for users.

Cloaking falls under the page-hiding spam category in search engine
spamming techniques. Some cloaking behavior is considered acceptable.
Cloaking is of two types: syntactic and semantic. Syntactic cloaking
includes all situations in which different content is sent to a
crawler and to a real user. Semantic cloaking is a subset of syntactic
cloaking which employs differences in meaning between pages to deceive
the ranking algorithms of search engines.

Syntactic cloaking may be acceptable in cases such as web servers
using session identifiers within URLs for copies sent to browsers and
no such identifiers for copies sent to crawlers. This is in effect
being used by web servers to differentiate their users. Search engines
may interpret these identifiers as a change in the page. The cloaking
behavior that needs to be penalized is semantic cloaking.

There are various proposals on ways to counter the problem. One
proposal suggests comparing copies from both the browser's perspective
and the crawler's perspective. It may be necessary to get two or more
copies from each side to be able to detect cloaking. Another suggests
a two-step process that would require fewer resources. The first step
implements a filter that uses heuristics to eliminate web pages that
cannot demonstrate cloaking. All the pages that have not been
eliminated go through the second step for inspection. Features are
extracted from about four copies and a classifier is used to determine
whether semantic cloaking is being done or not.

However, the reality remains that no ideal solution has been arrived
at to effectively curb semantic cloaking. This is a technique that
should not be practiced by anyone who wants to maintain good business
ethics. The practice continues to undermine the search engine's
attempts to provide users with the actual information they need.

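The two-step detection proposal described above can be illustrated with
a rough Python sketch. Everything in it is an assumption made for
illustration: the function names, the word-set comparison used as the
filtering heuristic, the two features, and the fixed threshold that
stands in for a trained classifier are not taken from any published
detector.

```python
import re


def tokens(html):
    """Return the set of lower-cased words in a page copy (tags stripped crudely)."""
    text = re.sub(r"<[^>]+>", " ", html)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def passes_filter(crawler_copy, browser_copy, min_diff=3):
    """Step 1 (assumed heuristic): keep a page as a candidate only if the
    two copies differ by at least min_diff words."""
    return len(tokens(crawler_copy) ^ tokens(browser_copy)) >= min_diff


def extract_features(crawler_copies, browser_copies):
    """Step 2a: compare two crawler copies with two browser copies.
    Words present in both copies from one side and in neither copy from
    the other side are consistent differences, not ordinary page churn."""
    c1, c2 = (tokens(page) for page in crawler_copies)
    b1, b2 = (tokens(page) for page in browser_copies)
    return {
        "crawler_only_terms": len((c1 & c2) - (b1 | b2)),
        "browser_only_terms": len((b1 & b2) - (c1 | c2)),
    }


def looks_cloaked(features, threshold=3):
    """Step 2b: hypothetical threshold rule standing in for a trained classifier."""
    return features["crawler_only_terms"] >= threshold


def detect(crawler_copies, browser_copies):
    """Run the cheap filter first, then inspect only the surviving pages."""
    if not passes_filter(crawler_copies[0], browser_copies[0]):
        return False
    return looks_cloaked(extract_features(crawler_copies, browser_copies))


if __name__ == "__main__":
    cloaked_crawler = ["<p>cheap flights hotels insurance deals rome</p>"] * 2
    cloaked_browser = ["<p>casino bonus signup</p>"] * 2
    print(detect(cloaked_crawler, cloaked_browser))
```

Using two copies from each side is what lets the sketch tell a page
that simply changes on every request apart from one that consistently
shows crawlers something different.
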
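Returning to the TrustRank discussion at the start of the article, the
seed-biased propagation can also be sketched in a few lines of Python.
This is a minimal illustration under stated assumptions, not a
reference implementation: the function name, the damping value of 0.85,
the fixed iteration count used in place of a convergence test, and the
toy link graph are all hypothetical.

```python
def trustrank(links, seeds, damping=0.85, iterations=50):
    """Propagate trust from seed pages along outgoing links.

    links: dict mapping each page to the list of pages it links to.
    seeds: set of pages judged trustworthy (non-zero initial score).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    # Seeds share the initial trust equally; every other page starts at zero.
    seed_score = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_score)
    for _ in range(iterations):
        # The "biased" part: the reset term goes only to the seed pages.
        nxt = {p: (1.0 - damping) * seed_score[p] for p in pages}
        for src, targets in links.items():
            if not targets:
                continue  # pages without out-links pass nothing on in this sketch
            share = damping * trust[src] / len(targets)  # equal split among children
            for target in targets:
                nxt[target] += share  # simple summation over all parents
        trust = nxt
    return trust


if __name__ == "__main__":
    graph = {
        "seed": ["good1", "good2"],
        "good1": ["good2"],
        "good2": ["seed"],
        "spam1": ["spam2"],
        "spam2": ["spam1"],
    }
    for page, score in sorted(trustrank(graph, {"seed"}).items()):
        print(page, round(score, 4))
```

In this toy graph the spam pages link only to each other and are not
reachable from the seed, so they end with a trust score of zero, while
the pages reachable from the seed receive positive scores.
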
About the Author:
http://www.theinternetone.net
Read more articles by:
Danny Wirken
Article Source: www.iSnare.com