KCA 



Organic - Search Engines





  



Google was started by two PhD students at Stanford University, Sergey Brin and Larry Page, and brought a new concept to evaluating web pages. This concept, called PageRank, has been important to the Google algorithm from the start. PageRank relies heavily on incoming links and uses the logic that each link to a page is a vote for that page's value. The more incoming links a page has, the more "worthy" it is. The value of each incoming link varies directly with the PageRank of the page it comes from and inversely with the number of outgoing links on that page.
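The voting logic described above can be sketched in a few lines of Python. This is a simplified illustration, not Google's actual implementation: the function name, the tiny four-page link graph, the damping factor of 0.85, and the fixed iteration count are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link's value rises with the rank of the linking page
                # and falls with the number of outgoing links on it.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: page "c" receives the most incoming links,
# page "d" receives none.
example_links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(example_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

In this toy graph the page with the most incoming links ends up with the highest score, and the page nobody links to keeps only the baseline share.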

With help from PageRank, Google proved to be very good at serving relevant results. Google became the most popular and successful search engine. Because PageRank measured an off-site factor, Google felt it would be more difficult to manipulate than on-page factors.

However, webmasters had already developed link-manipulation tools and schemes to influence the Inktomi search engine. These methods proved to be equally applicable to Google's algorithm. Many sites focused on exchanging, buying, and selling links on a massive scale. PageRank's reliance on the link as a vote of confidence in a page's value was undermined as many webmasters sought to garner links purely to influence Google into sending them more traffic, irrespective of whether the link was useful to human site visitors.

Further complicating the situation, the default search behavior was still to scan an entire webpage for related search terms, so a webpage containing a dictionary-style word list would match almost any search (except for special names), and link-based ranking could push it even higher. Dictionary pages and link schemes could severely skew search results.
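To see why a dictionary-style page is a problem for whole-page matching, consider the following minimal sketch. It assumes a deliberately naive matcher that accepts a page whenever every query word appears somewhere in its text; the function names, page texts, and queries are invented for the illustration and do not describe any real search engine.

```python
import re


def tokenize(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matches(page_text, query):
    """Return True if every query word appears anywhere on the page."""
    return tokenize(query) <= tokenize(page_text)


# Hypothetical pages: one genuine article and one bare word list.
pages = {
    "loans": "Compare mortgage rates and home loans from local lenders.",
    "wordlist": "aardvark bicycle finance garden home loans mortgage "
                "rates recipe travel zebra",
}
queries = ["mortgage rates", "travel recipe", "garden bicycle"]

hits = {
    query: [name for name, text in pages.items() if matches(text, query)]
    for query in queries
}
for query, names in hits.items():
    print(query, "->", names)
```

The word-list page matches all three unrelated queries, while the genuine article matches only the one it is actually about. If such a page also accumulated incoming links, a link-based score would rank it above more relevant results.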

It was time for Google -- and other search engines -- to look at a wider range of off-site factors. There were other reasons to develop more intelligent algorithms. The Internet was reaching a vast population of non-technical users who were often unable to use advanced querying techniques to reach the information they were seeking, and the sheer volume and complexity of the indexed data was vastly different from that of the early days. Search engines had to develop predictive, semantic, linguistic and heuristic algorithms. Around the same time as the work that led to Google, IBM had begun work on the Clever Project, and Jon Kleinberg was developing the HITS algorithm.

A proxy for the PageRank metric is still displayed in the Google Toolbar, but PageRank is only one of more than 100 factors that Google considers in ranking pages.

Today, most search engines keep their methods and ranking algorithms secret, both to compete at returning the most valuable search results and to deter spam pages from clogging those results. A search engine may use hundreds of factors in ranking the listings on its SERPs; the factors themselves and the weight each carries may change continually. Algorithms can differ widely: a webpage that ranks #1 in a particular search engine could rank #200 in another search engine.


The following factors are speculation on some of the considerations search engines may presently be using or which could be built into their algorithms. A number of these are taken from one of Google's patent applications, and may give some indication as to what is in the pipeline. Some are pure speculation. It's also good to keep in mind that Google has over 180 patents and patent applications assigned to it at the US Patent and Trademark Office (USPTO), and a number of those include possible insights into other factors, and other directions that the search engine may follow, some of which may not be consistent with this list.

Age of site
Length of time domain has been registered
Age of content
Frequency of content: regularity with which new content is added
Text size: number of words above 200-250 (not affecting Google in 2005)
Age of link and reputation of linking site
Standard on-site factors
Uniqueness of content
Related terms used in content (the terms the search engine associates as being related to the main content of the page)
Google PageRank (used only in Google's algorithm)
External links, the anchor text in those external links and in the sites/pages containing those links
Citations and research sources (indicating the content is of research quality)
Stem-related terms in the search engine's database (finance/financing)
Incoming backlinks and anchor text of incoming backlinks
Negative scoring for some incoming backlinks (perhaps those coming from low value pages, reciprocated backlinks, etc.)
Rate of acquisition of backlinks: too many too fast could indicate "unnatural" link buying activity
Text surrounding outward links and incoming backlinks. A link following the words "Sponsored Links" could be ignored
Use of "rel=nofollow" to suggest that the search engine should ignore the link
Depth of document in site
Metrics collected from other sources, such as monitoring how frequently users hit the back button when SERPs send them to a particular page
Metrics collected from sources like the Google Toolbar, Google AdWords/AdSense programs, etc.
Metrics collected in data-sharing arrangements with third parties (like providers of statistical programs used to monitor site traffic)
Rate of removal of incoming links to the site
Use of sub-domains, use of keywords in sub-domains and volume of content on sub-domains… and negative scoring for such activity
Semantic connections of hosted documents
Rate of document addition or change
IP of hosting service and the number/quality of other sites hosted on that IP
Other affiliations of linking site with the linked site (do they share an IP? have a common postal address on the "contact us" page?)
Technical matters like use of 301 to redirect moved pages, showing a 404 server header rather than a 200 server header for pages that don't exist, proper use of robots.txt
Hosting uptime
Whether the site serves different content to different categories of users (cloaking)
Broken outgoing links not rectified promptly
Unsafe or illegal content
Quality of HTML coding, presence of coding errors
Actual click through rates observed by the search engines for listings displayed on their SERPs
Hand ranking by humans of the most frequently accessed SERPs
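As an illustration of one item in the list above, the sketch below shows how a crawler might separate ordinary links from links marked with rel="nofollow". It uses Python's standard html.parser module; the class name, the sample HTML, and the example URLs are hypothetical, and how any real search engine treats such links is not known from this list.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect link targets, separating rel="nofollow" links from the rest."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # The rel attribute may hold several space-separated tokens.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


# Hypothetical page fragment with one ordinary and one nofollow link.
sample_html = """
<p>See <a href="https://example.com/guide">the guide</a> and
<a href="https://example.org/ad" rel="nofollow">this sponsor</a>.</p>
"""

collector = LinkCollector()
collector.feed(sample_html)
print("counted as votes:", collector.followed)
print("ignored:", collector.nofollow)
```

A link-based score could then be computed from the followed links only, leaving the nofollow links out of the vote count.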


GNU Free Documentation License (see Copyrights for details). Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc.


Keith Cash Associates
• 240 Shady Acres Lane  • Montevallo, AL  35115

205-299-2688 • Email: Contact • Web Site: www.KeithCash.com


Copyright © Keith Cash Associates  2005-2006. All rights reserved.
