Wednesday, February 9, 2011

How Google makes you feel lucky (Contd.)


My previous blog post dealt with how Google (or any other search engine, for that matter) indexes information. For any given search query, there may be a large number of documents that contain it. The question that is yet to be answered is who (or what) decides which results are more relevant than the others. With the multitude of information that search engines deal with, this task is like finding a needle in a haystack! And unless search engines do this efficiently, they are going to lose customers.
Larry Page & Sergey Brin had this in mind when they wrote their research paper on Google. They devised a clever algorithm that ranks pages according to the links pointing to them. It was named Pagerank after Larry Page and also after the function it performs.
Google always affirms that its ranking process is completely democratic. Speaking of democratic, I can’t help but think of the election process. There are a number of candidates to choose from, and every voter casts a vote in the form of a secret ballot. This is essentially how Pagerank works. Every hyperlink pointing to a webpage counts as a positive vote for it and helps increase its Pagerank. If there are no links to a webpage, then under this simple scheme its Pagerank is 0. But it isn’t that simple.
Let’s say you have a blog & it has 10 incoming links. If your friend’s blog has only 5 incoming links, then your blog would be considered more important, since 10 other pages cite it. Now, all of you have been pestered by spam links and ads on websites, right? Would you say that all incoming links to a webpage are equally important? Some of them might even be pop-up ads. This is why, while ranking your blog based on incoming links, we also need to consider the Pagerank of each of the source pages that point to your blog. The outcome is that a link from a page with a high Pagerank is more important than a link from a page with a low Pagerank. Wait, this isn’t the end!
What if the source page has a large number of links? This obviously decreases the importance of each link. We thus finally arrive at the formula for calculating the Pagerank of a page: take each page pointing to it, divide that page’s Pagerank by the number of outgoing links it has, and add up the results. In layman’s terms, this means that if a page is cited by well-known sources, it probably contains better information & is worth viewing.
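To make the formula above concrete, here is a minimal Python sketch of it. It is only an illustration, not Google's actual implementation: the three page names and the `links` dictionary are made up, every page is assumed to have at least one outgoing link, and the damping factor from the original research paper is left out to keep things simple. Since each page's Pagerank depends on the Pagerank of the pages linking to it, the sketch starts every page off equal and repeats the calculation until the numbers settle.

```python
def pagerank(links, iterations=100):
    """Simplified Pagerank (no damping factor).

    links maps each page to the list of pages it links to.
    Assumption: every page has at least one outgoing link.
    """
    pages = list(links)
    # Start with every page equally important.
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: 0.0 for page in pages}
        for source, targets in links.items():
            # A page splits its Pagerank evenly among its outgoing links.
            share = rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web, invented for this example.
links = {
    "your_blog": ["friend_blog"],
    "friend_blog": ["your_blog", "news_site"],
    "news_site": ["your_blog"],
}

ranks = pagerank(links)
for page, value in ranks.items():
    print(page, round(value, 3))
```

On this tiny web the values settle at roughly 0.4 for `your_blog` and `friend_blog` and 0.2 for `news_site`. Notice that `friend_blog` has only one incoming link but still ranks high, because that link comes from a page that passes on its whole Pagerank.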
Pagerank is a probability distribution that represents the chance that a person randomly clicking on links will arrive at a given page. It is a value between 0 and 1. A Pagerank of 0.7 means that there is a 70% chance of a random surfer reaching that page. There are many other factors that Google considers a trade secret and would never divulge. But starting from the above formula, Google converts the probability into a figure between 0 and 10, and this is the Pagerank of a page that anyone can check with the help of the Google Toolbar or the Pagerank extension in Chrome. It isn’t the actual value that Google uses to judge results.
I bid adieu by listing the Pagerank of Google (10), Wikipedia (9), Yahoo (9), Facebook (8) and the blog you are currently reading (0). I have a lot to catch up on!
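The random surfer idea can be tried out directly. The sketch below is my own illustration under stated assumptions: the three-page `links` web is invented, the surfer never gets bored and jumps to a random page (which the real algorithm allows for), and the `random_surfer` function is a hypothetical helper. It simply follows random links for many clicks and records how often each page is visited.

```python
import random


def random_surfer(links, steps=100_000, seed=0):
    """Follow random links and return the fraction of visits per page.

    Assumption: every page has at least one outgoing link.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    page = next(iter(links))   # start on the first page listed
    visits = {p: 0 for p in links}
    for _ in range(steps):
        # Click one of the current page's links at random.
        page = rng.choice(links[page])
        visits[page] += 1
    return {p: count / steps for p, count in visits.items()}


# Hypothetical three-page web, invented for this example.
links = {
    "your_blog": ["friend_blog"],
    "friend_blog": ["your_blog", "news_site"],
    "news_site": ["your_blog"],
}

frequencies = random_surfer(links)
for page, share in frequencies.items():
    print(page, round(share, 2))
```

The visit fractions add up to 1, and the pages the surfer lands on most often are the ones with the highest Pagerank: here the surfer spends about 40% of the clicks on each of the two blogs and about 20% on `news_site`.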
