![]() |
![]() |
![]() |
![]() |
◊ The Math Behind PageRank ◊Have you ever wondered why even though your site is among the most popular websites of your category, it can never reach the first few pages of the Google search results, while sites with less content with less visitors often do? It is all based upon Google's innovative search algorithm.Sergey Brin and Larry Page were graduate students at Stanford when they founded Google. Ever since the company became a search giant, Google has been a hot topic. It's only fitting that a lecture at Stanford would explain the math behind PageRank. http://math.stanford.edu/~yiannis/math51/PageRank.pdf The lecture is nicely laid out, describing the steps used to arrive at the current PageRank algorithm: 1. Page links Google's success as a search engine is due to its skill at placing more interesting pages at the top of your search results. Instead of depending only on keywords, the ranking of pages is based upon how many other sites are linking to those sites. Of course, this method of ranking can be exploited. Notice that when you type in "miserable failure" on Google, the first page that appears is the official website of George W. Bush. This is an example of "google bombing," where users connect links to certain phrases; in this case, they connect the page of Bush's biography to the phrase "miserable failure." This concept explains why online quizzes and fanlistings are often among the first few pages of a google search result. A lot of people link back to these quizzes and fanlistings in their blogs and websites, and when google's crawler picks up on the numerous links back to the quiz, and it rises in Google's PageRank system. At the most basic level, Google records a giant square matrix full of zeroes where the sites it comes across are labels for both rows and columns. When a site links to another, it puts a 1 in the appropriate matrix cell. Multiplying this matrix by a vector filled with 1s yields a vector containing all the ranks of different sites. 2. Weighted recommendations A short list of recommendations is more valuable than a long list, so Google takes this into account. Essentially, each web page gets one vote for the sites it wants to recommend, split up evenly among those links. Every column of the matrix is divided by the sum of that column, so for a site with three links, each link is given 1/3 of a recommendation. Again, multiplying by a vector of 1s outputs a vector of site rankings. 3. Weighted recommendations, part 2 Wouldn't it be nice if important sites had more effective votes? Google improves on the previous method by weighting the votes of each site by the importance of the site. Unfortunately, this creates a circular algorithm: in order to calculate the ranking of a site, you need the ranking of the site to weigh its votes! This is solved with an interesting math concept known as an eigenvector. Essentially, the equation v = Pv where v is a vector and P is a matrix has a solution, and solving for v once again gives a vector containing the page rankings. 4. Teleportation The issue with the current method is if there are any one-way links between sites, any site that cannot be reached from every other site in the web gets a ranking of zero. Google adds in a teleportation factor to overcome this. 85% of the time, links are followed normally, but in other cases no links are followed and the algorithm 'teleports' or randomly selects any other page Google's database. This change actually simplifies the calculation of the eigenvector v. |
![]() |