Demystifying the PageRank and HITS Algorithms...
Or just enough linear algebra to master network-based ranking methods
There is a ton of information on the internet about how search engines work. One important aspect of of the final ranking in any given search is based on webpage ranking algorithms that run 'under the hood'. The hyperlink, network-based algorithms known as PageRank and HITS are the two most prominent examples of such methods. However, much of the information on these methods is either misleading or confusing, and really does require only a little bit of mathematics to understand well. The main idea is that the rank of any page is dependent on the number of backlinks (or in-links) to that page, and the importance or quality of the pages that contain those backlinks. Unfortunately, most explanations of how this is precisely computed uses the language of linear algebra and "eigenvalues", which often obscures the relatively simply ideas underlying them. Here is an attempt to make these algorithms more accessible by boiling down the math to the essential details.
Let G be a directed graph. We think of the nodes of G as webpages and the directed arcs as hyperlinks. Let A(G)=A be the adjacency matrix where A[i,j] = 1 if (i,j) is an arc, is 0 otherwise. The ith row of A is the (characteristic) vector representing all the out-neighbors of the ith node, and the jth column of A is the vector representing the in-neighbors of the jth node. The transpose matrix A' has the roles reversed, where the ith row of A' is the vector representing all the in-neighbors of the ith node, and the jth column of A' is the vector representing the out-neighbors of the jth node.
Matrix multiplication can give simple algebraic representations of graph structures. For example, the square of the adjacency matrix A^2 has as its [i,j]th entry the number of 2-hop paths from ith node to jth node. And the product A'A has as its [i,j]th entry the number of nodes that are common in-neighbors to both the ith and jth node. Note how A'A is a symmetric matrix, likewise AA' is also symmetric. Symmetric matrices have the property that the [i,j]th entry matches the [j,i]th entry for all pairs i,j.
Stochastic Matrices
Let us place a probability distribution on the out-neighborhood of each node – that is, we give a positive weight (between 0 and 1) to each arc, with the property that the sum of weights out of each node is exactly 1. We call such a matrix a stochastic matrix.
Or just enough linear algebra to master network-based ranking methods
There is a ton of information on the internet about how search engines work. One important aspect of of the final ranking in any given search is based on webpage ranking algorithms that run 'under the hood'. The hyperlink, network-based algorithms known as PageRank and HITS are the two most prominent examples of such methods. However, much of the information on these methods is either misleading or confusing, and really does require only a little bit of mathematics to understand well. The main idea is that the rank of any page is dependent on the number of backlinks (or in-links) to that page, and the importance or quality of the pages that contain those backlinks. Unfortunately, most explanations of how this is precisely computed uses the language of linear algebra and "eigenvalues", which often obscures the relatively simply ideas underlying them. Here is an attempt to make these algorithms more accessible by boiling down the math to the essential details.
Let G be a directed graph. We think of the nodes of G as webpages and the directed arcs as hyperlinks. Let A(G)=A be the adjacency matrix where A[i,j] = 1 if (i,j) is an arc, is 0 otherwise. The ith row of A is the (characteristic) vector representing all the out-neighbors of the ith node, and the jth column of A is the vector representing the in-neighbors of the jth node. The transpose matrix A' has the roles reversed, where the ith row of A' is the vector representing all the in-neighbors of the ith node, and the jth column of A' is the vector representing the out-neighbors of the jth node.
Matrix multiplication can give simple algebraic representations of graph structures. For example, the square of the adjacency matrix A^2 has as its [i,j]th entry the number of 2-hop paths from ith node to jth node. And the product A'A has as its [i,j]th entry the number of nodes that are common in-neighbors to both the ith and jth node. Note how A'A is a symmetric matrix, likewise AA' is also symmetric. Symmetric matrices have the property that the [i,j]th entry matches the [j,i]th entry for all pairs i,j.
Stochastic Matrices
Let us place a probability distribution on the out-neighborhood of each node – that is, we give a positive weight (between 0 and 1) to each arc, with the property that the sum of weights out of each node is exactly 1. We call such a matrix a stochastic matrix.
Figure 1. Stochastic matrix obtained from a digraph where a uniform distribution is used at every node.
Markov Chains and PageRank
Given a stochastic matrix P we consider each entry as a transition probability of a random walk, or equivalently, the transition probability matrix of the Markov chain. A stationary distribution of a Markov chain is a probability distribution d on the nodes, so that P'd =d, i.e., d is a fixed point under the action of the transition matrix. For such matrices we can apply the fundamental theorem of Markov chains, which says that, under certain conditions, stationary distributions always exist and are unique.
PageRank associates this unique stationary distribution is with a ranking of the nodes of a web-graph.
The proof of the existence of the stationary distribution or fixed point, relies on a result called the Perron-Frobenius Theorem for non-negative, irreducible matrices, which essentially states that any such Markov chains has a left eigenvector of P with eigenvalue 1. We can simplify matters significantly by considering the restriction to positive matrices, which indeed what PageRank does in practice.
The Perron-Frobenius Theorem for positive matrices:
Let A = [a(i,j)] be an n × n positive matrix: a(i,j) > 0 for 1 ≤ i, j ≤ n. Then there is a positive real number r, such that r is an eigenvalue of A and any other eigenvalue λ (possibly, complex) is strictly smaller than r in absolute value, |λ| < r. T
The proof of the existence of the stationary distribution or fixed point, relies on a result called the Perron-Frobenius Theorem for non-negative, irreducible matrices, which essentially states that any such Markov chains has a left eigenvector of P with eigenvalue 1. We can simplify matters significantly by considering the restriction to positive matrices, which indeed what PageRank does in practice.
The Perron-Frobenius Theorem for positive matrices:
Let A = [a(i,j)] be an n × n positive matrix: a(i,j) > 0 for 1 ≤ i, j ≤ n. Then there is a positive real number r, such that r is an eigenvalue of A and any other eigenvalue λ (possibly, complex) is strictly smaller than r in absolute value, |λ| < r. T
An Algorithm for PageRank
A simple consequence of the Perron-Frobenius Theorem for positive matrices gives us an algorithm for computing the PageRank (the unique stationary distribution) for graphs to which it applies. Namely, by simply powering up or iteratively updating a distribution vector, starting from an arbitrary initial state (usually taken as the uniform distribution). Each iteration of this algorithm takes time O(|V| + |E|), and will often converge very quickly.
Determinants and Eigenvalues of Matrices
A determinant is a function that associates a scalar value to a square matrix. The value of the determinant can tell you if the matrix has an inverse or not. A matrix with a nonzero determinant is called regular or an invertible matrix (and, we can calculate its inverse matrix). If the determinant is zero the matrix is called a singular matrix and it does not have an inverse.
Formally, a determinant is computed by a sum of products over all permutations of numbers 1 to n, and therefore has n! terms in the sum. Determinants over 2 by 2 matrices, that is where n = 2, have only two terms (a11 * a22) – (a12 * a21) and are simple to calculate by hand. For larger values n>2, the number of terms start to get out of hand.
All regular matrices can be transformed into singular matrices by subtracting common value from the diagonal entries of the matrix. The problem of transforming a regular matrix into a singular matrix is referred to as the eigenvalue problem. The Eigenvalue Problem is determined by subtracting c*I from A and setting the determinant to 0, and then solving for c. Given a solution c to the eigenvalue problem we can find a vector (a class of vectors) X such that AX=cX. Any such X is called the associated eigenvector for eigenvalue c.
One of the simplest methods for finding the largest magnitude eigenvalue and an associated eigenvector is called the Power Method. Given a matrix A and an initial vector b0 that is a linear combination (that is, a weighted sum) of eigenvectors, we can see that by repeatedly multiplying each vector bi by A, we are reinforcing (the direction) of the eigenvector with the largest magnitude eigenvalue (relative to the direction of the other eigenvectors in the sum). Hence, the power method is applicable in cases where the the following technical conditions hold:
1) A has an eigenvalue that is strictly greater in magnitude than its other eigenvalues.
2) The starting vector b0 has a nonzero component in the direction of an eigenvector associated with the dominant eigenvalue, i.e., b0 is not orthogonal to the eigenvector.
The Power Method for Principal Eigenvector
The power method algorithm starts with an initial vector b0, and is described by the following iteration to compute successive vectors:
2) The starting vector b0 has a nonzero component in the direction of an eigenvector associated with the dominant eigenvalue, i.e., b0 is not orthogonal to the eigenvector.
The Power Method for Principal Eigenvector
The power method algorithm starts with an initial vector b0, and is described by the following iteration to compute successive vectors:
for k = 0,1,2,3,...
b(k+1) = A(bk) / || A(bk)||
Convergence is guaranteed when above conditions 1) and 2) hold.
The HITS Algorithm for Ranking Webpages
The HITS algorithm is a conceptually simply link based ranking algorithm based on the power method, which posits two parameters for each webpage: an authority value a_i and a hub value h_i. Authority values are simply the sum of the in-neighbor hub values, and similarly the hub values are the sum of the out-neighbor authority values. So hubs and authority values are mutually dependent. In matrix form we have the following two matrix-vector equations:
a= A'h and h = Aa
Plugging back in for h and a leads to the following equations yielding an pair of eigenvector problems:
a= A'Aa and h = AA'h
a= A'Aa and h = AA'h
Looking at these equations we can see that we seek eigenvectors associated with eigenvalue 1 for each of the pair of matrices A'A and AA'. We cannot in general assume that such eigenvalue/eigenvector pairs exists, but we can find other eigenvectors associated with the largest eigenvalue using the normalized Power Method described above, so long as technical conditions are satisfied. To wit, as noted both A'A and AA' are positive, symmetric matrices. Such matrices have the property that all the eigenvalues are positive reals, and the associated eigenvectors form a basis, that is every vector is a linear combination of eigenvectors. Hence, we can be assured that the HITS algorithm will converge to the (normalized) sum of eigenvectors associated with the largest eigenvalue (for which the starting vectors a_0 and h_0 have non-zero component). We are in good shape if these vectors are real-valued, so that we can interpret the components as ranking values. The theory says that as long as the matrices in question are "irreducible" then we are in business.
b(k+1) = A(bk) / || A(bk)||
Convergence is guaranteed when above conditions 1) and 2) hold.
The HITS Algorithm for Ranking Webpages
The HITS algorithm is a conceptually simply link based ranking algorithm based on the power method, which posits two parameters for each webpage: an authority value a_i and a hub value h_i. Authority values are simply the sum of the in-neighbor hub values, and similarly the hub values are the sum of the out-neighbor authority values. So hubs and authority values are mutually dependent. In matrix form we have the following two matrix-vector equations:
a= A'h and h = Aa
Plugging back in for h and a leads to the following equations yielding an pair of eigenvector problems:
a= A'Aa and h = AA'h
a= A'Aa and h = AA'h
Looking at these equations we can see that we seek eigenvectors associated with eigenvalue 1 for each of the pair of matrices A'A and AA'. We cannot in general assume that such eigenvalue/eigenvector pairs exists, but we can find other eigenvectors associated with the largest eigenvalue using the normalized Power Method described above, so long as technical conditions are satisfied. To wit, as noted both A'A and AA' are positive, symmetric matrices. Such matrices have the property that all the eigenvalues are positive reals, and the associated eigenvectors form a basis, that is every vector is a linear combination of eigenvectors. Hence, we can be assured that the HITS algorithm will converge to the (normalized) sum of eigenvectors associated with the largest eigenvalue (for which the starting vectors a_0 and h_0 have non-zero component). We are in good shape if these vectors are real-valued, so that we can interpret the components as ranking values. The theory says that as long as the matrices in question are "irreducible" then we are in business.