Rank Correlation: Kendall \(\tau\)



Rank Correlation

Kendall (1938) introduced the rank correlation coefficient \(\tau\), which is used to compare two ranked vectors of integers. Every ranked vector of length \(N\) has the same mean, \(\frac{N+1}{2}\) (its elements sum to \(\frac{N(N+1)}{2}\)). As a result, when the standard correlation is computed between two ranked vectors of the same length, observations with middle ranks lie close to that common mean, and some information about them is lost. We therefore use the rank correlation. Like the standard statistical correlation, it takes values between \(-1\) and \(1\), but it depends only on the relative order of the ranks rather than treating the ranks as numerical values. Consider a sequence of \(N\) distinct integers and let \(\hat{B}(i)\) be the \(i\)-th element in the sequence. Then, for every \(i \in \{1, 2, \dots, N-1\}\), I count the number of elements \(\hat{B}(j)\), \(j \in \{i+1, i+2, \dots, N\}\), that are greater than \(\hat{B}(i)\). This count is denoted by \(C(i)\). The rank correlation, \(\tau\), is then defined as
\begin{equation} \tau=\frac{4S-N(N-1)}{N(N-1)}, \end{equation} where \(\displaystyle{S=\sum_{i=1}^{N-1} C(i)}\).
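The definition translates almost line by line into code. The minimal Python sketch below uses hypothetical function names and assumes that \(\hat{B}\) has already been ordered by the other vector and contains no ties. It counts each \(C(i)\) with a direct double loop, which takes \(O(N^2)\) time and is adequate for short vectors like the ones in this post.

```python
from typing import List, Sequence


def count_greater_to_right(b_hat: Sequence[int]) -> List[int]:
    """Return [C(1), ..., C(N-1)]: for each position i, how many later
    elements b_hat[j] (j > i) are greater than b_hat[i]."""
    n = len(b_hat)
    return [
        sum(1 for j in range(i + 1, n) if b_hat[j] > b_hat[i])
        for i in range(n - 1)
    ]


def kendall_tau_from_ordered(b_hat: Sequence[int]) -> float:
    """tau = (4S - N(N-1)) / (N(N-1)), where S is the sum of the C(i).

    Assumes b_hat is already ordered by the other vector and has no ties.
    """
    n = len(b_hat)
    if n < 2:
        raise ValueError("need at least two elements")
    s = sum(count_greater_to_right(b_hat))
    return (4 * s - n * (n - 1)) / (n * (n - 1))
```

For \(\hat{B}=\{3, 4, 2, 5, 8, 6, 1, 7, 10, 9\}\), `count_greater_to_right` returns `[7, 6, 6, 5, 2, 3, 3, 2, 0]`, so \(S=34\), and `kendall_tau_from_ordered` returns \(46/90 \approx 0.511\).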

The following example shows how to compute the rank correlation between two vectors of length \(N=10\). Let $$ A=\{6, 3, 7, 8, 5, 1, 2, 4, 9, 10\}, \\ B=\{6, 2, 1, 7, 8, 3, 4, 5, 10, 9\}. $$
To compute the rank correlation, I first reorder the pairs so that one of the vectors, say \(A\), is in ascending order, and I apply the same reordering to the other vector \(B\), so that each element of \(B\) stays paired with its original partner in \(A\). This gives $$ \hat{A}=\{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10\},\\ \hat{B}=\{ 3, 4, 2, 5, 8, 6, 1, 7, 10, 9 \}. $$
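The reordering step amounts to sorting the pairs \((A(k), B(k))\) by their first component. Here is a minimal sketch with a hypothetical helper name, assuming the entries of \(A\) are distinct ranks so that sorting by \(A\) alone is unambiguous.

```python
from typing import List, Sequence, Tuple


def reorder_by_first(a: Sequence[int], b: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Sort the pairs (a[k], b[k]) by a, so a becomes ascending and each
    b[k] stays attached to its original partner a[k]."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    pairs = sorted(zip(a, b), key=lambda pair: pair[0])  # sort by a only
    a_hat = [x for x, _ in pairs]
    b_hat = [y for _, y in pairs]
    return a_hat, b_hat


a_hat, b_hat = reorder_by_first([6, 3, 7, 8, 5, 1, 2, 4, 9, 10],
                                [6, 2, 1, 7, 8, 3, 4, 5, 10, 9])
print(a_hat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(b_hat)  # [3, 4, 2, 5, 8, 6, 1, 7, 10, 9]
```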
Then, for each \(\hat{B}(i)\) with \(i \in \{1, 2, \dots, N-1\}\), I count \(C(i)\) as defined above to obtain $$ C=\{ 7, 6, 6, 5, 2, 3, 3, 2, 0\}.$$
The first element, \(C(1)=7\), comes from the seven integers to the right of \(\hat{B}(1)=3\) that are greater than \(3\). Summing the elements of \(C\) gives \(S=34\). Defining \(\Sigma=2S-\frac{N(N-1)}{2}=68-45=23\), the rank correlation is \(\tau=\frac{2\Sigma}{N(N-1)}=\frac{46}{90}\approx 0.511\), which agrees with the definition of \(\tau\), since \(\frac{4S-N(N-1)}{N(N-1)}=\frac{136-90}{90}=\frac{46}{90}\).
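The counts \(C(i)\) also have a pairwise reading. After reordering, \(\hat{A}\) is ascending, so \(C(i)\) is the number of pairs \((i, j)\) with \(j>i\) that are ordered the same way in both vectors (concordant pairs), and \(S\) is the total number of concordant pairs. Without ties, the remaining \(\frac{N(N-1)}{2}-S\) pairs are discordant, so \(\Sigma\) equals concordant minus discordant pairs. The minimal sketch below uses this equivalent formulation, which is not the procedure described above, to reproduce \(\Sigma=23\) directly from the unsorted \(A\) and \(B\). The function names are hypothetical and distinct ranks are assumed.

```python
from itertools import combinations
from typing import Sequence


def sigma_pairwise(a: Sequence[int], b: Sequence[int]) -> int:
    """Sigma = (# concordant pairs) - (# discordant pairs) over all i < j."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    total = 0
    for i, j in combinations(range(len(a)), 2):
        product = (a[i] - a[j]) * (b[i] - b[j])
        # +1 if the pair is ordered the same way in a and b, -1 if opposite
        # (a tie would give 0, but the ranks here are assumed distinct)
        total += (product > 0) - (product < 0)
    return total


def kendall_tau_pairwise(a: Sequence[int], b: Sequence[int]) -> float:
    """tau = 2 * Sigma / (N(N-1)), the same normalization as above."""
    n = len(a)
    return 2 * sigma_pairwise(a, b) / (n * (n - 1))


A = [6, 3, 7, 8, 5, 1, 2, 4, 9, 10]
B = [6, 2, 1, 7, 8, 3, 4, 5, 10, 9]
print(sigma_pairwise(A, B))                  # 23
print(round(kendall_tau_pairwise(A, B), 3))  # 0.511
```

If SciPy is installed, `scipy.stats.kendalltau(A, B)` should report the same coefficient for tie-free data like this; it is mentioned only as an external cross-check.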

Note that if \(A=B\), then \(\hat{B}\) is in ascending order, \(C(i)=N-i\), and \(S=(N-1)+(N-2)+\dots+1=\frac{N(N-1)}{2}\), so \(\Sigma=\frac{N(N-1)}{2}\) and the rank correlation is \(+1\). If instead \(B\) is in the reverse order of \(A\), then \(S=0\), \(\Sigma=-\frac{N(N-1)}{2}\), and the rank correlation is \(-1\).


References
  1. Kendall, M. G. (1938). A new measure of rank correlation. Biometrika 30(1-2): 81-93.
