Spearman Rank Correlation Coefficient

Written by Pranit Dhanade


Introduction

Spearman’s Rank Correlation Coefficient is a non-parametric statistical measure used to determine the strength and direction of a monotonic relationship between two variables.

Unlike Pearson correlation, which measures linear relationships using raw numerical observations, Spearman correlation operates on ranked data.

$$ r_s = 1 - \frac{ 6\sum d_i^2 }{ n(n^2-1) } $$

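Where $d_i$ is the difference between the two ranks of the $i$-th observation and $n$ is the number of paired observations. This form of the formula assumes there are no tied ranks.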


Conceptual Understanding

Spearman correlation evaluates whether the ordering of one variable matches the ordering of another variable.

If high values of one variable tend to pair with high values of the other, $r_s$ approaches $+1$, reaching it when the two rankings agree exactly:

$$ r_s \rightarrow +1 $$

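If high values of one variable tend to pair with low values of the other, $r_s$ approaches $-1$, reaching it when the two rankings are exactly reversed: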

$$ r_s \rightarrow -1 $$
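As a small illustration of these two limiting cases, the sketch below applies the rank-difference formula to made-up data. The helper names `ranks` and `spearman_no_ties` and the sample values are hypothetical, and the code assumes no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_no_ties(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), valid without ties."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical data, chosen only to illustrate the three cases.
hours = [2, 4, 6, 8, 10]
score_up = [51, 58, 70, 71, 90]     # same ordering as hours
score_down = [90, 71, 70, 58, 51]   # exactly reversed ordering
score_mixed = [58, 51, 70, 90, 71]  # mostly, but not fully, agreeing

print(spearman_no_ties(hours, score_up))     # 1.0
print(spearman_no_ties(hours, score_down))   # -1.0
print(spearman_no_ties(hours, score_mixed))  # 0.8
```

Only the ordering matters here: replacing `score_up` with any other strictly increasing sequence would give the same result.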

Derivation of the Formula

Step 1: Pearson Correlation Formula

Spearman correlation is derived from Pearson correlation by replacing observations with ranks.

$$ r = \frac{ \sum (x_i-\bar{x})(y_i-\bar{y}) }{ \sqrt{ \sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2 } } $$

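Let $R_i$ be the rank of $x_i$ and $S_i$ the rank of $y_i$, with mean ranks $\bar{R}$ and $\bar{S}$. Spearman's coefficient is Pearson's formula with $x_i$ and $y_i$ replaced by $R_i$ and $S_i$.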

Step 2: Mean of Ranks

With no ties, each set of ranks is a permutation of $1, 2, \ldots, n$, so both mean ranks are:

$$ \bar{R} = \bar{S} = \frac{n+1}{2} $$

Step 3: Sum of Squares

The sum of squares of the first $n$ natural numbers is:

$$ 1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{ n(n+1)(2n+1) }{6} $$

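Therefore, using $\sum (R_i-\bar{R})^2 = \sum R_i^2 - n\bar{R}^2$ with $\bar{R} = (n+1)/2$ (the same result holds for the $S_i$):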

$$ \sum (R_i-\bar{R})^2 = \frac{ n(n^2-1) }{12} $$
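The identities from Steps 2 and 3 can be checked numerically. The following is a minimal sketch using exact rational arithmetic; the function name `rank_identities_hold` is hypothetical, and the check only covers the range of $n$ it is run on.

```python
from fractions import Fraction


def rank_identities_hold(n):
    """Check the mean, sum-of-squares, and centered-sum identities for ranks 1..n."""
    ranks = range(1, n + 1)
    mean = Fraction(n + 1, 2)
    sum_sq = sum(r * r for r in ranks)
    centered = sum((r - mean) ** 2 for r in ranks)
    return (
        Fraction(sum(ranks), n) == mean
        and sum_sq == Fraction(n * (n + 1) * (2 * n + 1), 6)
        and centered == Fraction(n * (n * n - 1), 12)
    )


print(all(rank_identities_hold(n) for n in range(2, 51)))  # True
```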

Step 4: Rank Difference Relation

Since:

$$ d_i = R_i - S_i $$

Squaring both sides:

$$ d_i^2 = (R_i-S_i)^2 $$

Expanding:

$$ d_i^2 = R_i^2 + S_i^2 - 2R_iS_i $$

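Because $\bar{R} = \bar{S}$, the difference can also be written as $d_i = (R_i-\bar{R}) - (S_i-\bar{S})$. Applying the same expansion to the centered ranks and summing over $i$:

$$ \sum d_i^2 = \sum (R_i-\bar{R})^2 + \sum (S_i-\bar{S})^2 - 2\sum (R_i-\bar{R})(S_i-\bar{S}) $$

Each of the first two sums equals $n(n^2-1)/12$, so:

$$ \sum (R_i-\bar{R})(S_i-\bar{S}) = \frac{ n(n^2-1) }{12} - \frac{1}{2}\sum d_i^2 $$

This is the numerator of Pearson’s formula applied to ranks, and its denominator is also $n(n^2-1)/12$. Dividing gives: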

$$ r_s = 1 - \frac{ 6\sum d_i^2 }{ n(n^2-1) } $$

Hence proved.
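The result can also be checked numerically: for ranks without ties, Pearson's formula applied to the ranks should match the rank-difference formula. The sketch below tests this on a few random permutations; the function names `pearson` and `spearman_from_d` are hypothetical, and agreement on samples is an illustration rather than a proof.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def spearman_from_d(r, s):
    """Rank-difference formula; r and s must be untied ranks 1..n."""
    n = len(r)
    d_squared = sum((a - b) ** 2 for a, b in zip(r, s))
    return 1 - 6 * d_squared / (n * (n * n - 1))


random.seed(0)
n = 10
R = list(range(1, n + 1))
for _ in range(5):
    S = random.sample(R, n)  # a random permutation of the ranks, so no ties
    assert math.isclose(pearson(R, S), spearman_from_d(R, S), abs_tol=1e-12)
print("Pearson on ranks matches the rank-difference formula")
```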


Statistical Properties

| Property | Description |
|---|---|
| Non-parametric | No normality assumption |
| Rank-based | Uses ordinal information |
| Robust | Less sensitive to outliers |
| Monotonic | Measures monotonic dependence |

Applications

1. Machine Learning

Used in feature selection, ranking systems, recommendation engines, and evaluation metrics.

2. Bioinformatics

Applied in genomic sequencing, gene expression analysis, and biological ranking problems.

3. Finance

Used for stock ranking, risk analysis, and ordinal economic modeling.

4. Psychology

Used for Likert-scale analysis, behavioral rankings, and survey statistics.


Advantages


Limitations


Spearman vs Pearson Correlation

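| Feature | Pearson | Spearman |
|---|---|---|
| Relationship measured | Linear | Monotonic |
| Uses raw data values | Yes | No (uses ranks) |
| Outlier sensitivity | High | Lower |
| Normality assumption | Assumed for standard significance tests | Not required |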
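To make the linear versus monotonic distinction concrete, the sketch below compares the two coefficients on an exponential relationship, which is strictly increasing but far from linear. The data are synthetic, the helper names are hypothetical, and Spearman is computed here as Pearson on ranks, assuming no ties.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = list(range(1, 11))
y = [math.exp(v) for v in x]  # strictly increasing, strongly nonlinear

print(pearson(x, y))   # noticeably below 1: the relationship is not linear
print(spearman(x, y))  # 1.0 up to floating-point rounding: the ordering agrees exactly
```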

Conclusion

Spearman’s Rank Correlation Coefficient is a widely used non-parametric tool for measuring monotonic dependence between two variables.

Its robustness to outliers, computational simplicity, and applicability to ranked data and to monotonic but nonlinear relationships make it valuable in Machine Learning, Statistics, Computational Biology, Finance, and Data Science.