Spearman Rank Correlation Coefficient

Written by Pranit Dhanade

Introduction

Spearman’s Rank Correlation Coefficient is a non-parametric statistical measure used to determine the strength and direction of a monotonic relationship between two variables.

Unlike Pearson correlation, which measures linear relationships using the raw numerical observations, Spearman correlation operates on the ranks of the data. When no ranks are tied, it can be computed as:

$$ r_s = 1 - \frac{ 6\sum d_i^2 }{ n(n^2-1) } $$

Where:

$r_s$ = Spearman Rank Correlation Coefficient
$d_i$ = Difference between the two ranks of the $i$-th observation
$n$ = Number of observations
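
As a minimal sketch of the formula above, the following computes $r_s$ directly from rank differences. The function names and the sample data (`hours`, `score`) are hypothetical and chosen only for illustration, and the code assumes that neither variable contains tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_no_ties(x, y):
    """Spearman's r_s via 1 - 6*sum(d_i^2) / (n(n^2 - 1)); valid without ties."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical data: hours studied and exam score for five students.
hours = [2, 4, 6, 8, 10]
score = [50, 65, 60, 80, 90]

# Ranks of score are [1, 3, 2, 4, 5], so sum(d_i^2) = 2 and r_s = 1 - 12/120.
r_s = spearman_no_ties(hours, score)
print(round(r_s, 4))  # 0.9
```

Only one adjacent pair is out of order here, so the coefficient stays close to $+1$.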

Conceptual Understanding

Spearman correlation evaluates whether the ordering of one variable matches the ordering of another variable.

If high values in one variable correspond to high values in another variable, then:

$$ r_s \rightarrow +1 $$

If high values correspond to low values:

$$ r_s \rightarrow -1 $$
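
The two limiting cases can be checked with a small sketch. The data below are made up for illustration, and the helper assumes distinct values. Note that the increasing relationship is strongly nonlinear (a cube), yet the coefficient is still exactly $+1$ because only the ordering matters.

```python
def spearman_no_ties(x, y):
    """Spearman's r_s from rank differences; assumes no tied values."""
    def ranks(values):
        ordered = sorted(values)
        return [ordered.index(v) + 1 for v in values]

    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


x = [1, 2, 3, 4, 5]
cubed = [v ** 3 for v in x]      # strictly increasing in x, but not linear
reciprocal = [1 / v for v in x]  # strictly decreasing in x

print(spearman_no_ties(x, cubed))       # 1.0
print(spearman_no_ties(x, reciprocal))  # -1.0
```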

Derivation of the Formula

Step 1: Pearson Correlation Formula

Spearman correlation is derived from Pearson correlation by replacing observations with ranks.

$$ r = \frac{ \sum (x_i-\bar{x})(y_i-\bar{y}) }{ \sqrt{ \sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2 } } $$

Let:

$R_i$ = Rank of $X_i$
$S_i$ = Rank of $Y_i$
$d_i = R_i - S_i$
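
As a quick numerical check of the claim that Spearman's coefficient is Pearson's coefficient applied to ranks, the sketch below computes both on the same made-up data. The helper names and the data are assumptions for illustration, and the values are chosen so that no ties occur.

```python
import math


def ranks(values):
    """Rank values from 1 (smallest) to n; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def pearson(x, y):
    """Pearson's r computed from centered sums of products and squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [10, 20, 30, 40, 50, 60]
y = [3, 1, 4, 15, 9, 26]

R, S = ranks(x), ranks(y)
n = len(x)

via_pearson = pearson(R, S)  # Pearson's formula applied to the ranks
via_formula = 1 - 6 * sum((r - s) ** 2 for r, s in zip(R, S)) / (n * (n ** 2 - 1))

# The two values agree up to floating-point rounding.
print(via_pearson, via_formula)
```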

Step 2: Mean of Ranks

When there are no ties, the ranks of each variable are a permutation of $1, 2, \ldots, n$, so the mean rank is:

$$ \bar{R} = \bar{S} = \frac{n+1}{2} $$

Step 3: Sum of Squares

The sum of the squares of the first $n$ natural numbers is:

$$ 1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{ n(n+1)(2n+1) }{6} $$

Therefore:

$$ \sum (R_i-\bar{R})^2 = \frac{ n(n^2-1) }{12} $$
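
The intermediate algebra behind this result can be written out as follows, assuming no tied ranks so that the $R_i$ are exactly $1, 2, \ldots, n$ in some order:

$$ \sum (R_i-\bar{R})^2 = \sum R_i^2 - n\bar{R}^2 = \frac{ n(n+1)(2n+1) }{6} - \frac{ n(n+1)^2 }{4} $$

$$ = \frac{ n(n+1) }{12} \left[ 2(2n+1) - 3(n+1) \right] = \frac{ n(n+1)(n-1) }{12} = \frac{ n(n^2-1) }{12} $$

The same value holds for $\sum (S_i-\bar{S})^2$, since the $S_i$ are also a permutation of $1, 2, \ldots, n$.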

Step 4: Rank Difference Relation

Since:

$$ d_i = R_i - S_i $$

Squaring both sides:

$$ d_i^2 = (R_i-S_i)^2 $$

Expanding:

$$ d_i^2 = R_i^2 + S_i^2 - 2R_iS_i $$

Summing over all observations, using the results of Steps 2 and 3, and substituting into Pearson’s formula applied to the ranks gives:

$$ r_s = 1 - \frac{ 6\sum d_i^2 }{ n(n^2-1) } $$
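
The rearrangement can be spelled out in a few lines, again assuming no tied ranks. Because $\bar{R} = \bar{S}$, the rank difference can be written in terms of centered ranks:

$$ \sum d_i^2 = \sum \left[ (R_i-\bar{R}) - (S_i-\bar{S}) \right]^2 = \sum (R_i-\bar{R})^2 + \sum (S_i-\bar{S})^2 - 2\sum (R_i-\bar{R})(S_i-\bar{S}) $$

Using $\sum (R_i-\bar{R})^2 = \sum (S_i-\bar{S})^2 = \frac{ n(n^2-1) }{12}$ from Step 3:

$$ \sum (R_i-\bar{R})(S_i-\bar{S}) = \frac{ n(n^2-1) }{12} - \frac{1}{2}\sum d_i^2 $$

Pearson’s formula applied to the ranks has this sum as its numerator and $\sqrt{ \sum (R_i-\bar{R})^2 \sum (S_i-\bar{S})^2 } = \frac{ n(n^2-1) }{12}$ as its denominator, so:

$$ r_s = \frac{ \frac{ n(n^2-1) }{12} - \frac{1}{2}\sum d_i^2 }{ \frac{ n(n^2-1) }{12} } = 1 - \frac{ 6\sum d_i^2 }{ n(n^2-1) } $$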

Hence proved.

Statistical Properties

Property	Description
Non-parametric	No normality assumption
Rank-based	Uses ordinal information
Robust	Less sensitive to outliers
Monotonic	Measures monotonic dependence

Applications

1. Machine Learning

Used in feature selection, ranking systems, recommendation engines, and evaluation metrics.

2. Bioinformatics

Applied in genomic sequencing, gene expression analysis, and biological ranking problems.

3. Finance

Used for stock ranking, risk analysis, and ordinal economic modeling.

4. Psychology

Used for Likert-scale analysis, behavioral rankings, and survey statistics.

Advantages

Works with nonlinear monotonic data
Robust against outliers
Suitable for ordinal variables
No strict distribution assumptions

Limitations

Cannot detect non-monotonic relationships
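Tied ranks make the shortcut formula inexact; with ties, average ranks and Pearson’s formula applied to the ranks should be used instead
Discards the magnitudes of the observations, so it is less suitable than Pearson when a strictly linear relationship is being modeled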
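
The first two limitations can be illustrated with one small sketch. The data are made up: $y = x^2$ on a symmetric range is perfectly dependent on $x$ but not monotonic, and it also produces tied values. The helper `average_ranks`, which assigns tied values the mean of the positions they occupy, is an illustrative implementation rather than a library function.

```python
import math


def average_ranks(values):
    """Rank from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # perfectly determined by x, but not monotonic

R, S = average_ranks(x), average_ranks(y)
n = len(x)

exact = pearson(R, S)  # tie-aware Spearman: Pearson on average ranks
shortcut = 1 - 6 * sum((r - s) ** 2 for r, s in zip(R, S)) / (n * (n ** 2 - 1))

print(exact)               # 0.0: the U-shaped dependence is not detected
print(round(shortcut, 4))  # 0.0268: the shortcut formula drifts when ties exist
```

The tie-aware value is exactly zero even though $y$ is a deterministic function of $x$, and the shortcut formula gives a slightly different number because its derivation assumes that all ranks are distinct.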

Spearman vs Pearson Correlation

Feature	Pearson	Spearman
Relationship	Linear	Monotonic
Uses Raw Data	Yes	No
Outlier Sensitivity	High	Lower
Normality Assumption	Needed only for standard significance tests	Not Required
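
The difference in outlier sensitivity can be seen in a short sketch. The data are hypothetical: a perfectly increasing sequence whose last value is replaced by an extreme one. The ordering is unchanged, so the rank-based coefficient is unaffected, while Pearson's coefficient on the raw values drops noticeably. The data contain no ties, so simple ranking is sufficient here.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 (smallest) to n; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


x = list(range(1, 11))
y = list(range(1, 10)) + [1000]  # last observation is an extreme outlier

pearson_raw = pearson(x, y)             # computed on the raw values
spearman = pearson(ranks(x), ranks(y))  # computed on the ranks

print(round(pearson_raw, 3), round(spearman, 3))
```

Because the outlier still has the largest rank in both variables, Spearman's coefficient remains at $1$, whereas Pearson's coefficient is pulled well below $1$ by the single extreme value.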

Conclusion

Spearman’s Rank Correlation Coefficient is a widely used non-parametric statistical tool for measuring monotonic dependence between two variables.

Its robustness, computational simplicity, and applicability to ranked and nonlinear data make it highly valuable in Machine Learning, Statistics, Computational Biology, Finance, and Data Science.