# Learning to Rank using Gradient Descent

Posted by c cm on June 26, 2016

## Probabilistic Ranking Cost Function

$P_{ij} \equiv \frac{e^{o_{ij}}}{1 + e^{o_{ij}}}$则cross entrophy cost function为
$C_{ij} \equiv C(o_{ij}) = - \hat{P}_{ij}logP_{ij} - (1 - \hat{P}_{ij})log(1-P_{ij}) = - \hat{P}_{ij}o_{ij} + log(1 + e^{o_{ij}})$

p.3: Theorem:

Given a sample set $x_i, i = 1, . . . , m$ and any permutation $Q$ of the consecutive integers ${1, 2, . . . , m}$, suppose that an arbitrary target posterior $0 ≤ \hat P_{kj} ≤ 1$ is specified for every adjacent pair $k = Q(i),j = Q(i+1), i = 1,…,m−1$.

Denote the set of such $\hat P$’s, for a given choice of $Q$, a set of ’adjacency posteriors’.

Then specifying any set of adjacency posteriors is necessary and sufficient to uniquely identify a target posterior $0 ≤ \hat P_{ij} ≤ 1$ for every pair of samples $x_i, x_j$.

## RankNet

$\delta\alpha_k = -\eta_k\frac{\partial f}{\partial\alpha_k}$

$o_i = g^3(\sum_j w_{ij}^{32}g^2(\sum_k w_{jk}^{21} + b_j^2) + b_i^3) \equiv g_i^3$

$\frac{\partial f}{\partial b_i^3} = \frac{\partial f}{\partial o_i}g_i^{\prime 3} \equiv \triangle_i^3 \\ \frac{\partial f}{\partial w_{in}^{32}} = \triangle_i^3 g_n^2 \\ \frac{\partial f}{\partial b_m^2} = g_m^{\prime 2}(\sum_i\triangle_i^3 w_{im}^{32})\equiv \triangle_m^2\\ \frac{\partial f}{\partial w_{mn}^{21}} = x_n\triangle_m^2$

$\frac{\partial f}{\partial b^3} = f^\prime (g_2^{\prime 3} - g_1^{\prime 3}) \equiv \triangle_2^3 - \triangle_1^3 \\ \frac{\partial f}{\partial w_{m}^{32}} = \triangle_2^3 g_{2m}^2 - \triangle_1^3 g_{1m}^2\\ \frac{\partial f}{\partial b_m^2} = \triangle_2^3 w_{m}^{32}g_{2m}^{\prime 2} - \triangle_1^3 w_{m}^{32}g_{1m}^{\prime 2}\\ \frac{\partial f}{\partial w_{mn}^{21}} = \triangle_{2m}^2 g_{2n}^1 - \triangle_{1m}^2 g_{1n}^1$

## 优点:

1. 可以训练不consistant的数据, 如rank a > rank b, rank b > rank c, rank c > rank a
2. cost在$o_{ij} = 0$时取到最小值,且在离0点较远的地方渐进于线性,更加robust
3. certain -> more certain