Listwise Definition

In ranking, the input is a set of objects, the output is a permutation of those objects, and the model is a ranking function that maps a given input to an output.

In learning, the training data $S = \{(x^{(i)}, y^{(i)})\}_{i=1}^m$ is drawn i.i.d. from an unknown but fixed joint probability distribution $P(x, y)$ over inputs and outputs.
for the pointwise approach: each training instance is a single object together with its relevance degree;
for the pairwise approach: each training instance is a pair of objects together with their preference order;
for the listwise approach: each training instance is the entire set of objects together with its ground-truth permutation.

Ideally we would minimize the expected 0–1 loss defined on the predicted list and the ground-truth list.
In practice, we instead minimize an empirical surrogate loss with respect to the training data.
Three surrogate losses:

Cross Entropy Loss (ListNet, ICML 2007)
The listwise loss function is defined as the cross entropy between two parameterized probability distributions over permutations: one obtained from the predicted result and the other from the ground truth. (In ListNet, the permutation distributions are approximated by top-one probabilities for tractability.)
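The cross-entropy loss can be sketched as follows, using the top-one (softmax) approximation. The function names and the use of raw relevance scores to build the ground-truth distribution are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def top_one_prob(scores):
    """Top-one probability of each object: a softmax over the list's scores."""
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

def listnet_loss(true_scores, pred_scores):
    """Cross entropy between the ground-truth and predicted top-one distributions.
    `true_scores` and `pred_scores` are hypothetical relevance / model scores."""
    p = top_one_prob(np.asarray(true_scores, dtype=float))
    q = top_one_prob(np.asarray(pred_scores, dtype=float))
    return -np.sum(p * np.log(q))
```

By Gibbs' inequality the loss is minimized (down to the entropy of the ground-truth distribution) exactly when the two distributions coincide.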

Cosine Loss (RankCosine, IPM 2007)
The listwise loss function is defined on the basis of the cosine similarity between two score vectors, one from the predicted result and one from the ground truth.
wiki: cosine similarity 
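A minimal sketch of a cosine-similarity-based loss on the two score vectors, scaled to lie in [0, 1]; the exact scaling in RankCosine may differ, so treat this as illustrative:

```python
import numpy as np

def cosine_loss(true_scores, pred_scores):
    """Half of (1 - cosine similarity) between the ground-truth and predicted
    score vectors: 0 when the vectors are parallel, 1 when they are opposite."""
    y = np.asarray(true_scores, dtype=float)
    z = np.asarray(pred_scores, dtype=float)
    cos = (y @ z) / (np.linalg.norm(y) * np.linalg.norm(z))
    return 0.5 * (1.0 - cos)
```

Note that the loss depends only on the direction of the score vectors, not their magnitude, which is why it runs in O(n) per list.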
Likelihood loss (ListMLE, this paper)
The listwise loss function is defined as the negative log-likelihood of the ground-truth permutation, where the permutation probability is given by the Plackett–Luce model parameterized by the predicted scores.
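The likelihood loss can be sketched as below. The argument convention (`true_order` listing object indices from best to worst) is an assumption for illustration; the inner loop recomputes each tail's log-sum-exp for clarity, whereas a running suffix sum gives the O(n) cost reported for ListMLE:

```python
import numpy as np

def listmle_loss(pred_scores, true_order):
    """Negative log Plackett-Luce likelihood of the ground-truth permutation.
    `true_order` is the ground-truth ranking, best object first."""
    # Reorder predicted scores into the ground-truth order.
    s = np.asarray(pred_scores, dtype=float)[list(true_order)]
    # -log P(pi) = sum_i [ log(sum_{j >= i} exp(s_j)) - s_i ]
    loss = 0.0
    for i in range(len(s)):
        tail = s[i:]
        m = tail.max()  # stabilize the log-sum-exp
        loss += np.log(np.exp(tail - m).sum()) + m - s[i]
    return loss
```

The loss is smaller when the predicted scores agree with the ground-truth order, and it is continuous, differentiable, and convex in the scores.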
Evaluating Surrogate Loss Functions

consistency
the obtained ranking function converges to the optimal one as the training sample size goes to infinity.
soundness
the loss should represent the targeted learning problem well; that is, an incorrect prediction should receive a larger penalty than a correct one, and the penalty should reflect the confidence of the prediction.
continuity, differentiability, and convexity

computational efficiency
| Loss | Continuity | Differentiability | Convexity | Efficiency |
|------|------------|-------------------|-----------|------------|
| Cosine loss (RankCosine) | √ | √ | X | O(n) |
| Cross-entropy loss (ListNet) | √ | √ | √ | O(n·n!) |
| Likelihood loss (ListMLE) | √ | √ | √ | O(n) |
