Ad Click Prediction

Posted by c cm on February 22, 2017

1. FTRL

$\textbf{g}_t$: the gradient vector on the t-th training example
$g_{t,i}$: the i-th coordinate of $\textbf{g}_t$
$\textbf{g}_{1:t} = \sum_{s=1}^{t} \textbf{g}_s$

OGD (Online Gradient Descent):

$$\textbf{w}_{t+1} = \textbf{w}_t - \eta_t \textbf{g}_t$$

where $\eta_t$ is the learning rate.
Drawback: it does not produce sufficiently sparse models.

FTRL:

$$\textbf{w}_{t+1} = \arg\min_{\textbf{w}} \left( \textbf{g}_{1:t} \cdot \textbf{w} + \frac{1}{2}\sum_{s=1}^{t}\sigma_s \left\lVert \textbf{w} - \textbf{w}_s \right\rVert_2^2 + \lambda_1 \left\lVert \textbf{w} \right\rVert_1 \right)$$

where the $\sigma_s$ are defined in terms of the learning rate so that $\sigma_{1:t} = 1/\eta_t$.
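
As a concrete illustration, below is a minimal per-coordinate sketch of the FTRL-Proximal update for logistic loss, in the spirit of Algorithm 1 in the paper. The class and parameter names (`alpha`, `beta`, `l1`, `l2`) and the sparse-dict feature representation are my own choices, not the paper's; it uses the per-coordinate learning rate described in the next section.

```python
import math
from collections import defaultdict

class FTRLProximal:
    """Minimal per-coordinate FTRL-Proximal sketch for logistic loss on sparse features."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # z_i accumulates g_{1:t,i} - sum_s sigma_s * w_{s,i}
        self.n = defaultdict(float)  # n_i accumulates squared gradients for coordinate i

    def _weight(self, i):
        # Closed-form per-coordinate solution of the arg-min, with L1 and L2 terms.
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0                        # L1 keeps the coordinate at exactly zero (sparsity)
        eta_inv = (self.beta + math.sqrt(self.n[i])) / self.alpha
        return -(z - math.copysign(self.l1, z)) / (eta_inv + self.l2)

    def predict(self, x):
        """x: {feature_index: value}; returns the predicted click probability."""
        wx = sum(self._weight(i) * v for i, v in x.items())
        return 1.0 / (1.0 + math.exp(-wx))

    def update(self, x, y):
        """One online step with label y in {0, 1}."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v                   # gradient of the logistic loss
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)   # uses the pre-update weight w_{t,i}
            self.n[i] += g * g
```

The closed-form weight makes the sparsity mechanism explicit: a coordinate stays at exactly zero until the accumulated $|z_i|$ exceeds $\lambda_1$.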

2. Per-coordinate learning rate
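
Per the paper, the learning rate for coordinate $i$ is

$$\eta_{t,i} = \frac{\alpha}{\beta + \sqrt{\sum_{s=1}^{t} g_{s,i}^2}}$$

so frequently updated coordinates take small steps while rarely seen ones keep learning quickly; the sketch above tracks $n_i = \sum_s g_{s,i}^2$ for exactly this purpose ($\alpha$ and $\beta$ are tuned hyperparameters).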

3. Saving memory at massive scale

  • Probabilistic Feature Inclusion
    New features are included in the model probabilistically as they first occur; two variants (a Poisson Inclusion sketch follows this list):
    1. Poisson Inclusion
    2. Bloom Filter Inclusion
  • Encoding Values with Fewer Bits
    e.g., after analyzing the data, store coefficients in a q2.13 fixed-point encoding instead of 32- or 64-bit floats.
    The roundoff problem this introduces is handled with, e.g., a randomized rounding strategy:

    $$v_{i,\text{rounded}} = 2^{-13} \left\lfloor 2^{13} v_i + R \right\rfloor$$

    where $R$ is a random deviate uniformly distributed in $[0, 1]$ (a rounding sketch follows this list).
  • Training Many Similar Models
  • A Single Value Structure
  • Compute Learning Rate with Counts
  • Subsampling Training Data
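
A minimal sketch of Poisson Inclusion, assuming only that an unseen feature is admitted with some probability p each time it appears (class and parameter names are my own):

```python
import random

class PoissonInclusion:
    """Admit a previously unseen feature into the model with probability p per occurrence."""

    def __init__(self, p=0.05, seed=0):
        self.p = p
        self.rng = random.Random(seed)
        self.included = set()

    def admit(self, feature):
        if feature in self.included:
            return True                      # already in the model, keep updating it
        if self.rng.random() < self.p:       # a feature needs ~1/p sightings in expectation
            self.included.add(feature)
            return True
        return False                         # rare features are likely never stored
```

A feature seen n times ends up in the model with probability $1 - (1-p)^n$, so memory is spent on features that recur.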
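
And a sketch of the q2.13 randomized-rounding step from the formula above (one sign bit, two integer bits, thirteen fractional bits); the function name is my own:

```python
import math
import random

def q213_round(v, rng=random):
    """Randomized rounding of a coefficient onto the q2.13 fixed-point grid.

    Adding a uniform deviate R in [0, 1) before flooring makes the stored value an
    unbiased estimate of v, so roundoff error does not accumulate across updates.
    """
    scale = 2 ** 13                          # 13 fractional bits
    return math.floor(v * scale + rng.random()) / scale
```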

4. Evaluating Model Performance

  • Progressive Validation (online loss); a minimal sketch follows this list
  • Deep Understanding through Visualization
    GridViz
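
A minimal sketch of progressive validation, assuming a model with predict/update methods like the FTRLProximal sketch in section 1: every example is scored before it is trained on, so the running log loss doubles as an out-of-sample estimate.

```python
import math

def progressive_log_loss(model, stream):
    """Average online log loss: score each example with the current model, then train on it."""
    total, count = 0.0, 0
    for x, y in stream:
        p = min(max(model.predict(x), 1e-6), 1 - 1e-6)   # clip for numerical safety
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        count += 1
        model.update(x, y)                               # train only after scoring
    return total / count
```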

5. Confidence Estimates

Click prediction at this scale does not fit standard confidence-interval assumptions, so the paper instead computes a heuristic uncertainty score from the per-coordinate feature learning rates.
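
Concretely, the paper's uncertainty score is essentially the dot product of the current per-coordinate learning rates with the absolute feature values,

$$u(\textbf{x}) = \sum_i \eta_{t,i}\,|x_i|,$$

so a prediction that relies on rarely seen features (large remaining learning rates) gets a high uncertainty score.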

6. Calibrating Predictions

  • Use Poisson regression to learn $\tau(p) = \gamma p^\kappa$ (a fitting sketch follows this list)
  • Alternatively, use a piecewise linear or piecewise constant correction function
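
A minimal sketch of one way to fit $\tau(p) = \gamma p^\kappa$, assuming scikit-learn is available: a Poisson GLM with a log link turns $\log\tau = \log\gamma + \kappa \log p$ into a linear model in $\log p$. This is an illustration of the technique, not the paper's pipeline.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

def fit_power_law_calibration(p_pred, clicks):
    """Fit tau(p) = gamma * p**kappa from predicted CTRs and observed 0/1 click labels."""
    X = np.log(np.clip(p_pred, 1e-6, 1.0)).reshape(-1, 1)   # single feature: log p
    model = PoissonRegressor(alpha=0.0).fit(X, clicks)      # log link: log E[y] = b0 + b1 * log p
    gamma, kappa = float(np.exp(model.intercept_)), float(model.coef_[0])
    return gamma, kappa

# usage sketch: gamma, kappa = fit_power_law_calibration(p_pred, clicks)
#               calibrated = gamma * p_pred ** kappa
```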

7. Automated Feature Management
