1. FTRL
$\textbf{g}_t$: the gradient vector on the $t$-th training example
$g_{t, i}$: the $i$-th coordinate of $\textbf{g}_t$
$\textbf{g}_{1:t} = \sum_{s=1}^t \textbf{g}_s$
OGD (Online Gradient Descent):
$\textbf{w}_{t+1} = \textbf{w}_t - \eta_t \textbf{g}_t$
where $\eta_t$ is the learning rate.
Drawback: insufficient sparsity; even with L1 regularization, gradient steps rarely drive coefficients to exactly zero.
FTRL (FTRL-Proximal):
$\textbf{w}_{t+1} = \text{argmin}_{\textbf{w}} \left( \textbf{g}_{1:t} \cdot \textbf{w} + \frac{1}{2} \sum_{s=1}^t \sigma_s \lVert \textbf{w} - \textbf{w}_s \rVert_2^2 + \lambda_1 \lVert \textbf{w} \rVert_1 \right)$
where the $\sigma_s$ define the learning-rate schedule via $\sigma_{1:t} = 1/\eta_t$.
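The argmin above has a closed-form per-coordinate solution, giving a cheap online update. A minimal Python sketch for logistic loss (the hyperparameter values and the dict-based sparse representation are illustrative choices):

```python
import math

def ftrl_train(examples, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
    """Per-coordinate FTRL-Proximal for logistic loss (sketch).

    examples: iterable of (x, y) with x a sparse dict {feature: value}
    and y in {0, 1}.  Returns the accumulators (z, n).
    """
    z, n = {}, {}
    for x, y in examples:
        # Weights are computed lazily, only for the active features.
        w = {}
        for i in x:
            zi, ni = z.get(i, 0.0), n.get(i, 0.0)
            if abs(zi) <= l1:
                w[i] = 0.0  # the L1 term zeroes small coordinates exactly
            else:
                w[i] = -(zi - math.copysign(l1, zi)) / (
                    (beta + math.sqrt(ni)) / alpha + l2)
        p = 1.0 / (1.0 + math.exp(-sum(w[i] * v for i, v in x.items())))
        for i, v in x.items():
            g = (p - y) * v  # gradient of the logistic loss
            ni = n.get(i, 0.0)
            # sigma chosen so that sigma_{1:t} = 1/eta_t per coordinate
            sigma = (math.sqrt(ni + g * g) - math.sqrt(ni)) / alpha
            z[i] = z.get(i, 0.0) + g - sigma * w[i]
            n[i] = ni + g * g
    return z, n
```

Unlike OGD, only the accumulators $(z_i, n_i)$ are stored, and the L1 threshold produces exact zeros for weak features.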
2. Per-coordinate learning rate
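Concretely, the single global rate $\eta_t$ is replaced by a per-coordinate rate driven by each coordinate's accumulated squared gradients,

$\eta_{t, i} = \frac{\alpha}{\beta + \sqrt{\sum_{s=1}^t g_{s, i}^2}}$

so frequently updated coordinates cool down quickly while rarely seen ones keep a high learning rate.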
3. Saving memory at massive scale
- Probabilistic Feature Inclusion: add new features to the model probabilistically as they first occur
  - Poisson Inclusion
  - Bloom Filter Inclusion
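A minimal sketch of Bloom-filter inclusion, assuming a counting Bloom filter that admits a feature once it has been seen `threshold` times (the class name, table size, and blake2b-based hashing are illustrative choices, not from the paper):

```python
import hashlib

class BloomInclusion:
    """Counting-Bloom-filter feature inclusion (sketch).

    A feature is admitted to the model once it has (probably) been
    seen `threshold` times, without storing per-feature counters.
    """

    def __init__(self, size=1 << 20, num_hashes=4, threshold=5):
        self.counts = [0] * size
        self.size = size
        self.num_hashes = num_hashes
        self.threshold = threshold

    def _slots(self, feature):
        # Derive `num_hashes` distinct table positions from salted hashes.
        slots = set()
        for k in range(self.num_hashes):
            digest = hashlib.blake2b(f"{k}:{feature}".encode()).digest()
            slots.add(int.from_bytes(digest[:8], "big") % self.size)
        return slots

    def observe(self, feature):
        """Record one occurrence; True once the feature should be included."""
        slots = self._slots(feature)
        for s in slots:
            self.counts[s] += 1
        return min(self.counts[s] for s in slots) >= self.threshold
```

Collisions make the error one-sided: a feature can be admitted slightly early, never later than its true `threshold`-th occurrence.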
- Encoding Values with Fewer Bits
e.g., after analyzing the data, use a q2.13 fixed-point encoding
the roundoff error must be handled, e.g., with a randomized rounding strategy:
$w_{i, \text{rounded}} = 2^{-13} \left\lfloor 2^{13} w_i + R \right\rfloor$
where $R$ is a random deviate uniformly distributed in $[0, 1]$
- Training Many Similar Models
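A sketch of the q2.13 randomized rounding (function names are illustrative); the saving comes from storing the returned small integer in 16 bits instead of a 32- or 64-bit float:

```python
import math
import random

Q_FRAC_BITS = 13  # q2.13: 1 sign bit, 2 integer bits, 13 fractional bits

def encode_q2_13(v, rng=random.random):
    """Randomized rounding of v onto the q2.13 grid.

    Adding a uniform deviate R in [0, 1) before flooring makes the
    rounding unbiased: E[decode_q2_13(encode_q2_13(v))] = v.
    """
    return math.floor(v * (1 << Q_FRAC_BITS) + rng())  # fits 16 bits for |v| < 4

def decode_q2_13(q):
    return q / (1 << Q_FRAC_BITS)
```

Unbiasedness matters because billions of tiny rounding errors would otherwise accumulate into a systematic prediction bias.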
- A Single Value Structure
- Compute Learning Rate with Counts
- Subsampling Training Data
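The subsampling idea: keep every clicked query, keep only a fraction $r$ of the unclicked ones, and up-weight the kept negatives by $1/r$ so the expected loss is unchanged. A minimal sketch:

```python
import random

def subsample(stream, r, rng=random.random):
    """Importance-weighted subsampling of training data (sketch).

    stream: iterable of (x, y) with y in {0, 1}.
    Yields (x, y, weight); the weight multiplies the example's loss,
    so the subsampled objective is unbiased in expectation.
    """
    for x, y in stream:
        if y == 1:
            yield x, y, 1.0          # keep every positive example
        elif rng() < r:
            yield x, y, 1.0 / r      # keep a fraction r of negatives
```

Since clicks are rare, this discards most of the (dominant, uninformative) negative traffic at almost no cost in accuracy.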
4. Evaluating Model Performance
- Progressive Validation (online loss)
- Deep Understanding through Visualization
GridVis
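The progressive-validation (online loss) metric above scores each example with the current model before training on it, so every example serves as held-out data exactly once. A sketch, assuming a hypothetical model object with `predict`/`update` methods:

```python
import math

def progressive_logloss(stream, model):
    """Progressive validation (sketch).

    model.predict(x) -> p in (0, 1) and model.update(x, y) are an
    assumed interface for illustration.
    """
    total, count = 0.0, 0
    for x, y in stream:
        p = min(max(model.predict(x), 1e-12), 1 - 1e-12)  # clip for log
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        count += 1
        model.update(x, y)  # train only after scoring
    return total / count if count else 0.0
```

No separate held-out set is needed, and the metric tracks the live serving distribution.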
5. Confidence Estimates
standard confidence-interval assumptions do not hold in this setting
an uncertainty score can instead be computed cheaply from the per-feature learning rates
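One way to realize such a score: sum the current per-coordinate learning rates over the active features, so feature combinations the model has rarely seen get high uncertainty essentially for free (a sketch; any normalization is omitted):

```python
import math

def uncertainty_score(x, n, alpha=0.1, beta=1.0):
    """Learning-rate-based uncertainty score (sketch).

    x: sparse dict {feature: value}; n: accumulated squared gradients
    per feature, as maintained by the FTRL update.  Rare features have
    small n[i], hence a large per-coordinate learning rate and a large
    contribution to the score.
    """
    return sum(alpha * abs(v) / (beta + math.sqrt(n.get(i, 0.0)))
               for i, v in x.items())
```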
6. Calibrating Predictions
- use Poisson regression to learn a correction of the form $\tau(p) = \gamma p^\kappa$
- or use a more general piecewise linear / piecewise constant correction function
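A sketch of fitting the power-law correction $\tau(p) = \gamma p^\kappa$; for simplicity this uses log-log least squares on binned click averages rather than the paper's Poisson regression:

```python
import math

def fit_power_calibration(preds, labels, bins=10):
    """Fit tau(p) = gamma * p**kappa (sketch).

    preds: predicted probabilities in (0, 1); labels: observed clicks.
    Bins the predictions, computes the empirical rate per bin, then
    solves log(tau) = log(gamma) + kappa * log(p) by least squares.
    """
    sums = [[0.0, 0.0, 0] for _ in range(bins)]
    for p, y in zip(preds, labels):
        b = min(int(p * bins), bins - 1)
        sums[b][0] += p
        sums[b][1] += y
        sums[b][2] += 1
    # (mean prediction, observed rate) per non-empty bin with clicks
    pts = [(s0 / c, s1 / c) for s0, s1, c in sums if c > 0 and s1 > 0]
    xs = [math.log(p) for p, _ in pts]
    ys = [math.log(t) for _, t in pts]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    kappa = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    gamma = math.exp(my - kappa * mx)
    return gamma, kappa
```

At serving time the model's output $p$ is replaced by $\gamma p^\kappa$ to correct systematic bias.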
7. Automated Feature Management