Feature Importance in Boosting Methods

Posted by c cm on July 15, 2017

XGBoost

source

    def get_score(self, fmap='', importance_type='weight'):
        """Get feature importance of each feature.
        Importance type can be defined as:
            'weight' - the number of times a feature is used to split the data across all trees.
            'gain' - the average gain of the feature when it is used in trees
            'cover' - the average coverage of the feature when it is used in trees
        Parameters
        ----------
        fmap: str (optional)
           The name of feature map file
        """
  • weight: the number of times the feature is used to split the data, counted across all trees.
  • gain: the sum of the gains from splits on the feature divided by the number of times it is used, i.e. the average gain per split; a short usage sketch follows.
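
A minimal usage sketch of these three scores on a trained Booster (the synthetic data and training parameters below are illustrative, not from the XGBoost source):

    import numpy as np
    import xgboost as xgb

    # illustrative toy data: 100 rows, 5 numeric features
    rng = np.random.RandomState(0)
    X = rng.rand(100, 5)
    y = (X[:, 0] + X[:, 1] > 1).astype(int)

    dtrain = xgb.DMatrix(X, label=y)
    bst = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=20)

    # 'weight' counts splits; 'gain' and 'cover' are averaged over those splits
    print(bst.get_score(importance_type="weight"))
    print(bst.get_score(importance_type="gain"))
    print(bst.get_score(importance_type="cover"))

Note that get_score returns a dict keyed by feature name, and it only lists features that were actually used in at least one split.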

LightGBM

source

    def feature_importance(self, importance_type='split'):
        """
        Get feature importances

        Parameters
        ----------
        importance_type : str, default "split"
            How the importance is calculated: "split" or "gain"
            "split" is the number of times a feature is used in a model
            "gain" is the total gain of splits which use the feature

        Returns
        -------
        result : array
            Array of feature importances.
        """

CatBoost

source

1. Regular feature importance

Reference

  • $feature\_total\_importance_j$ is the individual feature importance of the j-th feature.
  • $average\_feature\_importance$ is the average feature importance of the j-th feature in the i-th combinational feature (see the sketch after this list).
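
As a hedged sketch of how the regular importances are read off in practice, CatBoost's Python API exposes them through get_feature_importance (the model setup below is illustrative):

    import numpy as np
    from catboost import CatBoostClassifier

    # illustrative toy data: 100 rows, 5 numeric features
    rng = np.random.RandomState(0)
    X = rng.rand(100, 5)
    y = (X[:, 0] + X[:, 1] > 1).astype(int)

    model = CatBoostClassifier(iterations=50)
    model.fit(X, y, verbose=False)

    # one regular importance value per input feature
    print(model.get_feature_importance())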

2. Internal feature importance

Reference