BulletTech official WeChat account


2022/01/22

# Feature Selection with Permutation Importance

## 2 Algorithm Breakdown

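Permutation Importance applies to tabular data. It judges a feature's importance by how much the model's performance score drops after that feature's values are randomly shuffled. The procedure can be written as follows: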

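• Input: a fitted model $m$ and a dataset $D$ (the training, validation, or test set).
• Compute the reference score $s$ of model $m$ on $D$.
• For each feature $j$ of $D$:
  • For each repetition $k$ in $1, \dots, K$, randomly permute the column of feature $j$ to build a corrupted dataset $D^{c}_{k,j}$.
  • Compute the score $s_{k,j}$ of model $m$ on $D^{c}_{k,j}$.
  • The importance $i_j$ of feature $j$ is then the reference score minus the mean score over the $K$ corrupted datasets:

$$i_j = s - \frac{1}{K}\sum_{k=1}^{K} s_{k,j}$$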
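The steps above can be sketched in a few lines of dependency-free Python. This is only an illustration of the procedure, not scikit-learn's implementation: the names `permutation_importance_sketch`, `neg_mse`, and `toy_predict` are hypothetical, the toy data is made up, and the "model" is a plain function that uses only feature 0.

```python
import random
from statistics import mean


def permutation_importance_sketch(predict, score, X, y, n_repeats=5, seed=0):
    """Return {feature index: importance} for a model given as a callable.

    predict: callable mapping a list of rows to a list of predictions
    score:   callable (y_true, y_pred) -> float, where higher is better
    """
    rng = random.Random(seed)
    baseline = score(y, predict(X))              # reference score s
    importances = {}
    for j in range(len(X[0])):                   # each feature j
        permuted_scores = []
        for _ in range(n_repeats):               # K repetitions
            column = [row[j] for row in X]
            rng.shuffle(column)                  # permute feature j only
            X_corrupted = [row[:j] + [value] + row[j + 1:]
                           for row, value in zip(X, column)]
            permuted_scores.append(score(y, predict(X_corrupted)))  # s_{k,j}
        importances[j] = baseline - mean(permuted_scores)           # i_j
    return importances


def neg_mse(y_true, y_pred):
    # Negated mean squared error, so that a higher score means a better fit.
    return -mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def toy_predict(rows):
    # Hypothetical "model": depends on feature 0 and ignores feature 1.
    return [2.0 * row[0] for row in rows]


# Made-up data: the target is fully determined by feature 0.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]

result = permutation_importance_sketch(toy_predict, neg_mse, X, y,
                                       n_repeats=10, seed=0)
print(result)
```

Shuffling feature 1 leaves the predictions unchanged, so its importance is zero. Shuffling feature 0 breaks the link to the target, so its importance is positive.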

## 3 Example Code

```python
from sklearn.datasets import load_diabetes
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X_train, X_val, y_train, y_val = train_test_split(
    diabetes.data, diabetes.target, random_state=0)

model = Ridge(alpha=1e-2).fit(X_train, y_train)
model.score(X_val, y_val)

scoring = ['r2', 'neg_mean_absolute_percentage_error', 'neg_mean_squared_error']
# The scoring parameter accepts several metrics at once. This is more efficient
# than calling permutation_importance once per metric, because the predictions
# are reused to compute every metric.
r_multi = permutation_importance(model, X_val, y_val, n_repeats=30, random_state=0, scoring=scoring)

for metric in r_multi:
    print(f"{metric}")
    r = r_multi[metric]
    # Go through the features from most to least important, and print only those
    # whose mean importance is more than two standard deviations above zero.
    for i in r.importances_mean.argsort()[::-1]:
        if r.importances_mean[i] - 2 * r.importances_std[i] > 0:
            print(f"    {diabetes.feature_names[i]:<8}"
                  f"{r.importances_mean[i]:.3f}"
                  f" +/- {r.importances_std[i]:.3f}")
```

```
r2
    s5      0.204 +/- 0.050
    bmi     0.176 +/- 0.048
    bp      0.088 +/- 0.033
    sex     0.056 +/- 0.023
neg_mean_absolute_percentage_error
    s5      0.081 +/- 0.020
    bmi     0.064 +/- 0.015
    bp      0.029 +/- 0.010
neg_mean_squared_error
    s5      1013.903 +/- 246.460
    bmi     872.694 +/- 240.296
    bp      438.681 +/- 163.025
    sex     277.382 +/- 115.126
```

## 4 Summary

### References

[1] Permutation Importance with Multicollinear or Correlated Features: https://scikit-learn.org/stable/auto_examples/inspection/plot_permutation_importance_multicollinear.html#sphx-glr-auto-examples-inspection-plot-permutation-importance-multicollinear-py

[2] Permutation Importance vs Random Forest Feature Importance (MDI): https://scikit-learn.org/stable/auto_examples/inspection/plot_permutation_importance.html#sphx-glr-auto-examples-inspection-plot-permutation-importance-py
