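PySpark GBTRegressor: feature importances and ranking
As with a random forest, a trained GBTRegressor model can be assessed with the usual regression evaluation metrics, and its feature importances can be analysed.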
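The evaluation step itself is not shown here. In PySpark it is typically done with `pyspark.ml.evaluation.RegressionEvaluator`, whose `metricName` accepts values such as "rmse", "mse", "mae" and "r2". As a Spark-free sketch of what those numbers mean, the function below computes their standard textbook definitions from plain Python lists. The function name `regression_metrics` and the sample values are illustrative assumptions, not part of the original post.

```python
import math


def regression_metrics(labels, preds):
    """Compute MSE, RMSE, MAE and R^2 for paired label/prediction lists."""
    if not labels or len(labels) != len(preds):
        raise ValueError("labels and preds must be non-empty and the same length")
    n = len(labels)
    errors = [y - p for y, p in zip(labels, preds)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_y = sum(labels) / n
    ss_tot = sum((y - mean_y) ** 2 for y in labels)  # total sum of squares
    # R^2 = 1 - SS_res / SS_tot (undefined when all labels are equal)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

For the sample lists, MSE, RMSE, MAE and R² work out to 0.25, 0.5, 0.25 and 0.8 respectively (up to floating-point rounding in the printed values).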
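Once the model is trained, the following code prints the feature importances and then lists the features sorted by importance, highest first.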
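```python
# Print feature indices and their importances (a SparseVector: index -> importance)
features_important = model.featureImportances
print(features_important)
# Number of feature slots in the assembled "features" vector
print(len(features_important))

# Get each feature's importance and print the features in descending order of weight.
# Feature names come from the metadata that VectorAssembler attaches to the
# "features" column of the training DataFrame `train`.
name_index = train.schema["features"].metadata["ml_attr"]["attrs"]

# Attributes are grouped by type ("numeric", "binary", "nominal"); read every
# group, not only "numeric", so that no assembled feature is skipped.
kv = {}
for group in name_index.values():
    for it in group:
        # Look up the importance by the attribute's own slot index ("idx")
        # rather than pairing names with the vector's stored entries by
        # position: zero-importance features are absent from the sparse
        # vector, so positional pairing would attach weights to the wrong names.
        kv[it["name"]] = float(features_important[it["idx"]])
print(len(kv))

print(sorted(kv.items(), key=lambda el: el[1], reverse=True))
```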
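To see why the lookup has to go through each attribute's `idx`, here is a minimal pure-Python sketch that needs no Spark. It assumes a hypothetical metadata dict shaped like `metadata["ml_attr"]["attrs"]` and represents the sparse importance vector by its `indices` and `values` arrays. The helper names (`rank_features`, `rank_features_positional`), feature names and numbers are made up for illustration; the values sum to 1 because Spark normalizes tree-ensemble importances that way.

```python
def rank_features(attrs, size, indices, values):
    """Pair each attribute name with its importance and sort descending.

    attrs   -- dict like metadata["ml_attr"]["attrs"]: group -> list of {"idx", "name"}
    size    -- total length of the importance vector
    indices -- slot indices stored in the sparse vector (non-zero entries only)
    values  -- importances at those slots
    """
    # Densify: slots absent from the sparse vector have importance 0.0
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    # Look up each attribute by its own slot index, across all attribute groups
    kv = {a["name"]: dense[a["idx"]] for group in attrs.values() for a in group}
    return sorted(kv.items(), key=lambda el: el[1], reverse=True)


def rank_features_positional(attrs, indices, values):
    """Pair "numeric" names with the stored non-zero values purely by position."""
    names = [a["name"] for a in attrs["numeric"]]
    return sorted(zip(names, values), key=lambda el: el[1], reverse=True)


# Hypothetical metadata for four numeric features; "rooms" (idx 1) has zero importance
attrs = {"numeric": [
    {"idx": 0, "name": "area"},
    {"idx": 1, "name": "rooms"},
    {"idx": 2, "name": "age"},
    {"idx": 3, "name": "distance"},
]}
indices = [0, 2, 3]
values = [0.5, 0.3, 0.2]

print(rank_features(attrs, 4, indices, values))
print(rank_features_positional(attrs, indices, values))
```

With index-based lookup, `rooms` correctly ends up last with 0.0. With positional pairing, `rooms` inherits the weight that belongs to `age`, and `distance` disappears from the result entirely.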

