Updated: 2023-05-16 GMT+08:00
GBDT Encoding Model Application

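Overview

This node uses a trained GBDT classification model to discretize the input features. The leaf nodes of every tree are encoded: at prediction time, the position of the leaf that a sample reaches is encoded as 1, and the remaining nodes of that tree are encoded as 0. The node is mainly used to load the model saved during the GBDT encoding model training stage and to apply the discretizing encoding to the data.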
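
The encoding rule above can be illustrated with a minimal Python sketch. It is not the node's actual implementation: the function name `encode_leaves`, the global index layout (each tree's leaf positions placed one after another), and the shape of 4 trees with 4 leaves each are assumptions, chosen because they are consistent with the encodedKVFeature values in the sample output on this page.

```python
from typing import Sequence


def encode_leaves(leaf_per_tree: Sequence[int], leaves_per_tree: Sequence[int]) -> str:
    """Build a sparse "index:1" string with exactly one active position per tree.

    leaf_per_tree[i]   -- index of the leaf reached in tree i (0-based, hypothetical input)
    leaves_per_tree[i] -- number of leaves in tree i (assumed known from the model)
    """
    if len(leaf_per_tree) != len(leaves_per_tree):
        raise ValueError("one leaf index is required per tree")

    parts = []
    offset = 0  # first global position that belongs to the current tree
    for leaf, n_leaves in zip(leaf_per_tree, leaves_per_tree):
        if not 0 <= leaf < n_leaves:
            raise ValueError(f"leaf index {leaf} out of range for a tree with {n_leaves} leaves")
        parts.append(f"{offset + leaf}:1")  # the reached leaf is encoded as 1
        offset += n_leaves                  # all other positions of this tree stay 0
    return ",".join(parts)


# Assumed layout: 4 trees with 4 leaves each.
example = encode_leaves([3, 0, 3, 0], [4, 4, 4, 4])
print(example)  # 3:1,4:1,11:1,12:1
```

Only the active positions appear in the string; every other leaf position is implicitly 0.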

Input

Parameter | Sub-parameter | Description
inputs | dataframe | inputs is a dictionary; dataframe is a pyspark DataFrame object that holds the data to be GBDT feature-encoded.

Output

Table 1

Parameter | Sub-parameter | Description
outputs | output_port_1 | Points to a pyspark DataFrame object, which is the original input data.

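Parameter description

Table 2

Parameter | Required | Description | Default value
model_saved_path | | Location where the model is stored; it must match the corresponding parameter of the GBDT encoding model training node. | ""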

Example

Sample data

label,age,count
1,20,23
0,19,33
0,21,24
1,7,24
0,11,43
1,32,12
0,21,43
1,32,45

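Configure the workflow

Run the workflow

Parameter settings

View the results

After GBDT encoding, the encoded feature is stored in the encodedKVFeature column: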

label,age,count,gbt_assembled_features,standard_scale_feature,label_index,encodedKVFeature
1,20,23,"[20.0,23.0]","[2.277363769238288,1.9139391581953824]",1.0,"3:1,4:1,11:1,12:1"
0,19,33,"[19.0,33.0]","[2.1634955807763734,2.7460866182803314]",0.0,"1:1,7:1,9:1,14:1"
0,21,24,"[21.0,24.0]","[2.391231957700202,1.9971539042038775]",0.0,"2:1,7:1,10:1,14:1"
1,7,24,"[7.0,24.0]","[0.7970773192334007,1.9971539042038775]",1.0,"3:1,5:1,11:1,15:1"
0,11,43,"[11.0,43.0]","[1.2525500730810584,3.57823407836528]",0.0,"1:1,7:1,9:1,14:1"
1,32,12,"[32.0,12.0]","[3.6437820307812605,0.9985769521019388]",1.0,"0:1,4:1,8:1,12:1"
0,21,43,"[21.0,43.0]","[2.391231957700202,3.57823407836528]",0.0,"1:1,7:1,9:1,14:1"
1,32,45,"[32.0,45.0]","[3.6437820307812605,3.74466357038227]",1.0,"0:1,6:1,8:1,13:1"
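
To feed the encoded feature into code that expects a dense vector, the key:value string can be expanded as sketched below. The helper name `kv_to_dense` and the vector length of 16 are assumptions (16 is inferred from the largest index, 15, that appears in the sample rows); the real length depends on the trained model.

```python
from typing import List


def kv_to_dense(kv: str, size: int) -> List[int]:
    """Expand a sparse "index:value" string into a dense list of integers."""
    dense = [0] * size
    for item in kv.split(","):
        index, value = item.split(":")
        dense[int(index)] = int(value)
    return dense


# encodedKVFeature of the first sample row; 16 positions is an assumed length.
dense = kv_to_dense("3:1,4:1,11:1,12:1", 16)
print(dense)  # [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
```

Each sample row has four active positions, which is what one would expect if the model holds four trees and each tree contributes exactly one reached leaf.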
