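DBMTL builds a Bayesian network over multiple objectives, explicitly modeling the causal relationships that may exist between them, and optimizes several metrics of a scenario simultaneously by learning the Bayesian relations between tasks.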
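For two tasks where cvr depends on ctr (the dependency used in the configuration examples below), the Bayesian-network view corresponds to the chain-rule factorization sketched here. It illustrates the modeling idea only, not how the training loss is computed:

$$
p(y_{\text{ctr}}, y_{\text{cvr}} \mid x) = p(y_{\text{ctr}} \mid x)\; p(y_{\text{cvr}} \mid y_{\text{ctr}}, x)
$$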
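The underlying shared layer and specific layers are configured manually using hard parameter sharing, whereas Google's MMoE uses soft parameter sharing to share low-level features and networks across tasks, and it achieved good results in the YouTube scenario. DBMTL therefore also supports replacing the shared layer and specific layer modules with an MMoE module, i.e., each task's feature is obtained as a weighted combination of multiple groups of expert parameters, with the weights computed by a task gate.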
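In multimodal (image, video, text) recommendation scenarios, DBMTL supports using the CMBF model or the UNITER model as the shared bottom layer, so that multimodal features are fully exploited and better recommendation results are achieved.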
model_config {
  model_class: "DBMTL"
  feature_groups {
    group_name: "all"
    feature_names: "user_id"
    feature_names: "cms_segid"
    ...
    feature_names: "tag_brand_list"
    wide_deep: DEEP
  }
  dbmtl {
    bottom_dnn {
      hidden_units: [1024, 512, 256]
    }
    task_towers {
      tower_name: "ctr"
      label_name: "clk"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64, 32]
      }
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "cvr"
      label_name: "buy"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64, 32]
      }
      relation_tower_names: ["ctr"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 5e-6
}
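- model_class: 'DBMTL', no need to change
- feature_groups: configure one feature_group named 'all'.
- dbmtl: DBMTL-specific parameters
  - bottom_dnn: configuration of the shared bottom DNN
    - hidden_units: number of channels in each DNN layer, i.e., the number of neurons
  - task_towers: configure one task_towers entry per task
    - tower_name: name of the tower
    - label_name: label field the tower is trained on
    - dnn: configuration of the deep part
      - hidden_units: number of channels in each DNN layer, i.e., the number of neurons
    - relation_tower_names: towers this task depends on, i.e., its parent nodes in the Bayesian network (in the example, cvr depends on ctr)
    - relation_dnn: DNN applied to this tower's dnn output together with the relation outputs of the towers listed in relation_tower_names
    - By default each tower is a binary classification task: num_class defaults to 1, weight to 1.0, loss_type to CLASSIFICATION, and metrics_set to auc.
    - Note: label_fields must align one-to-one with task_towers.
- embedding_regularization: regularization applied to the embeddings to prevent overfitting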
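The relation fields can be pictured with the following pure-Python sketch. It assumes that each tower's relation_dnn receives the concatenation of that tower's own dnn output and the relation outputs of the towers listed in relation_tower_names, and that parent towers are declared before their children. `relation_forward` and the identity "network" are hypothetical illustrations, not EasyRec code.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def relation_forward(
    towers: Sequence[dict],
    tower_features: Dict[str, Vector],
    relation_dnn: Callable[[str, Vector], Vector],
) -> Dict[str, Vector]:
    """Hypothetical sketch of the DBMTL relation step.

    towers: tower configs in declaration order, each a dict with 'tower_name'
        and an optional 'relation_tower_names' list.
    tower_features: output of each tower's own `dnn`, keyed by tower name.
    relation_dnn: stand-in for each tower's `relation_dnn`.
    """
    relation_fea: Dict[str, Vector] = {}
    for cfg in towers:
        name = cfg["tower_name"]
        parents = cfg.get("relation_tower_names", [])
        for p in parents:
            # Assumption: a parent must be declared before its child,
            # otherwise its relation output does not exist yet.
            if p not in relation_fea:
                raise ValueError(f"{name}: relation tower {p!r} must be defined earlier")
        # Concatenate this tower's own feature with its parents' relation outputs.
        fused = list(tower_features[name])
        for p in parents:
            fused += relation_fea[p]
        relation_fea[name] = relation_dnn(name, fused)
    return relation_fea


# Mirrors the example config: cvr lists ctr in relation_tower_names.
towers = [
    {"tower_name": "ctr"},
    {"tower_name": "cvr", "relation_tower_names": ["ctr"]},
]
features = {"ctr": [1.0, 2.0], "cvr": [3.0]}
# Identity "relation_dnn" so that the concatenation is visible.
out = relation_forward(towers, features, lambda name, v: v)
print(out["cvr"])  # [3.0, 1.0, 2.0]
```

Declaring ctr before cvr matters in this sketch because cvr consumes ctr's relation output; reversing the order raises an error.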
model_config {
  model_class: "DBMTL"
  feature_groups {
    group_name: "all"
    feature_names: "user_id"
    feature_names: "cms_segid"
    ...
    feature_names: "tag_brand_list"
    wide_deep: DEEP
  }
  dbmtl {
    bottom_dnn {
      hidden_units: [1024]
    }
    expert_dnn {
      hidden_units: [256, 128, 64, 32]
    }
    num_expert: 8
    task_towers {
      tower_name: "ctr"
      label_name: "clk"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64, 32]
      }
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "cvr"
      label_name: "buy"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64, 32]
      }
      relation_tower_names: ["ctr"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 5e-6
}
- dbmtl
  - expert_dnn: configuration of the MMoE expert DNNs
    - hidden_units: number of channels in each DNN layer, i.e., the number of neurons
  - num_expert: number of expert DNNs
  - other parameters are the same as in DBMTL
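The task gate can be sketched in plain Python: each task computes softmax weights over the num_expert expert outputs and takes their weighted sum. In MMoE the gate logits come from a per-task linear layer on the shared input; here they are passed in directly, and `gate_combine` is a hypothetical helper, not part of EasyRec.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_combine(expert_outputs: Sequence[Vector], gate_logits: Sequence[float]) -> Vector:
    """Weighted sum of the expert outputs using one task's gate (hypothetical helper)."""
    if len(expert_outputs) != len(gate_logits):
        raise ValueError("need exactly one gate logit per expert")
    weights = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [sum(w * e[j] for w, e in zip(weights, expert_outputs)) for j in range(dim)]


# Two experts and a ctr gate with equal logits: the result is the plain average.
experts = [[1.0, 0.0], [3.0, 2.0]]
ctr_feature = gate_combine(experts, [0.0, 0.0])
print(ctr_feature)  # [2.0, 1.0]
```

Each task has its own gate, so with num_expert: 8 every tower would weight the same eight expert outputs differently.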
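The outputs of each DBMTL tower are named "logits_" / "probs_" / "y_" + tower_name, where logits / probs / y are, respectively, the value before the sigmoid, the probability, and the prediction of a regression model. The metrics of each tower are named metric name + "_" + tower_name.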
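The naming rule can be expressed as two small functions. This sketch assumes that binary classification towers (num_class = 1) expose logits_ and probs_, while regression towers, such as one with loss_type L2_LOSS, expose y_. The helper names are hypothetical.

```python
from typing import List


def tower_output_names(tower_name: str, loss_type: str) -> List[str]:
    # Assumption: a binary CLASSIFICATION tower (num_class = 1) exposes
    # logits_ and probs_; a regression loss (e.g. L2_LOSS) exposes y_.
    if loss_type == "CLASSIFICATION":
        return ["logits_" + tower_name, "probs_" + tower_name]
    return ["y_" + tower_name]


def tower_metric_name(metric: str, tower_name: str) -> str:
    # Metric name + "_" + tower_name, e.g. auc_ctr.
    return metric + "_" + tower_name


towers = {"classify": "CLASSIFICATION", "rating": "L2_LOSS"}
names = {t: tower_output_names(t, lt) for t, lt in towers.items()}
print(names)  # {'classify': ['logits_classify', 'probs_classify'], 'rating': ['y_rating']}
print(tower_metric_name("auc", "classify"))  # auc_classify
```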
Multimodal, multi-objective recommendation model (DBMTL with CMBF as the shared bottom layer):
model_config: {
  model_class: 'DBMTL'
  feature_groups: {
    group_name: 'image'
    feature_names: 'embedding'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'general'
    feature_names: 'user_id'
    feature_names: 'movie_id'
    feature_names: 'gender'
    feature_names: 'age'
    feature_names: 'occupation'
    feature_names: 'zip_id'
    feature_names: 'movie_year_bin'
    feature_names: 'score_year_diff'
    feature_names: 'score_time'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'text'
    feature_names: 'title'
    feature_names: 'genres'
    wide_deep: DEEP
  }
  dbmtl {
    bottom_cmbf {
      multi_head_num: 2
      image_multi_head_num: 2
      text_multi_head_num: 2
      image_feature_patch_num: 8
      image_head_size: 32
      text_head_size: 8
      image_self_attention_layer_num: 2
      text_self_attention_layer_num: 2
      cross_modal_layer_num: 3
      image_cross_head_size: 8
      text_cross_head_size: 16
      max_position_embeddings: 16
      use_token_type: true
    }
    task_towers {
      tower_name: "classify"
      label_name: "label"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      metrics_set: {
        gauc {
          uid_field: 'user_id'
        }
      }
      dnn {
        hidden_units: [256, 128, 64]
      }
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "rating"
      label_name: "rating"
      loss_type: L2_LOSS
      metrics_set: {
        mean_squared_error {}
      }
      dnn {
        hidden_units: [256, 128, 64]
      }
      relation_tower_names: ["classify"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 1e-6
}
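- dbmtl
  - bottom_cmbf: configuration of the cross-modal fusion model CMBF; see the CMBF model configuration.
    - image_feature_dim: before the single-modal learning module, the image feature dimension is adjusted to the value given by this parameter
    - multi_head_num: number of heads in the cross-modal fusion module, default 1
    - image_multi_head_num: number of heads in the image single-modal learning module, default 1
    - text_multi_head_num: number of heads in the text single-modal learning module, default 1
    - image_head_size: size of each head of the multi-headed self-attention in the image tower of the single-modal learning module
    - text_head_size: size of each head of the multi-headed self-attention in the text tower of the single-modal learning module
    - image_feature_patch_num: [optional, default 1] takes effect only when there is a single image feature; indicates that the image feature is a composite embedding of dimension image_feature_patch_num * embedding_size.
    - image_self_attention_layer_num: number of multi-headed self-attention layers in the image tower of the single-modal learning module
    - text_self_attention_layer_num: number of multi-headed self-attention layers in the text tower of the single-modal learning module
    - cross_modal_layer_num: number of layers in the cross-modal fusion module; a value between 1 and 5 is recommended, default 1
    - image_cross_head_size: size of each head of the multi-headed attention in the image tower of the cross-modal learning module
    - text_cross_head_size: size of each head of the multi-headed attention in the text tower of the cross-modal learning module
    - attention_probs_dropout_prob: dropout probability of the attention weights in the self/cross attention modules
    - hidden_dropout_prob: dropout probability of the FC layers in the multi-headed attention modules
    - use_token_type: bool, default false; whether to use token type embeddings to distinguish different text sequence features
    - use_position_embeddings: bool, default true; whether to add position embeddings to text sequences
    - max_position_embeddings: maximum position of the text sequences; required when use_position_embeddings is true, and must be greater than or equal to the largest max_seq_len among all feature configurations
    - text_seq_emb_dropout_prob: dropout probability of the text sequence embeddings
    - other_feature_dnn: [optional] MLP configuration for the other features
  - other parameters are the same as in DBMTL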
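Two of the constraints above can be checked with a small sketch: splitting a composite image embedding into image_feature_patch_num patches, and checking max_position_embeddings against the text features' max_seq_len. It assumes the patches are stored contiguously (patch 0 first); the helpers and the max_seq_len values are hypothetical.

```python
from typing import List, Sequence


def split_image_patches(embedding: Sequence[float], patch_num: int) -> List[List[float]]:
    """Split a composite embedding of size patch_num * embedding_size into patches.

    Hypothetical helper; assumes contiguous (row-major) patch layout.
    """
    if patch_num <= 0 or len(embedding) % patch_num != 0:
        raise ValueError("embedding length must be a multiple of image_feature_patch_num")
    size = len(embedding) // patch_num
    return [list(embedding[i * size:(i + 1) * size]) for i in range(patch_num)]


def check_max_position(max_position_embeddings: int, max_seq_lens: Sequence[int]) -> None:
    # max_position_embeddings must cover the longest text feature.
    longest = max(max_seq_lens)
    if max_position_embeddings < longest:
        raise ValueError(
            f"max_position_embeddings={max_position_embeddings} < max_seq_len={longest}"
        )


# image_feature_patch_num: 8 as in the example, with a toy 16-dim embedding.
patches = split_image_patches([float(i) for i in range(16)], patch_num=8)
print(len(patches), patches[1])  # 8 [2.0, 3.0]
# max_position_embeddings: 16 as in the example; the max_seq_len values are made up.
check_max_position(16, [10, 16])
```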
Multimodal, multi-objective recommendation model (DBMTL with UNITER as the shared bottom layer):
model_config: {
  model_class: 'DBMTL'
  feature_groups: {
    group_name: 'image'
    feature_names: 'embedding'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'general'
    feature_names: 'user_id'
    feature_names: 'movie_id'
    feature_names: 'gender'
    feature_names: 'age'
    feature_names: 'occupation'
    feature_names: 'zip_id'
    feature_names: 'movie_year_bin'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'other'
    feature_names: 'score_year_diff'
    feature_names: 'score_time'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'text'
    feature_names: 'title'
    feature_names: 'genres'
    wide_deep: DEEP
  }
  dbmtl {
    bottom_uniter {
      hidden_size: 512
      num_attention_heads: 4
      num_hidden_layers: 2
      intermediate_size: 512
      hidden_act: 'swish'
      max_position_embeddings: 16
      hidden_dropout_prob: 0.1
      attention_probs_dropout_prob: 0
      other_feature_dnn: {
        hidden_units: 256
        hidden_units: 128
      }
    }
    task_towers {
      tower_name: "classify"
      label_name: "label"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      metrics_set: {
        gauc {
          uid_field: 'user_id'
        }
      }
      dnn {
        hidden_units: [256, 128, 64]
      }
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "rating"
      label_name: "rating"
      loss_type: L2_LOSS
      metrics_set: {
        mean_squared_error {}
      }
      dnn {
        hidden_units: [256, 128, 64]
      }
      relation_tower_names: ["classify"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 1e-6
}
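- dbmtl
  - bottom_uniter: configuration of the multimodal fusion model UNITER; see the UNITER model configuration.
    - num_attention_heads: number of heads in the cross-modal fusion module
    - hidden_size: number of hidden units of the transformer module; must be an integer multiple of num_attention_heads.
    - num_hidden_layers: number of self-attention layers in the transformer module; a value between 1 and 5 is recommended, default 1
    - intermediate_size: number of neurons in the MLP layer of the transformer module
    - hidden_act: activation function of the MLP layer in the transformer module; currently 'relu', 'swish', 'gelu' and 'tanh' are supported
    - attention_probs_dropout_prob: dropout probability of the attention weights in the self/cross attention modules
    - hidden_dropout_prob: dropout probability of the FC layers in the multi-headed attention modules
    - use_position_embeddings: bool, default true; whether to add position embeddings to text sequences
    - max_position_embeddings: maximum position of the text sequences; required when use_position_embeddings is true, and must be greater than or equal to the largest max_seq_len among all feature configurations
    - other_feature_dnn: [optional] MLP configuration for the other features
  - other parameters are the same as in DBMTL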
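The bottom_uniter constraints listed above can be validated up front. With the example values (hidden_size 512, num_attention_heads 4), each attention head has size 128. The helper below is a hypothetical sketch, and the max_seq_len values passed to it are made up for illustration.

```python
from typing import Sequence

# Activations listed for hidden_act in the parameter description.
SUPPORTED_ACTS = {"relu", "swish", "gelu", "tanh"}


def check_uniter_config(
    hidden_size: int,
    num_attention_heads: int,
    hidden_act: str,
    max_position_embeddings: int,
    max_seq_lens: Sequence[int],
    use_position_embeddings: bool = True,
) -> int:
    """Validate the bottom_uniter constraints and return the per-head size.

    Hypothetical helper, not part of EasyRec.
    """
    if hidden_size % num_attention_heads != 0:
        raise ValueError("hidden_size must be a multiple of num_attention_heads")
    if hidden_act not in SUPPORTED_ACTS:
        raise ValueError(f"unsupported hidden_act: {hidden_act!r}")
    if use_position_embeddings and max_position_embeddings < max(max_seq_lens):
        raise ValueError("max_position_embeddings must be >= every max_seq_len")
    return hidden_size // num_attention_heads


# Values from the example config; the max_seq_len values are hypothetical.
print(check_uniter_config(512, 4, "swish", 16, [8, 16]))  # 128
```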