Is the shared_expert configured with TP when using EPMoE in DeepSeekV2?
In the code, the all_reduce operation in DeepseekV2MLP is disabled by setting the reduce_results flag to False, as shown below:
```python
self.shared_experts = DeepseekV2MLP(
    hidden_size=config.hidden_size,
    intermediate_size=intermediate_size,
    hidden_act=config.hidden_act,
    quant_config=quant_config,
    reduce_results=False,
)
```
Since the shared expert's output is left as unreduced partial sums, how does the EPMoE layer get its input data?
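For context, here is my reading of how the surrounding DeepseekV2MoE.forward might combine the two paths so that a single all_reduce covers both the shared expert and the routed experts. This is a minimal sketch, not verbatim repository code: the moe_forward name is mine, and the tensor_model_parallel_all_reduce import path is an assumption based on the usual SGLang/vLLM communication op.

```python
import torch

# Assumed import path for the TP all-reduce op; this may differ between
# SGLang versions (older versions imported it from vllm.distributed).
from sglang.srt.distributed import tensor_model_parallel_all_reduce


# Hypothetical sketch of DeepseekV2MoE.forward, not verbatim repo code.
def moe_forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
    # The shared expert consumes the full, replicated hidden_states.
    # With reduce_results=False, its row-parallel down_proj returns
    # per-rank partial sums instead of all-reducing them here.
    shared_output = self.shared_experts(hidden_states)

    # The router and the EPMoE layer also take the same replicated
    # hidden_states directly, so EPMoE's input does not depend on any
    # earlier all_reduce of the shared expert's output.
    router_logits, _ = self.gate(hidden_states)
    routed_output = self.experts(hidden_states, router_logits)

    # A single all_reduce then covers both paths: the shared expert's
    # partial sums and the routed output are added, then reduced once.
    return tensor_model_parallel_all_reduce(routed_output + shared_output)
```

If that reading is correct, the EPMoE layer simply takes the replicated hidden_states rather than the shared expert's output, and the deferred reduction is just an optimization to avoid a second all_reduce. Is this understanding right?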