Oh, I see your point, though. I think the move toward smaller, fine-grained experts largely goes back to the DeepSeekMoE paper (arxiv.org/pdf/2401.06066), which found them to be beneficial.

But to your point, a larger shared expert might not be a bad idea.
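
To make the trade-off a bit more concrete, here is a rough back-of-the-envelope sketch of how a larger shared expert changes the per-token compute in an MoE feed-forward layer. Everything in it is an assumption for illustration: the `MoEConfig` class, the helper names, and the layer sizes are hypothetical and not taken from the DeepSeekMoE paper, and the expert is modeled as a plain two-matrix FFN without gating or biases.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    """Hypothetical MoE feed-forward layer sizes (illustrative only)."""
    d_model: int      # hidden size of the transformer
    n_routed: int     # number of routed (fine-grained) experts
    top_k: int        # routed experts activated per token
    d_ff_routed: int  # intermediate size of each routed expert
    n_shared: int     # number of always-active shared experts
    d_ff_shared: int  # intermediate size of each shared expert


def ffn_params(d_model: int, d_ff: int) -> int:
    # Assumption: a simple up-projection plus down-projection, no biases.
    return 2 * d_model * d_ff


def total_params(cfg: MoEConfig) -> int:
    """Parameters stored in the layer (all experts)."""
    routed = cfg.n_routed * ffn_params(cfg.d_model, cfg.d_ff_routed)
    shared = cfg.n_shared * ffn_params(cfg.d_model, cfg.d_ff_shared)
    return routed + shared


def active_params(cfg: MoEConfig) -> int:
    """Parameters actually used for one token (top-k routed + all shared)."""
    routed = cfg.top_k * ffn_params(cfg.d_model, cfg.d_ff_routed)
    shared = cfg.n_shared * ffn_params(cfg.d_model, cfg.d_ff_shared)
    return routed + shared


def shared_fraction(cfg: MoEConfig) -> float:
    """Share of the per-token active parameters that sit in shared experts."""
    shared = cfg.n_shared * ffn_params(cfg.d_model, cfg.d_ff_shared)
    return shared / active_params(cfg)


# Shared expert the same size as one fine-grained routed expert.
fine_grained = MoEConfig(d_model=1024, n_routed=64, top_k=6,
                         d_ff_routed=512, n_shared=1, d_ff_shared=512)

# Same routed experts, but a shared expert four times as wide.
larger_shared = MoEConfig(d_model=1024, n_routed=64, top_k=6,
                          d_ff_routed=512, n_shared=1, d_ff_shared=2048)

for name, cfg in [("fine-grained", fine_grained), ("larger shared", larger_shared)]:
    print(name, total_params(cfg), active_params(cfg), round(shared_fraction(cfg), 3))
```

With these made-up sizes, widening the shared expert raises the always-on portion of the per-token compute noticeably while leaving the routed experts untouched. Whether that helps quality is an empirical question this sketch cannot answer.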

Feb 12 at 3:44 PM