What is mixture of experts?

Mixture of Experts (MoE) is an architecture that combines multiple specialised sub-networks (experts) with a gating mechanism (a router) that sends each input to the most relevant experts. Instead of activating the entire model, only a small subset of experts processes each token. This greatly reduces the compute per token compared with a dense model of the same total size, whilst retaining much of the capacity that the larger parameter count provides.

This approach appears in several tools from today's issue, such as OpenMythos, which implements sparse MoE, and DeepGEMM, which optimises fused MoE operations.

For developers, MoE makes larger, more capable models practical within a limited compute budget, because only the experts selected for each token do any work. Note that every expert must still be stored (in memory or offloaded), so memory requirements scale with the total parameter count, not just the active experts.
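To make the routing idea concrete, here is a minimal pure-Python sketch of top-k gating. It assumes a linear gate and experts that are single weight matrices. Both are simplifications for illustration, since production MoE layers typically use MLP experts, batched tensors and load-balancing losses. `TopKMoE`, `route` and `forward` are hypothetical names, not any library's API, and this is not how OpenMythos or DeepGEMM implement MoE.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    # Multiply a matrix (stored as a list of rows) by a vector
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class TopKMoE:
    """Toy sparse MoE layer: a linear gate picks the top-k experts per token."""

    def __init__(self, dim, num_experts, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Gate: one row of weights per expert, giving one logit per expert
        self.gate = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
        # Hypothetical experts: each is a single dim x dim linear map
        self.experts = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def route(self, token):
        # Score every expert, keep the k highest, renormalise their scores
        logits = matvec(self.gate, token)
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.k]
        weights = softmax([logits[i] for i in top])
        return list(zip(top, weights))

    def forward(self, token):
        # Only the selected experts are evaluated; the others are skipped entirely
        out = [0.0] * len(token)
        chosen = self.route(token)
        for expert_id, weight in chosen:
            expert_out = matvec(self.experts[expert_id], token)
            out = [o + weight * e for o, e in zip(out, expert_out)]
        return out, [i for i, _ in chosen]


layer = TopKMoE(dim=4, num_experts=8, k=2)
output, used = layer.forward([0.5, -1.0, 0.25, 2.0])
print(len(output), used)  # 4 output values; `used` holds the 2 chosen expert ids
```

With `k=2` of 8 experts, each token pays for two expert evaluations instead of eight. The gate still scores all eight, but it is much cheaper than the experts themselves. All eight weight matrices remain in memory, which is the memory trade-off described above.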