Aussie AI
Hybrid MoE Dense FFN Architecture
Last Updated 23 April, 2026
by David Spuler, Ph.D.
What is Hybrid MoE Dense FFN Architecture?
A hybrid Mixture-of-Experts (MoE) with dense FFN architecture runs a sparse MoE FFN in parallel with a classic dense (non-MoE) FFN. The router selects and runs the sparse experts, the dense FFN runs unconditionally like a standard FFN, and the two outputs are combined at the end.
Isn't this just a shared expert?
Yes, they are similar, since a shared expert is effectively a fixed, always-run FFN, but there are differences:
- The dense FFN may have different dimensions from the other experts.
- The dense FFN is not included in the normal MoE gating mechanism (although, arguably, neither is a shared expert).
- The dense FFN's output may be combined with the MoE experts' outputs using a different weighting.
An example of this architecture is the Edge versions of the Gemma models. The idea is to run a slightly larger dense FFN in parallel with the various MoE experts.
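The parallel execution described above can be sketched in a few lines of numpy. This is a minimal, illustrative sketch, not the implementation from any particular model: the function names, the top-k softmax router, and the `dense_scale` combination weight are all assumptions. It shows the key points: the dense FFN runs unconditionally, sits outside the gating distribution, can have its own (larger) hidden dimension, and is merged with the expert outputs via a separate weighting.

```python
import numpy as np

def ffn(x, w1, w2):
    """A simple two-layer FFN with ReLU activation."""
    return np.maximum(x @ w1, 0.0) @ w2

def hybrid_moe_dense(x, expert_weights, gate_w, dense_w1, dense_w2,
                     top_k=2, dense_scale=0.5):
    """Hybrid layer for one token vector x: top-k sparse MoE experts
    plus an always-run dense FFN, combined at the end.
    (Names and the dense_scale weighting are illustrative assumptions.)"""
    logits = x @ gate_w                        # router logits, one per expert
    top = np.argsort(logits)[-top_k:]          # indices of the top-k experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                       # softmax over selected experts only
    moe_out = sum(p * ffn(x, *expert_weights[i])
                  for p, i in zip(probs, top))
    dense_out = ffn(x, dense_w1, dense_w2)     # dense FFN runs unconditionally;
                                               # it is NOT part of the gating
    return moe_out + dense_scale * dense_out   # combine with a separate weighting
```

Note that `dense_w1`/`dense_w2` are passed separately from the expert weights, so the dense FFN's hidden dimension can differ from the experts' (e.g., a slightly larger dense FFN, as in the hybrid Gemma layers described above). A production implementation would route a whole batch of tokens and use learned combination weights.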
Research on Hybrid MoE Dense FFN Architecture
Research papers include:
- Devansh, Apr 2026, Google’s Gemma 4 is Weirder than you Realize: The architecture matters more than the numbers. Here’s what Google actually built, https://machine-learning-made-simple.medium.com/googles-gemma-4-is-weirder-than-you-realize-17d00d95b0d5
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging
- C++ AVX Optimization: CPU SIMD Vectorization. Get your copy from Amazon: C++ AVX Optimization: CPU SIMD Vectorization
- C++ Ultra-Low Latency: Multithreading and Low-Level Optimizations. Get your copy from Amazon: C++ Ultra-Low Latency
More AI Research Topics
Read more about:
- 500+ LLM Inference Optimization Techniques
- What's Hot in LLM Inference Optimization in 2025?
- Inference Optimization Research