Aussie AI
Hybrid MoE Dense FFN Architecture
Last Updated 19 June, 2026
by David Spuler, Ph.D.
What is Hybrid MoE Dense FFN Architecture?
A hybrid Mixture-of-Experts (MoE) and dense FFN architecture runs an MoE FFN layer in parallel with a dense (non-MoE) FFN. The MoE router activates a sparse subset of its experts, the dense FFN runs on every token like a classic FFN, and the outputs of the two paths are combined at the end.
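To make the data flow concrete, here is a minimal pure-Python sketch of one hybrid layer applied to a single token vector. It assumes a top-k softmax router, two-layer ReLU FFNs without biases, and a plain weighted sum to combine the MoE output with the dense FFN output. All names (`hybrid_moe_dense`, `moe_weight`, `dense_weight`) and all sizes are hypothetical illustrations, not taken from any particular model.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector: returns W @ x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def ffn(params, x):
    """Classic two-layer FFN: W_down @ relu(W_up @ x). Biases omitted."""
    w_up, w_down = params
    return matvec(w_down, relu(matvec(w_up, x)))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_ffn(router, experts, x, top_k):
    """Sparse MoE (assumed top-k routing): send x to top_k experts, mix outputs."""
    scores = matvec(router, x)  # one router logit per expert
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])  # renormalize over chosen experts
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = ffn(experts[i], x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


def hybrid_moe_dense(x, router, experts, dense, top_k=2, moe_weight=1.0, dense_weight=1.0):
    """Hypothetical hybrid layer: run the MoE and the dense FFN on x, then combine."""
    moe_out, chosen = moe_ffn(router, experts, x, top_k)
    dense_out = ffn(dense, x)  # always runs; the router never gates it
    # Assumed combination rule: a simple weighted sum of the two paths.
    combined = [moe_weight * m + dense_weight * d for m, d in zip(moe_out, dense_out)]
    return combined, chosen


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy sizes: 4 small experts plus a wider dense FFN.
rng = random.Random(0)
d_model, d_expert, d_dense, n_experts = 8, 16, 32, 4
router = random_matrix(n_experts, d_model, rng)
experts = [(random_matrix(d_expert, d_model, rng), random_matrix(d_model, d_expert, rng))
           for _ in range(n_experts)]
dense = (random_matrix(d_dense, d_model, rng), random_matrix(d_model, d_dense, rng))

x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
y, chosen = hybrid_moe_dense(x, router, experts, dense, top_k=2)
print("experts used:", chosen, "output dim:", len(y))
```

Note that the dense FFN has its own hidden size (`d_dense`) and its own combination weight, whereas only the routed experts pass through the top-k gate. Setting `dense_weight=0.0` reduces the layer to a plain sparse MoE.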
Isn't this just a shared expert?
Yes and no. The two are similar, since a shared expert is effectively a fixed, always-run FFN, but there are some differences:
- The dense FFN may have different dimensions from the MoE experts.
- The dense FFN is not part of the normal MoE gating mechanism (although, arguably, neither is a shared expert).
- The MoE output and the dense FFN output may be combined with different weightings at the end.
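Because the dense FFN sits outside the gate and can have its own hidden size, it changes the per-token compute budget differently from adding another routed expert. The sketch below counts FFN weights for one hybrid layer, assuming two-matrix FFNs without biases and a single linear router; the function names and all dimensions are hypothetical and do not describe any real model.

```python
def ffn_params(d_model, d_hidden):
    """Weights in a two-matrix FFN (up- and down-projection), biases ignored."""
    return 2 * d_model * d_hidden


def hybrid_ffn_params(d_model, d_expert, n_experts, top_k, d_dense):
    """Return (total, active-per-token) FFN weights for a hybrid MoE + dense layer.

    The dense FFN has its own hidden size (d_dense), independent of d_expert,
    and it is counted as always active because the router never gates it.
    """
    expert = ffn_params(d_model, d_expert)
    dense = ffn_params(d_model, d_dense)
    router = d_model * n_experts  # assumed linear router: one logit per expert
    total = n_experts * expert + dense + router
    active = top_k * expert + dense + router
    return total, active


# Hypothetical sizes: the dense FFN is 1.5x wider than each routed expert.
total, active = hybrid_ffn_params(d_model=1024, d_expert=512, n_experts=8, top_k=2, d_dense=768)
print(f"total={total:,} active={active:,}")  # prints: total=9,969,664 active=3,678,208
```

In this toy configuration the dense FFN (1,572,864 weights) is active for every token and makes up roughly 43% of the active FFN weights, even though it is only about 16% of the layer's total.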
One example of this architecture is the Edge versions of the Gemma models. The idea is to run a slightly larger dense FFN alongside the various MoE experts.
Hybrid MoE Dense FFN Architecture: Book Excerpts and Blog Articles
Free online book excerpts, with full-text chapters and free PDF downloads, plus related articles from the Aussie AI blog:
- David Spuler, May 31st, 2026, Chapter 19. Mixture-of-Experts (MoE), in book LLM Inference Optimization: State-of-the-Art Research, Table of Contents: https://www.aussieai.com/book/llm-inference-optimization https://www.amazon.com/dp/B0H3FKR39T
Research on Hybrid MoE Dense FFN Architecture
Research papers include:
- Devansh, Apr 2026, Google’s Gemma 4 is Weirder than you Realize: The architecture matters more than the numbers. Here’s what Google actually built, https://machine-learning-made-simple.medium.com/googles-gemma-4-is-weirder-than-you-realize-17d00d95b0d5
- David Spuler, May 31st, 2026, Chapter 19. Mixture-of-Experts (MoE), in book LLM Inference Optimization: State-of-the-Art Research, Table of Contents: https://www.aussieai.com/book/llm-inference-optimization https://www.amazon.com/dp/B0H3FKR39T
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging
- C++ AVX Optimization: CPU SIMD Vectorization. Get your copy from Amazon: C++ AVX Optimization: CPU SIMD Vectorization
- C++ Ultra-Low Latency: Multithreading and Low-Level Optimizations. Get your copy from Amazon: C++ Ultra-Low Latency
More AI Research Topics
Read more about:
- 500+ LLM Inference Optimization Techniques
- What's Hot in LLM Inference Optimization in 2025?
- Inference Optimization Research