Aussie AI

Hybrid MoE Dense FFN Architecture

  • Last Updated 23 April, 2026
  • by David Spuler, Ph.D.

What is Hybrid MoE Dense FFN Architecture?

A hybrid Mixture-of-Experts (MoE) and dense FFN architecture runs an MoE FFN in parallel with a dense, non-MoE FFN. The MoE routes each token to its sparse experts, the dense FFN runs like a classic always-on FFN, and the two results are combined at the end.
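In rough notation, the layer output looks something like:

    y = α · FFN_dense(x) + Σ_{i ∈ top-k} g_i(x) · E_i(x)

where g_i(x) are the router's gate weights for the selected experts E_i, and α is an illustrative mixing weight for the dense branch (the exact combination rule varies by model and is not specified here). The dense FFN term is always computed, regardless of routing.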

Isn't this just a shared expert?

Yes, they are similar, since a shared expert is effectively a fixed, always-run FFN, but there are some differences:

  • The dense FFN may have different dimensions from the other experts.
  • The dense FFN is not included in the normal MoE gating mechanism (although arguably, neither is a shared expert).
  • The MoE experts and the dense FFN may (possibly) be combined with different weightings at the end.

The Edge versions of the Gemma models are an example of this architecture. The idea is to run a slightly larger dense FFN alongside the various MoE experts.
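Below is a minimal PyTorch-style sketch of the idea. The module layout, hidden dimensions, top-k routing, and the learned scalar weight on the dense branch are all illustrative assumptions for this article, not the implementation of Gemma or any other particular model:

import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridMoEDenseFFN(nn.Module):
    # A dense FFN running in parallel with a sparse MoE FFN; the two branch
    # outputs are summed at the end. The dense branch bypasses the router.
    def __init__(self, d_model=512, d_dense=2048, d_expert=1024,
                 num_experts=8, top_k=2):
        super().__init__()
        # Dense FFN: may use a different hidden dimension than the experts.
        self.dense_ffn = nn.Sequential(
            nn.Linear(d_model, d_dense), nn.GELU(), nn.Linear(d_dense, d_model))
        # Sparse experts plus their router (the dense FFN is not gated).
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_expert), nn.GELU(),
                          nn.Linear(d_expert, d_model))
            for _ in range(num_experts)])
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k
        # Learned scalar weighting of the dense branch (an assumption here).
        self.dense_scale = nn.Parameter(torch.ones(1))

    def forward(self, x):                         # x: [batch, seq, d_model]
        dense_out = self.dense_ffn(x)             # always-run dense branch
        gates = F.softmax(self.router(x), dim=-1)            # [B, S, num_experts]
        topk_w, topk_idx = gates.topk(self.top_k, dim=-1)    # sparse top-k routing
        moe_out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = topk_idx[..., slot]                        # chosen expert per token
            w = topk_w[..., slot].unsqueeze(-1)              # its gate weight
            for e, expert in enumerate(self.experts):
                mask = (idx == e).unsqueeze(-1)              # tokens routed to expert e
                if mask.any():
                    # Computed densely for clarity; real kernels gather only routed tokens.
                    moe_out = moe_out + mask.float() * w * expert(x)
        # Combine the two parallel branches at the end.
        return self.dense_scale * dense_out + moe_out

The scalar weight on the dense branch is just one way to realize the "different weightings" point above; the dense output could equally be summed with weight 1, or given its own per-token gate.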

Research on Hybrid MoE Dense FFN Architecture

Research papers include:

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++: Generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging



C++ AVX Optimization: CPU SIMD Vectorization:
  • Introduction to AVX SIMD intrinsics
  • Vectorization and horizontal reductions
  • Low latency tricks and branchless programming
  • Instruction-level parallelism and out-of-order execution
  • Loop unrolling & double loop unrolling

Get your copy from Amazon: C++ AVX Optimization: CPU SIMD Vectorization



C++ Ultra-Low Latency: Multithreading and Low-Level Optimizations:
  • Low-level C++ efficiency techniques
  • C++ multithreading optimizations
  • AI LLM inference backend speedups
  • Low latency data structures
  • Multithreading optimizations
  • General C++ optimizations

Get your copy from Amazon: C++ Ultra-Low Latency

More AI Research Topics

Read more about: