Aussie AI

Neural Architecture Search

  • Last Updated 26 August, 2025
  • by David Spuler, Ph.D.

Neural Architecture Search (NAS) is the very fancy way that AI researchers ask questions like: how big should the model be? How many weights? How many layers? What vocabulary size?

Why NAS?

Choosing these numbers is actually a very hard problem. In the early days, these choices were made either randomly or by trial-and-error, which is expensive when you're talking about GPUs. If you go too large, the model is over-parameterized and unnecessarily expensive. Go too small, and the model won't be very accurate, or might not even work at all. Hence, a large body of research on "NAS" has developed around systematic ways to find the optimal model size along its various dimensions.

The biggest number is how many billions of weights the model should use, but this actually depends on a number of other numeric sizes. The weights are called "parameters" and the various other sizes are called "hyper-parameters" of the model, so NAS is also sometimes called "Hyper-Parameter Optimization" (HPO). The sizes and dimensions that NAS aims to determine include (see the sketch after this list):

  • Number of layers
  • Embedding size
  • Vocabulary size
  • Number of attention heads
  • Context size
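
As a concrete illustration, here is a minimal sketch (in Python) of a NAS-style search over these hyperparameters, using plain random search. The names, value ranges, and the train_and_evaluate placeholder are all hypothetical, not the API of any particular NAS library.

    import random

    # Hypothetical search space over the Transformer hyperparameters listed above.
    SEARCH_SPACE = {
        "num_layers":     [6, 12, 24, 32],
        "embedding_size": [512, 768, 1024, 2048],
        "vocab_size":     [32000, 50000, 100000],
        "num_heads":      [8, 16, 32],
        "context_size":   [2048, 4096, 8192],
    }

    def sample_architecture():
        # Pick one value per hyperparameter to form a candidate model configuration.
        config = {name: random.choice(choices) for name, choices in SEARCH_SPACE.items()}
        # Real search spaces also enforce constraints, e.g. the embedding size
        # must be divisible by the number of attention heads.
        if config["embedding_size"] % config["num_heads"] != 0:
            return sample_architecture()
        return config

    def train_and_evaluate(config):
        # Placeholder for training a (proxy) model and measuring validation quality;
        # here it returns a random score so the sketch runs end to end.
        return random.random()

    def random_search(num_trials=20):
        # Keep the best-scoring candidate configuration seen so far.
        best_config, best_score = None, float("-inf")
        for _ in range(num_trials):
            config = sample_architecture()
            score = train_and_evaluate(config)
            if score > best_score:
                best_config, best_score = config, score
        return best_config, best_score

    print(random_search())

Random search is only the simplest baseline; the NAS literature also covers evolutionary, reinforcement-learning, and differentiable (gradient-based) search methods over spaces like this one.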

NAS versus Model Compression

There are some parallels between neural architecture search and model compression, especially structural pruning. NAS aims to select the model hyperparameters before or during training, whereas model compression comes in afterwards and changes the model. Some types of pruning produce results very similar to NAS decisions.

As an example, layer pruning is very similar to NAS choosing the number of layers. If you choose a layer count via NAS, train your model, and then subsequently prune away some of those layers, the resulting architecture is the same as if NAS had chosen a smaller number of layers in the first place. Of course, that equivalence only holds for static layer pruning; dynamic layer pruning, such as early exiting, has additional runtime effects.
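
To make the equivalence concrete, here is a small Python sketch with a hypothetical ModelConfig type. A 32-layer model with 8 layers statically pruned ends up with the same architecture shape as a model for which NAS chose 24 layers up front, although the trained weights will of course differ.

    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class ModelConfig:
        # Hypothetical description of a model's architecture shape.
        num_layers: int
        embedding_size: int
        num_heads: int

    # Option 1: NAS chooses 24 layers before training.
    nas_config = ModelConfig(num_layers=24, embedding_size=1024, num_heads=16)

    # Option 2: train with 32 layers, then statically prune 8 layers afterwards.
    trained_config = ModelConfig(num_layers=32, embedding_size=1024, num_heads=16)
    pruned_config = replace(trained_config, num_layers=trained_config.num_layers - 8)

    # Same architecture shape either way; only the training history differs.
    assert nas_config == pruned_config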

Survey Papers on NAS

Some of the review and survey papers on NAS:

General Research Papers on NAS

Some of the research papers on NAS:

This is not the full list of papers, I can say with reasonable certainty, given that one survey paper stated there have been over 1,000 papers written on NAS since 2021. If this is your chosen dissertation topic, you'd better start writing that literature review section early!

NAS and Dynamic Inference Optimization

Dynamic NAS is not yet a mainstream application of NAS. Traditionally, NAS has been applied to finding static model architectures, without regard to dynamic inference approaches. An emerging area of research is to include the hyperparameters of dynamic inference optimizations as part of the search space for an optimal model.
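
As a rough illustration of the idea, a dynamic NAS search space might simply add the knobs of a dynamic optimization, such as an early-exit confidence threshold, alongside the usual static hyperparameters, and score candidates on both accuracy and latency. The names, ranges, and weighting below are hypothetical.

    # Hypothetical dynamic NAS search space: static architecture hyperparameters
    # plus a dynamic inference hyperparameter (an early-exit confidence threshold).
    DYNAMIC_SEARCH_SPACE = {
        "num_layers":           [12, 24, 32],
        "embedding_size":       [768, 1024, 2048],
        "early_exit_threshold": [0.80, 0.90, 0.95, 0.99],
    }

    def objective(accuracy, avg_latency_ms, latency_weight=0.01):
        # Multi-objective score: dynamic NAS trades off model quality against the
        # runtime savings that the dynamic optimization (early exit) provides.
        return accuracy - latency_weight * avg_latency_ms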

Research papers on "dynamic NAS" include:

  • Matteo Gambella, Manuel Roveri, "EDANAS: Adaptive Neural Architecture Search for Early Exit Neural Networks", 2023 International Joint Conference on Neural Networks (IJCNN), pp.1-8, 2023. https://ieeexplore.ieee.org/document/10191876 (NAS applied to early-exit dynamic inference.)
  • Chakkrit Termritthikun, Yeshi Jamtsho, Jirarat Ieamsaard, Paisarn Muneesawang, Ivan Lee, 2021, EEEA-Net: An Early Exit Evolutionary Neural Architecture Search, Engineering Applications of Artificial Intelligence Volume 104, September 2021, 104397, https://www.sciencedirect.com/science/article/abs/pii/S0952197621002451, https://arxiv.org/abs/2108.06156, Code: https://github.com/chakkritte/EEEA-Net (A 2021 paper on NAS applied to early-exit.)
  • KT Chitty-Venkata, Y Bian, M Emani, V Vishwanath, Jan 2023, Differentiable Neural Architecture, Mixed Precision and Accelerator Co-search, IEEE Access, DOI: 10.1109/ACCESS.2023.3320133, PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10266308
  • Linnan Wang, Chenhan Yu, Satish Salian, Slawomir Kierat, Szymon Migacz, Alex Fit Florea, 2022, GPUNet: Searching the Deployable Convolution Neural Networks for GPUs, https://arxiv.org/abs/2205.00841 (A general NAS system that could be applied statically or dynamically.)
  • Matteo Gambella, Jary Pomponi, Simone Scardapane, Manuel Roveri, 24 Jan 2024, NACHOS: Neural Architecture Search for Hardware Constrained Early Exit Neural Networks, https://arxiv.org/abs/2401.13330
  • David Spuler, March 2024, Chapter 56. Neural Architecture Search, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
  • Angie Boggust, Venkatesh Sivaraman, Yannick Assogba, Donghao Ren, Dominik Moritz, Fred Hohman, 6 Aug 2024, Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments, https://arxiv.org/abs/2408.03274
  • Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi, 9 Aug 2024 (v2), A Survey on Deep Neural Network Pruning-Taxonomy, Comparison, Analysis, and Recommendations, IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2024.3447085, https://arxiv.org/abs/2308.06767, https://ieeexplore.ieee.org/abstract/document/10643325
  • Akhiad Bercovich, Tomer Ronen, Talor Abramovich, Nir Ailon, Nave Assaf, Mohammad Dabbah, Ido Galil, Amnon Geifman, Yonatan Geifman, Izhak Golan, Netanel Haber, Ehud Karpas, Itay Levy, Shahar Mor, Zach Moshe, Najeeb Nabwani, Omri Puny, Ran Rubin, Itamar Schen, Ido Shahaf, Oren Tropp, Omer Ullman Argov, Ran Zilberstein, Ran El-Yaniv, 28 Nov 2024, Puzzle: Distillation-Based NAS for Inference-Optimized LLMs, NVIDIA Research, https://arxiv.org/abs/2411.19146 (This is dynamic NAS on a vast scale in a search space of size 10^138, because the optimization is applied with low granularity to each block in attention and FFN subcomponents of each layer.)
  • Armin W. Thomas, Rom Parnichkun, Alexander Amini, Stefano Massaroli and Michael Poli, Liquid AI, December 2, 2024, Automated Architecture Synthesis via Targeted Evolution, https://arxiv.org/abs/2411.17800 https://www.liquid.ai/research/automated-architecture-synthesis-via-targeted-evolution
  • Shaibal Saha, Lanyu Xu, 26 Feb 2025, Vision Transformers on the Edge: A Comprehensive Survey of Model Compression and Acceleration Strategies, https://arxiv.org/abs/2503.02891
