
Triple Axis Pruning

  • Last Updated 25 September, 2024
  • by David Spuler, Ph.D.

Structured pruning methods are often categorized by the dimension of the model that they aim to reduce. Weights can be structurally pruned along the three major axes of a model: depth, width, and length.

  • Depth pruning. Weights are pruned by removing layers to make the model "shallower". Techniques include layer pruning, inference-loop early exit, and "shallow decoder" Transformer architectures. Note that choosing the model meta-parameter of the number of layers via neural architecture search (NAS) is conceptually very similar to static layer pruning. Likewise, dynamic early exit with a decision condition based only on a fixed number of layers (e.g., always exit after 10 layers) is effectively static layer pruning, except that storage is wasted on the unused layers of weights (see the first sketch after this list).
  • Width pruning. The fanning out of incoming embedding data across multiple attention heads or internal neural nodes is the "width" of the model. Width pruning is sometimes called "thinning" or "slimming" the model (see slimmable networks). Width pruning strategies include attention head pruning, filter pruning, and channel pruning (see the second sketch after this list). Read more about: width pruning.
  • Length pruning. The third dimension of the model is the model dimension (embedding size), which fixes the size of the embedding vectors that propagate through the width and depth of the model. Note that choosing the meta-parameters of embedding size and context window (e.g., via NAS) is conceptually similar to static length pruning. Length pruning strategies include token pruning and embeddings pruning (see the third sketch after this list). Also related is autoregression research. Of the three axes, length pruning has received the least research attention. Read more about: length pruning.
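
To make the depth axis concrete, here is a minimal PyTorch sketch of both exit styles described above. The class EarlyExitStack, the toy linear layers, and the confidence-threshold rule are illustrative assumptions, not a reference implementation.

    # Minimal sketch of depth pruning via early exit (illustrative only).
    import torch
    import torch.nn as nn

    class EarlyExitStack(nn.Module):
        """Toy layer stack supporting static and dynamic early exit."""

        def __init__(self, dim: int, num_layers: int, num_classes: int):
            super().__init__()
            self.layers = nn.ModuleList(
                nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
                for _ in range(num_layers)
            )
            self.classifier = nn.Linear(dim, num_classes)

        def forward(self, x, exit_layer=None, threshold=None):
            for i, layer in enumerate(self.layers):
                if exit_layer is not None and i >= exit_layer:
                    break  # static exit: later layers are stored but never run
                x = layer(x)
                if threshold is not None:
                    probs = torch.softmax(self.classifier(x), dim=-1)
                    if probs.max(dim=-1).values.min() >= threshold:
                        break  # dynamic exit: every sample is confident enough
            return self.classifier(x)

    model = EarlyExitStack(dim=16, num_layers=12, num_classes=4)
    x = torch.randn(2, 16)
    y_static = model(x, exit_layer=10)   # always exits after 10 layers
    y_dynamic = model(x, threshold=0.9)  # exits early only when confident

As the list above notes, the static variant is effectively layer pruning: with a fixed exit_layer of 10, layers 10 and onward are dead weight that could simply be deleted from the stored model.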
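
Width pruning can be sketched the same way on a feed-forward block. The slim_ffn helper below structurally removes the hidden neurons with the smallest incoming weight norms; the helper name and the norm-based scoring rule are assumptions for illustration, not a specific published criterion.

    # Minimal sketch of width pruning: shrink the hidden width of an
    # up/down feed-forward pair by dropping low-norm neurons.
    import torch
    import torch.nn as nn

    def slim_ffn(up: nn.Linear, down: nn.Linear, keep: int):
        # Score each hidden neuron by the L2 norm of its incoming weights
        # (illustrative criterion only).
        scores = up.weight.norm(dim=1)                    # (hidden,)
        keep_idx = scores.topk(keep).indices.sort().values
        new_up = nn.Linear(up.in_features, keep)
        new_down = nn.Linear(keep, down.out_features)
        with torch.no_grad():
            new_up.weight.copy_(up.weight[keep_idx])
            new_up.bias.copy_(up.bias[keep_idx])
            new_down.weight.copy_(down.weight[:, keep_idx])
            new_down.bias.copy_(down.bias)
        return new_up, new_down

    up, down = nn.Linear(16, 64), nn.Linear(64, 16)
    up2, down2 = slim_ffn(up, down, keep=32)    # halve the hidden width
    x = torch.randn(2, 16)
    y = down2(torch.relu(up2(x)))               # slimmer forward pass

Unlike unstructured weight pruning, this genuinely shrinks the weight matrices, so the speedup needs no sparse kernels. Pruning whole attention heads follows the same pattern, slicing out per-head partitions of the attention projections instead.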
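
Finally, a length-axis sketch: token pruning shortens the sequence mid-inference by keeping only the highest-scoring tokens. The mean-activation saliency below is a placeholder assumption; real token pruning methods typically score tokens using attention weights.

    # Minimal sketch of length pruning: drop low-saliency tokens so that
    # later layers process a shorter sequence.
    import torch

    def prune_tokens(hidden, scores, keep: int):
        # hidden: (batch, seq, dim); scores: (batch, seq) per-token saliency.
        keep_idx = scores.topk(keep, dim=1).indices.sort(dim=1).values
        idx = keep_idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
        return hidden.gather(1, idx)              # (batch, keep, dim)

    hidden = torch.randn(2, 10, 16)
    saliency = hidden.abs().mean(dim=-1)          # placeholder scoring
    shorter = prune_tokens(hidden, saliency, keep=6)   # (2, 6, 16)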

Note that "length" is mainly applicable to text transformers. In vision transformers, the third dimension is the image, or patches of the image.

Triple Axis Pruning Research

Research papers on "triple pruning" (see also dual pruning research):

3D CNN Model Pruning

This is not quite what is meant by "triple axis pruning", and it is limited to convolutional neural networks (CNNs). It is a conceptually different type of "triple-dimensional pruning", but similar in its goals: CNNs can have 3D tensors, which can be pruned along multiple dimensions. Papers include:

  • Yuxin Zhang, Huan Wang, Yang Luo, Lu Yu, Haoji Hu, Hangguan Shan, Tony Q. S. Quek, 2019, Three-dimensional convolutional neural network pruning with regularization-based method, 2019 IEEE International Conference on Image Processing (ICIP) https://ieeexplore.ieee.org/abstract/document/8803541/, https://arxiv.org/abs/1811.07555
  • J Guo, D Xu, W Ouyang, 2023, Multidimensional Pruning and Its Extension: A Unified Framework for Model Compression, IEEE Transactions on Neural Networks and Learning Systems (Early Access), https://ieeexplore.ieee.org/abstract/document/10130783/
  • S Xu, A Huang, L Chen, B Zhang, 2020, Convolutional neural network pruning: A survey, 2020 39th Chinese Control Conference (CCC), https://ieeexplore.ieee.org/abstract/document/9189610/ (A survey of CNN pruning along 3 other dimensions: pruning method, training strategy, and estimation criterion.)
  • J Guo, W Ouyang, D Xu, 2020, Multi-dimensional pruning: A unified framework for model compression, https://ieeexplore.ieee.org/document/9157552, https://openaccess.thecvf.com/content_CVPR_2020/papers/Guo_Multi-Dimensional_Pruning_A_Unified_Framework_for_Model_Compression_CVPR_2020_paper.pdf
  • A Chavan, Z Shen, Z Liu, Z Liu, 2022, Vision transformer slimming: Multi-dimension searching in continuous optimization space, CVPR 2022, https://arxiv.org/abs/2201.00814, PDF: http://openaccess.thecvf.com/content/CVPR2022/papers/Chavan_Vision_Transformer_Slimming_Multi-Dimension_Searching_in_Continuous_Optimization_Space_CVPR_2022_paper.pdf, Code: https://github.com/Arnav0400/ViT-Slim (Multi-dimensional NAS.)
