Aussie AI

Inference Frameworks

  • Last Updated 29 August, 2025
  • by David Spuler, Ph.D.

Inference frameworks are software platforms that take a trained model and execute it against requests from users. Many inference frameworks also provide training and fine-tuning capabilities, but not all do. Many frameworks are open source, while others remain proprietary, and there is intense competition in this space.
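To make the request-serving pattern concrete, here is a minimal sketch (not production code) using the Hugging Face Transformers library, one of the frameworks listed below; the publicly available "gpt2" checkpoint is used purely as an illustrative model choice.

    # Minimal sketch: serving one text-generation request with Hugging Face Transformers.
    # The "gpt2" checkpoint and handle_request() are illustrative choices, not a standard.
    from transformers import pipeline

    # Load the model and tokenizer once at startup, then reuse them for every request.
    generator = pipeline("text-generation", model="gpt2")

    def handle_request(prompt: str) -> str:
        # Execute the model against a single user request.
        outputs = generator(prompt, max_new_tokens=20, do_sample=False)
        return outputs[0]["generated_text"]

    print(handle_request("Inference frameworks are"))

A real inference framework wraps this basic load-once, serve-many loop with batching, caching, and GPU scheduling.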

There is considerable overlap between the concept of an inference framework and that of a "deep learning compiler". There is also overlap with "AI cloud hosting" services, offered both by new startups and by the major cloud providers (e.g., Amazon AWS, Microsoft Azure, and Google GCP), which typically include both training and inference features.
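The framework/compiler overlap is visible directly in framework APIs; for example, PyTorch 2.x exposes its compiler stack through torch.compile. The sketch below uses a tiny stand-in model, not a real LLM, just to show the call pattern.

    # Minimal sketch of the framework/compiler overlap: torch.compile JIT-compiles
    # the model's kernels on first call (PyTorch 2.x). The tiny model is a stand-in.
    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(16, 32),
        torch.nn.ReLU(),
        torch.nn.Linear(32, 4),
    ).eval()

    compiled_model = torch.compile(model)  # framework hands the graph to its compiler

    with torch.no_grad():
        x = torch.randn(1, 16)
        print(compiled_model(x).shape)  # torch.Size([1, 4])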

Software frameworks are only one part of the AI tech stack. Read more about inference optimization, training optimization, hardware accelerators, ML compilers, and our list of common and obscure AI optimization techniques.

List of Machine Learning Frameworks

Some of the many frameworks include (a brief usage sketch follows the list):

  • TensorFlow (open-sourced by Google)
  • PyTorch
  • Torch
  • MXNet
  • Hugging Face Transformers
  • LangChain
  • GGML
  • llama.cpp
  • LLVM (compiler infrastructure used by several ML compilers)
  • Caffe and Caffe2
  • Theano
  • RNN
  • Keras
  • Microsoft CNTK (Cognitive Toolkit)
  • Amazon ML
  • Google Cloud AutoML
  • Microsoft Azure (various)
  • scikit-learn
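Most of these frameworks share the same basic load-then-predict pattern. As a second illustration alongside the Transformers example above, here is a minimal TensorFlow/Keras sketch, again with a toy model standing in for a trained network.

    # Minimal sketch of the common load/predict pattern in TensorFlow/Keras.
    # The toy model is an illustrative stand-in for a trained network.
    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(16,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(4),
    ])

    x = np.random.rand(1, 16).astype("float32")
    print(model.predict(x).shape)  # (1, 4)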

Features of ML Frameworks

Some of the desirable features include:

Survey Papers on ML Software Frameworks

Papers that review or survey software frameworks:

General Research on ML Software Frameworks

Research papers about general issues or specific frameworks:

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++: Generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: