Aussie AI

Edge Computing

  • Last Updated 30 August, 2025
  • by David Spuler, Ph.D.

Edge computing is the term researchers use for running computations on low-resource devices. Devices on the "edge" are "close" to the user, but "far away" from the larger servers in the cloud. In the AI context, the goal is to run machine learning code on these smaller devices. Examples of such edge devices include:

  • Smartphones (see AI Smartphones)
  • Desktops and laptops
  • Cars (e.g. autonomous self-driving cars)
  • Video cameras (e.g. security cameras)
  • Internet of Things (IoT) devices (e.g. industrial devices, refrigerators, network stations, etc.)

Running AI models on edge devices usually means inference only, because the small devices usually cannot support the cost of training in terms of processing power and/or storage. However, there is some research into "on-device training."
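To make the inference-versus-training gap concrete, here is a rough back-of-the-envelope sketch. It assumes a hypothetical 1-billion-parameter model, counts only weight-related memory (activations and caches are ignored), and assumes training with the Adam optimizer, which keeps gradients plus two moment buffers alongside the weights. The function names are illustrative only.

```python
def inference_bytes(num_params: int, bytes_per_weight: float) -> float:
    """Memory for the weights alone (ignores activations and caches)."""
    return num_params * bytes_per_weight


def adam_training_bytes(num_params: int, bytes_per_value: float = 4.0) -> float:
    """Weights + gradients + two Adam moment buffers, all at one precision."""
    copies = 4  # weights, gradients, first moment, second moment
    return num_params * bytes_per_value * copies


GIB = 1024 ** 3
params = 1_000_000_000  # hypothetical model size, chosen for illustration

int8_inference_gib = inference_bytes(params, 1.0) / GIB   # 8-bit quantized weights
fp32_training_gib = adam_training_bytes(params) / GIB     # 32-bit training state

print(f"INT8 inference weights:   {int8_inference_gib:.2f} GiB")
print(f"FP32 Adam training state: {fp32_training_gib:.2f} GiB")
```

Under these assumptions the training state is roughly 16 times larger than the quantized inference weights, which is one reason on-device training needs special low-memory techniques.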

Many architectures that use edge computing involve multiple machines: at minimum, an edge device and a main server. Hence, much of the research into ensemble methods, such as distributed inference, is also relevant.
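The edge-plus-server arrangement can be illustrated with a minimal sketch of split (partitioned) inference, where the edge device runs the first layers and the server runs the rest. Everything here is hypothetical: the toy three-layer model, its weights, and the `edge_device` and `server` functions are made up for illustration, and a JSON string stands in for the network transfer. Real systems would hold separate model slices on each machine.

```python
import json
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense_relu(weights: Matrix, x: Vector) -> Vector:
    """One fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def run_layers(layers: List[Matrix], x: Vector) -> Vector:
    """Apply a sequence of layers to an input vector."""
    for weights in layers:
        x = dense_relu(weights, x)
    return x


# Hypothetical toy model with made-up weights (shared here only for brevity).
MODEL: List[Matrix] = [
    [[0.5, -0.25], [0.75, 0.5]],
    [[1.0, 0.5], [-0.5, 1.0]],
    [[0.25, 0.75]],
]


def edge_device(x: Vector, split: int) -> str:
    """Run the first `split` layers locally and serialize the activations."""
    hidden = run_layers(MODEL[:split], x)
    return json.dumps(hidden)  # stands in for sending data over the network


def server(payload: str, split: int) -> Vector:
    """Receive the activations and run the remaining layers."""
    hidden = json.loads(payload)
    return run_layers(MODEL[split:], hidden)


x = [1.0, 2.0]
full = run_layers(MODEL, x)                           # everything on one machine
split_result = server(edge_device(x, split=1), split=1)
print(full, split_result)
```

The split point is the main design choice: an earlier split means less work on the edge device, but the size of the intermediate activations determines how much data has to cross the network.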

Research on Edge Computing

There are plenty of papers on edge computing to choose from:

  • Jonas Geiping, Tom Goldstein, Dec 2022, Cramming: Training a Language Model on a Single GPU in One Day, https://arxiv.org/abs/2212.14034 Code: https://github.com/JonasGeiping/cramming (Note: uses the PyTorch nvFuser deep learning compiler, which seems to be deprecated now.)
  • Benj Edwards, March 14, 2023, You can now run a GPT-3-level AI model on your laptop, phone, and Raspberry Pi, Ars Technica, https://arstechnica.com/information-technology/2023/03/you-can-now-run-a-gpt-3-level-ai-model-on-your-laptop-phone-and-raspberry-pi/
  • E. Li, L. Zeng, Z. Zhou, and X. Chen, “Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing,” IEEE Transactions on Wireless Communications, vol. 19, no. 1, pp. 447–457, Jan. 2020. doi:10.1109/TWC.2019.2946140, https://arxiv.org/abs/1910.05316
  • Manuele Rusci, Marco Fariselli, Alessandro Capotondi, and Luca Benini. Leveraging automated mixed-low-precision quantization for tiny edge microcontrollers. In IoT Streams for Data-Driven Predictive Maintenance and IoT, Edge, and Mobile for Embedded Machine Learning, pages 296–308. Springer, 2020, https://arxiv.org/abs/2008.05124
  • Tao Ge, Si-Qing Chen, and Furu Wei. 2022. EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10786–10798, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics, https://arxiv.org/abs/2202.07959
  • Chinnadhurai Sankar, Sujith Ravi, and Zornitsa Kozareva. 2021. ProFormer: Towards On-Device LSH Projection Based Transformers. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2823–2828, Online. Association for Computational Linguistics. https://arxiv.org/abs/2004.05801
  • F Manca, F Ratto, 2023, ONNX-to-Hardware Design Flow for the Generation of Adaptive Neural-Network Accelerators on FPGAs, arXiv preprint arXiv:2309.13321, https://arxiv.org/pdf/2309.13321.pdf (Approximation techniques applied to edge computing.)
  • Pierre-Emmanuel Novac, March 2023, MicroAI: Embedded Artificial Intelligence for Human Activity Recognition on Smart Glasses, Ph.D. Thesis, Artificial Intelligence. Université Côte d’Azur, https://theses.hal.science/tel-04049008/document (Quantization in smart glasses device.)
  • R Snytsar, Oct 2023, Accelerating Machine Learning Primitives on Commodity Hardware, arXiv preprint arXiv:2310.05218, https://arxiv.org/pdf/2310.05218.pdf (Uses the "sliding window" technique to optimize general matrix multiplication on edge devices.)
  • GY Lee, T Dam, MM Ferdaus, DP Poenar, VN Duong, Oct 2023, Unlocking the capabilities of explainable fewshot learning in remote sensing, https://arxiv.org/pdf/2310.08619.pdf
  • PyTorch Edge Team, October 17, 2023, PyTorch Edge: Enabling On-Device Inference Across Mobile and Edge Devices with ExecuTorch, https://pytorch.org/blog/pytorch-edge/
  • Junho Wohn, February 2024, Optimizing Deep Learning Model Inference using Efficient Model Partitioning on Edge Devices, Thesis for the Master of Science, Graduate School of Hanyang University, https://repository.hanyang.ac.kr/handle/20.500.11754/188388, PDF: https://hanyang.dcollection.net/public_resource/pdf/200000726139_20240331200233.pdf (Compiles models using the TVM deep learning compiler and then partitions them across multiple edge devices for collaborative edge inference.)
  • Zao Zhang, 23 May 2024, Design Efficient Deep Neural Networks with System Optimization, Ph.D. Thesis, School of Electrical and Information Engineering, Faculty of Engineering, The University of Sydney, Australia, PDF: https://ses.library.usyd.edu.au/bitstream/handle/2123/32642/zhang_z_thesis.pdf?sequence=1&isAllowed=y https://ses.library.usyd.edu.au/handle/2123/32642 https://hdl.handle.net/2123/32642
  • Pietro Farina, Subrata Biswas, Eren Yıldız, Khakim Akhunov, Saad Ahmed, Bashima Islam, Kasım Sinan Yıldırım, 16 May 2024, Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems, https://arxiv.org/abs/2405.10426
  • Md Fahim Faysal Khan, May 2024, Constraint Driven Multimodal Edge Intelligence, Ph.D. Thesis, Electrical Engineering and Computer Science, Pennsylvania State University, https://etda.libraries.psu.edu/files/final_submissions/29680 (Layer-specific quantization levels for mixed-precision quantization.)
  • Jeffrey Yu, Kartik Prabhu, Yonatan Urman, Robert M. Radway, Eric Han, Priyanka Raina, 27 April 2024, ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, April 2024, Pages 5–21, https://doi.org/10.1145/3620666.3651368 https://dl.acm.org/doi/abs/10.1145/3620666.3651368
  • Jiwei HUANG, Fangzheng LIU, and Jianbin ZHANG, “Multi-dimensional QoS Evaluation and Optimization of Mobile Edge Computing for IoT: A Survey,” Chinese Journal of Electronics, vol. 33, no. 5, pp. 1–16, 2024, doi: 10.23919/cje.2023.00.264 https://cje.ejournal.org.cn/article/doi/10.23919/cje.2023.00.264 (Theory of benchmarking and evaluation of mobile edge computing.)
  • Mikail Yayla, 2024, A vision for edge AI: ROBUST BINARIZED NEURAL NETWORKS ON EMERGING RESOURCE-CONSTRAINED HARDWARE Ph.D. Dissertation, Technischen Universität Dortmund, Fakultät Informatik, Dortmund 2024, http://129.217.131.68:8080/bitstream/2003/42431/1/Dissertation_Yayla.pdf (Binarized networks with consideration of both software and hardware issues.)
  • Benjamin A Newman, Chris Paxton, Kris Kitani, Henny Admoni, 16 Apr 2024, Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration, https://arxiv.org/abs/2404.10733
  • Hou-I Liu, Marco Galindo, Hongxia Xie, Lai-Kuan Wong, Hong-Han Shuai, Yung-Yui Li, Wen-Huang Cheng, 8 Apr 2024, Lightweight Deep Learning for Resource-Constrained Environments: A Survey, https://arxiv.org/abs/2404.07236 (A survey of various optimizations, with a lot of focus on image and vision models, including CNNs, RNNs, and Transformers.)
  • Seungtae Hong, Gunju Park, Jeong-Si Kim, 9 June 2024, Automated deep-learning model optimization framework for microcontrollers, https://doi.org/10.4218/etrij.2023-0522 https://onlinelibrary.wiley.com/doi/full/10.4218/etrij.2023-0522 (Framework for using quantization and pruning on microcontroller devices.)
  • Shengyuan Ye, Jiangsu Du, Liekang Zeng, Wenzhong Ou, Xiaowen Chu, Yutong Lu, Xu Chen, 27 May 2024, Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference, https://arxiv.org/abs/2405.17245
  • Qualcomm, May 2023, The future of AI is hybrid, Qualcomm White Paper, https://www.qualcomm.com/content/dam/qcomm-martech/dm-assets/documents/Whitepaper-The-future-of-AI-is-hybrid-Part-1-Unlocking-the-generative-AI-future-with-on-device-and-hybrid-AI.pdf
  • Guozhi Yan; Kai Liu; Chunhui Liu; Jie Zhang, 2024, Edge Intelligence for Internet of Vehicles: A Survey, IEEE Transactions on Consumer Electronics (Early Access), 18 March 2024, https://ieeexplore.ieee.org/abstract/document/10474509
  • Daniel Situnayake, 24 January 2023, AI at the Edge: Solving Real-World Problems with Embedded Machine Learning, O'Reilly Media, Inc, USA, https://www.amazon.com/dp/1098120205/
  • Jaskirat Singh, Bram Adams, Ahmed E. Hassan, 25 Mar 2024, On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance, https://arxiv.org/abs/2403.17154 (MLOps deployment for quantization, partitioning and early-exit across mobile, edge, and cloud platforms, including running early exit on mobile.)
  • Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey (Broad survey with many optimizations including this topic.)
  • P Dong, L Lu, C Wu, C Lyu, G Yuan, H Tang, Y Wang, 2023, PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile, https://openreview.net/pdf?id=N56hAiQvot Code: https://github.com/PeiyanFlying/PackQViT
  • Bingkun Lai, Jinbo Wen, Jiawen Kang, Hongyang Du, Jiangtian Nie, Changyan Yi, Dong In Kim, Shengli Xie, 19 Dec 2023, Resource-efficient Generative Mobile Edge Networks in 6G Era: Fundamentals, Framework and Case Study, https://arxiv.org/abs/2312.12063
  • Mohammed Ayyat; Tamer Nadeem; Bartosz Krawczyk, Dec 2023, ClassyNet: Class-Aware Early Exit Neural Networks for Edge Devices, IEEE Internet of Things Journal (Early Access), https://ieeexplore.ieee.org/abstract/document/10365527
  • Yixin Song, Zeyu Mi, Haotong Xie, Haibo Chen, Dec 2023, PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU, https://arxiv.org/abs/2312.12456 Code: https://github.com/SJTU-IPADS/PowerInfer
  • Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, Carlo C Del Mundo, Mohammad Rastegari, Mehrdad Farajtabar, Dec 2023, LLM in a flash: Efficient Large Language Model Inference with Limited Memory, Apple Research, https://arxiv.org/abs/2312.11514
  • X Li, S Chen, S Zhang, L Hou, Y Zhu, Z Xiao, 2023, Human Activity Recognition Using IR-UWB Radar: A Lightweight Transformer Approach, IEEE Geoscience and Remote Sensing Letters (Early Access), https://ieeexplore.ieee.org/document/10247554
  • Ali Rahmanian, Doctoral Thesis, April 2024, Edge Orchestration for Latency-Sensitive Applications, Department of Computing Science, Umea University, Sweden, https://www.diva-portal.org/smash/get/diva2:1849510/FULLTEXT02.pdf
  • Victor J.B. Jung, Alessio Burrello, Moritz Scherer, Francesco Conti, Luca Benini, 3 Apr 2024, Optimizing the Deployment of Tiny Transformers on Low-Power MCUs, https://arxiv.org/abs/2404.02945 (Uses an approach called "Fused Weight Self-Attention" that fuses some of the QKV matrices and also tiling in multi-head attention, along with 8-bit integer quantization and integerized Softmax.)
  • MMH Shuvo, SK Islam, J Cheng, Efficient acceleration of deep learning inference on resource-constrained edge devices: A review, 2022, Proceedings of the IEEE (Volume: 111, Issue: 1, January 2023), pp. 42–91, 14 December 2022, https://ieeexplore.ieee.org/abstract/document/9985008 PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9985008
  • Minghao Yan, Hongyi Wang, Shivaram Venkataraman, 9 Jan 2024 (v2), PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices, https://arxiv.org/abs/2310.19991 (Faster inference with a focus on pipelining and scheduling of hardware acceleration.)
  • Sai Krishna Revanth Vuruma, Ashley Margetts, Jianhai Su, Faez Ahmed, Biplav Srivastava, 26 Feb 2024 (v2), From Cloud to Edge: Rethinking Generative AI for Low-Resource Design Challenges, https://arxiv.org/abs/2402.12702
  • Nir Shlezinger; Erez Farhan; Hai Morgenstern; Yonina C. Eldar, 2021, Collaborative Inference via Ensembles on the Edge, ICASSP 2021, 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), https://ieeexplore.ieee.org/abstract/document/9414740
  • Nir Shlezinger; Ivan V. Bajić, 2022, Collaborative Inference for AI-Empowered IoT Devices, IEEE Internet of Things Magazine (Volume: 5, Issue: 4, December 2022), https://ieeexplore.ieee.org/abstract/document/10012474
  • Rohit Sharma, 9 July 2022, Introduction to TinyML, Independently published, https://www.amazon.com/Introduction-TinyML-Rohit-Sharma/dp/B0B5Q281L9/
  • Semaphore, Dec 14, 2023, 6 Ways to Run LLMs Locally, https://semaphoreci.medium.com/6-ways-to-run-llms-locally-fa25be0797e5 (The six ways are HF Transformers, LangChain, Llama.cpp, Llamafile, Ollama, and GPT4All.)
  • Zhepeng Wang, Isaacshubhanand Putla, Weiwen Jiang, Youzuo Lin, Oct 2023, Edge-InversionNet: Enabling Efficient Inference of InversionNet on Edge Devices, https://arxiv.org/abs/2310.09667 (Using structured pruning via layerwise filter pruning to run a model on a Raspberry Pi.)
  • Jianlei Yang, Jiacheng Liao, Fanding Lei, Meichen Liu, Junyi Chen, Lingkun Long, Han Wan, Bei Yu, Weisheng Zhao, Nov 2023, TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices, https://arxiv.org/abs/2311.01759
  • Yuyi Mao, Xianghao Yu, Kaibin Huang, Ying-Jun Angela Zhang, Jun Zhang, Dec 2023, Green Edge AI: A Contemporary Survey, https://arxiv.org/abs/2312.00333
  • Murray Kornelsen, April 2023, Low-Latency BERT Inference for Heterogeneous Multi-Processor Edge Devices, Department of Electrical & Computer Engineering, McGill University, Canada https://escholarship.mcgill.ca/downloads/m326m732p
  • Yifeng Wu; Xu He; Lingfei Mo; Qing Wang, Jan 2024, A Self-Attention-Assisted TinyML With Effective Representation for UWB NLOS Identification, IEEE Internet of Things Journal (Early Access), https://ieeexplore.ieee.org/abstract/document/10380220
  • Ning Chen, Zhipeng Cheng, Xuwei Fan, Xiaoyu Xia, Lianfen Huang, 5 Jan 2024, Towards Integrated Fine-tuning and Inference when Generative AI meets Edge Intelligence, https://arxiv.org/abs/2401.02668 (Covers processing on cloud and edge servers in various configurations with communication between nodes for both training/fine-tuning and inference tasks.)
  • C Gernigon, SI Filip, O Sentieys, C Coggiola, M Bruno, Oct 2023, Low-Precision Floating-Point for Efficient On-Board Deep Neural Network Processing, https://hal.science/hal-04252197/document
  • Y Liang, Z Wang, X Xu, Y Tang, Z Jie, J Lu, Oct 2023, MCUFormer: Deploying Vision Tranformers on Microcontrollers with Limited Memory, arXiv preprint arXiv:2310.16898, https://arxiv.org/pdf/2310.16898.pdf
  • MWU Rahman, MM Abrar, HG Copening, S Hariri, Oct 2023, Quantized Transformer Language Model Implementations on Edge Devices, https://arxiv.org/pdf/2310.03971.pdf (Uses a "FlatBuffer" format on TensorFlow-Lite.)
  • H Woisetschläger, A Isenko, S Wang, R Mayer, 2023, Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly, https://arxiv.org/abs/2310.03150
  • PE Novac, G Boukli Hacene, A Pegatoquet, 2021, Quantization and deployment of deep neural networks on microcontrollers, Sensors, 2021, https://www.mdpi.com/1424-8220/21/9/2984
  • P Cruz, N Achir, AC Viana, 2022, On the edge of the deployment: A survey on multi-access edge computing https://dl.acm.org/doi/abs/10.1145/3529758 https://inria.hal.science/hal-03637105/file/ACM_MEC_Survey___Camera_Ready.pdf
  • W Yu, F Liang, X He, WG Hatcher, C Lu, J Lin, 2017, A survey on the edge computing for the Internet of Things, IEEE Access (Volume: 6), https://ieeexplore.ieee.org/abstract/document/8123913/ https://ieeexplore.ieee.org/iel7/6287639/8274985/08123913.pdf
  • R. Sanchez-Iborra and A. F. Skarmeta, Tinyml-enabled frugal smart objects: Challenges and opportunities, IEEE Circuits and Systems Magazine, vol. 20, no. 3, pp. 4–18, 2020. https://ieeexplore.ieee.org/document/9166461 PDF: https://sci-hub.se/10.1109/MCAS.2020.3005467
  • R. Immonen, T. Hämäläinen et al., Tiny machine learning for resource-constrained microcontrollers, Journal of Sensors, vol. 2022, 2022, https://www.hindawi.com/journals/js/2022/7437023/
  • S. Prakash, T. Callahan, J. Bushagour, C. Banbury, A. V. Green, P. Warden, T. Ansell, and V. J. Reddi, 2023, CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs, 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). pp. 157–167. https://ui.adsabs.harvard.edu/abs/2022arXiv220101863P/abstract
  • M. Giordano, L. Piccinelli, and M. Magno, Survey and comparison of milliwatts micro controllers for tiny machine learning at the edge, in 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS). IEEE, 2022, pp. 94–97. https://ieeexplore.ieee.org/document/9870017
  • Md. Maruf Hossain Shuvo; Syed Kamrul Islam; Jianlin Cheng; Bashir I. Morshed, 2023, Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices: A Review, Proceedings of the IEEE (Volume 111, Issue 1, January 2023), https://ieeexplore.ieee.org/document/9985008 PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9985008 (Extensive 2023 survey of inference optimization in general and specifically on edge platforms.)
  • T Tambe, 2023, Architecting High Performance Silicon Systems for Accurate and Efficient On-Chip Deep Learning, https://dash.harvard.edu/bitstream/handle/1/37375806/Final_Draft_PhD_Dissertation_Thierry_Tambe.pdf?sequence=1&isAllowed=y
  • Douglas C. Youvan, June 15, 2024, Developing and Deploying AI Applications on NVIDIA Jetson Orin NX: A Comprehensive Guide, https://www.researchgate.net/profile/Douglas-Youvan/publication/381434888_Developing_and_Deploying_AI_Applications_on_NVIDIA_Jetson_Orin_NX_A_Comprehensive_Guide/links/666d7390de777205a32fceb6/Developing-and-Deploying-AI-Applications-on-NVIDIA-Jetson-Orin-NX-A-Comprehensive-Guide.pdf
  • Zixu Hao, Huiqiang Jiang, Shiqi Jiang, Ju Ren, Ting Cao, June 2024, Hybrid SLM and LLM for Edge-Cloud Collaborative Inference, EdgeFM ’24, June 3–7, 2024, Minato-ku, Tokyo, Japan, https://dl.acm.org/doi/pdf/10.1145/3662006.3662067 (Small model on edge devices with large model in the cloud, performing collaborative inference.)
  • Dan Peng, Zhihui Fu, Jun Wang, 1 Jul 2024, PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs, https://arxiv.org/abs/2407.01031 (Running fine-tuning on a smartphone via a low-memory optimization using a "derivative-free" "zeroth-order" technique called MeZo, with advantages such as privacy.)
  • Ying He, Jingcheng Fang, F. Richard Yu, Victor C. Leung, 2024, Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Computing: An Active Inference Approach, PrePrints pp. 1-12, DOI: 10.1109/TMC.2024.3415661, https://www.computer.org/csdl/journal/tm/5555/01/10591707/1YraFlDdKYo
  • Adarsh Prasad Behera, Paulius Daubaris, Iñaki Bravo, José Gallego, Roberto Morabito, Joerg Widmer, Jaya Prakash Varma Champati, 10 Jul 2024, Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical, https://arxiv.org/abs/2407.11061
  • Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang, July 2024, Mobile Edge Intelligence for Large Language Models: A Contemporary Survey, https://www.techrxiv.org/doi/pdf/10.36227/techrxiv.172115025.57884352
  • Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie Zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun, 3 Aug 2024, MiniCPM-V: A GPT-4V Level MLLM on Your Phone, https://arxiv.org/abs/2408.01800 Code: https://github.com/OpenBMB/MiniCPM-V
  • Beom Jin Kang, Hae In Lee, Seok Kyu Yoon, Young Chan Kim, Sang Beom Jeong, Seong Jun O, Hyun Kim, October 2024, A survey of FPGA and ASIC designs for transformer inference acceleration and optimization, Journal of Systems Architecture, Volume 155, 103247, https://www.sciencedirect.com/science/article/abs/pii/S138376212400184X
  • R. Narmeen, P. Mach, Z. Becvar and I. Ahmad, 16 August 2024, Joint Exit Selection and Offloading Decision for Applications Based on Deep Neural Networks, IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3444898, https://doi.org/10.1109/JIOT.2024.3444898 https://ieeexplore.ieee.org/abstract/document/10638073
  • Mingjin Zhang, 2024, High-performance scheduling of deep learning tasks in collaborative edge computing, Ph.D. Thesis, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, https://theses.lib.polyu.edu.hk/bitstream/200/13080/3/7528.pdf (Scheduling of inference and training tasks on edge devices with techniques such as model splitting/partitioning.)
  • Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu, 17 May 2024, Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities, https://arxiv.org/abs/2405.10825
  • L. Cheng, Y. Gu, Q. Liu, L. Yang, C. Liu and Y. Wang, 2024, Advancements in Accelerating Deep Neural Network Inference on AIoT Devices: A Survey, in IEEE Transactions on Sustainable Computing, doi: 10.1109/TSUSC.2024.3353176. https://ieeexplore.ieee.org/abstract/document/10398463
  • Eric Samikwa, 2024, Resource-Aware Distributed Machine Learning for Artificial Intelligence of Things, Ph.D. thesis, Faculty of Science, University of Bern, Switzerland, https://boristheses.unibe.ch/5378/1/24samikwa_e_1_.pdf https://doi.org/10.48549/5378 (Multi-edge device with early exit, "micro-split" scheduling, split/federated learning, and distributed inference.)
  • Lutfi Eren Erdogan, Nicholas Lee, Siddharth Jha, Sehoon Kim, Ryan Tabrizi, Suhong Moon, Coleman Hooper, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami, 1 Sep 2024, TinyAgent: Function Calling at the Edge, https://arxiv.org/abs/2409.00608 https://github.com/SqueezeAILab/TinyAgent
  • Tyler Mullen, August 22, 2024, Unlocking 7B+ language models in your browser: A deep dive with Google AI Edge's MediaPipe, https://research.google/blog/unlocking-7b-language-models-in-your-browser-a-deep-dive-with-google-ai-edges-mediapipe/
  • Othmane Friha, Mohamed Amine Ferrag, Burak Kantarci, Burak Cakmak, Arda Ozgun, Nassira Ghoualmi-Zine, 2024, LLM-based Edge Intelligence: A Comprehensive Survey on Architectures, Applications, Security and Trustworthiness, PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10669603
  • Dimitrios Kafetzis, Iordanis Koutsopoulos, Oct 2024, Demo: An Experimental Platform for AI Model Partitioning on Resource-constrained Devices, https://dl.acm.org/doi/pdf/10.1145/3641512.3690629
  • M. Sponner, L. Servadei, B. Waschneck, R. Wille and A. Kumar, "Harnessing Temporal Information for Efficient Edge AI," 2024 9th International Conference on Fog and Mobile Edge Computing (FMEC), Malmö, Sweden, 2024, pp. 5-13, doi: 10.1109/FMEC62297.2024.10710223. https://ieeexplore.ieee.org/abstract/document/10710223
  • Mistral AI, Oct 2024, Un Ministral, des Ministraux: Introducing the world’s best edge models. https://mistral.ai/news/ministraux/
  • Michael Nuñez, October 16, 2024, Mistral AI’s new language models bring AI power to your phone and laptop, https://venturebeat.com/business/mistral-ai-new-language-models-bring-ai-power-to-your-phone-and-laptop/
  • Yue Zheng, Yuhao Chen, Bin Qian, Xiufang Shi, Yuanchao Shu, Jiming Chen, 29 Sep 2024, A Review on Edge Large Language Models: Design, Execution, and Applications, https://arxiv.org/abs/2410.11845
  • Zebin Yang, Renze Chen, Taiqiang Wu, Ngai Wong, Yun Liang, Runsheng Wang, Ru Huang, Meng Li, 23 Oct 2024, MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers https://arxiv.org/abs/2410.17957
  • Arun Nanda, Sep 7, 2024, Reducing the Size of AI Models. Running large AI models on edge devices, https://towardsdatascience.com/reducing-the-size-of-ai-models-4ab4cfe5887a
  • Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, Qi He, Yao Ma, Ming Huang, Suhang Wang, 4 Nov 2024, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350
  • Justine, Apr 2023, Edge AI Just Got Faster, https://justine.lol/mmap/ (Loading models using mmap.)
  • Nan Xue, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Liang Qian, Shuguang Cui, Wenjun Zhang, Ping Zhang, 11 Nov 2024, WDMoE: Wireless Distributed Mixture of Experts for Large Language Models, https://arxiv.org/abs/2411.06681
  • Ibrahim Kok, Orhan Demirci, Suat Ozdemir, 20 Nov 2024, When IoT Meet LLMs: Applications and Challenges, https://arxiv.org/abs/2411.17722
  • M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2024, Resource-efficient Algorithms and Systems of Foundation Models: A Survey, https://dl.acm.org/doi/pdf/10.1145/3706418
  • Sokratis Nikolaidis, Stylianos I. Venieris, Iakovos S. Venieris, 5 Dec 2024, MultiTASC++: A Continuously Adaptive Scheduler for Edge-Based Multi-Device Cascade Inference, https://arxiv.org/abs/2412.04147
  • A. K. Al-Zihairy and A. E. Abdelkareem, "Optimizing YOLOv8-cls: A Step Towards Smarter Edge Environments," 2024 1st International Conference on Emerging Technologies for Dependable Internet of Things (ICETI), Sana'a, Yemen, 2024, pp. 1-6, doi: 10.1109/ICETI63946.2024.10777236. https://ieeexplore.ieee.org/abstract/document/10777236
  • Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, Wenhui Zhu, Quan Wan, Haozhao Wang, Yunfeng Fan, Qinliang Su, Xuemin Shen, 18 Dec 2024, Deploying Foundation Model Powered Agent Services: A Survey, https://arxiv.org/abs/2412.13437 (A survey of not just deployment, but many inference optimization techniques.)
  • Liam Seymour, Basar Kutukcu, Sabur Baidya, 19 Dec 2024, Large Language Models on Small Resource-Constrained Systems: Performance Characterization, Analysis and Trade-offs, https://arxiv.org/abs/2412.15352 https://github.com/LiamS57/orin-llm-testing
  • D. Xu et al., "EdgeLLM: Fast On-device LLM Inference with Speculative Decoding" in IEEE Transactions on Mobile Computing, vol. , no. 01, pp. 1-18, PrePrints 5555, doi: 10.1109/TMC.2024.3513457. https://www.computer.org/csdl/journal/tm/5555/01/10812936/22UpTlf6X2U
  • S. Pareek, A. Saleh Al-Samalek, A. Alkhayyat, S. Singh, A. Singh and S. Dasi, "Efficient Vision Transformers for Edge Devices: Pruning and Quantization Approaches," 2024 4th International Conference on Technological Advancements in Computational Sciences (ICTACS), Tashkent, Uzbekistan, 2024, pp. 1465-1471, doi: 10.1109/ICTACS62700.2024.10840584. https://ieeexplore.ieee.org/abstract/document/10840584
  • Jindong Li, Tenglong Li, Guobin Shen, Dongcheng Zhao, Qian Zhang, Yi Zeng, 15 Feb 2025, Pushing up to the Limit of Memory Bandwidth and Capacity Utilization for Efficient LLM Decoding on Embedded FPGA, https://arxiv.org/abs/2502.10659
  • Xian Peng, Xin Wu, Lianming Xu, Li Wang, Aiguo Fei, 6 Feb 2025, DistrEE: Distributed Early Exit of Deep Neural Network Inference on Edge Devices, https://arxiv.org/abs/2502.15735
  • Shaibal Saha, Lanyu Xu, 26 Feb 2025, Vision Transformers on the Edge: A Comprehensive Survey of Model Compression and Acceleration Strategies, https://arxiv.org/abs/2503.02891
  • Kangbo Bai, Le Ye, Ru Huang, Tianyu Jia, 16 May 2025, EdgeMM: Multi-Core CPU with Heterogeneous AI-Extension and Activation-aware Weight Pruning for Multimodal LLMs at Edge, https://arxiv.org/abs/2505.10782
  • Jiyong Kim, Jaeho Lee, Jiahao Lin, Alish Kanani, Miao Sun, Umit Y. Ogras, and Jaehyun Park, 14 Aug 2025, eMamba: Efficient Acceleration Framework for Mamba Models in Edge Computing, https://arxiv.org/abs/2508.10370
  • Wei Fan, JinYi Yoon, Xiaochang Li, Huajie Shao, and Bo Ji, 23 Jul 2025, P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices, https://arxiv.org/abs/2507.17228
  • Radowanul Haque, Aftab Ali, Sally McClean and Naveed Khan, 22 Jul 2025, Explainable Vulnerability Detection in C/C++ Using Edge-Aware Graph Attention Networks, https://arxiv.org/abs/2507.16540
  • Seunghyeon Kim, Kyeongryeol Go, 22 Jul 2025, Edge-case Synthesis for Fisheye Object Detection: A Data-centric Perspective, https://arxiv.org/abs/2507.16254
  • Zied Jenhani and Mounir Bensalem and Jasenka Dizdarević and Admela Jukan, 22 Jul 2025, An Experimental Study of Split-Learning TinyML on Ultra-Low-Power Edge/IoT Nodes, https://arxiv.org/abs/2507.16594
  • Arseniy Andreyev and Pierfrancesco Beneventano, 22 Jul 2025, Edge of Stochastic Stability: Revisiting the Edge of Stability for SGD, https://arxiv.org/abs/2412.20553
  • Linshen Liu, Boyan Su, Junyue Jiang, Guanlin Wu, Cong Guo, Ceyu Xu, Hao Frank Yang, 22 Jul 2025, Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge, https://arxiv.org/abs/2507.04123
  • Yujia Tong, Jingling Yuan, Chuang Hu, 17 Jul 2025, Enhancing Quantization-Aware Training on Edge Devices via Relative Entropy Coreset Selection and Cascaded Layer Correction, https://arxiv.org/abs/2507.17768
  • Casper Bröcheler, Thomas Vroom, Derrick Timmermans, Alan van den Akker, Guangzhi Tang, Charalampos S. Kouzinopoulos, Rico Möckel, 18 Jul 2025, A segmented robot grasping perception neural network for edge AI, https://arxiv.org/abs/2507.13970
  • Shuiguang Deng, Di Yu, Changze Lv, Xin Du, Linshan Jiang, Xiaofan Zhao, Wentao Tong, Xiaoqing Zheng, Weijia Fang, Peng Zhao, Gang Pan, Schahram Dustdar, Albert Y. Zomaya, 18 Jul 2025, Edge Intelligence with Spiking Neural Networks, https://arxiv.org/abs/2507.14069
  • Sebastian A. Cruz Romero, Misael J. Mercado Hernandez, Samir Y. Ali Rivera, Jorge A. Santiago Fernandez, Wilfredo E. Lugo Beauchamp, 20 Jul 2025, Design of an Edge-based Portable EHR System for Anemia Screening in Remote Health Applications, https://arxiv.org/abs/2507.15146
  • Eugene Armah, Linda Amoako Bannning, 19 Jul 2025, Towards a Proactive Autoscaling Framework for Data Stream Processing at the Edge using GRU and Transfer Learning, https://arxiv.org/abs/2507.14597
  • Thai T. Vu and John Le, 20 Jul 2025, Quantum Machine Learning for Secure Cooperative Multi-Layer Edge AI with Proportional Fairness, https://arxiv.org/abs/2507.15145
  • Alon Beck, Noam Levi, Yohai Bar-Sinai, 19 Jul 2025, Grokking at the Edge of Linear Separability, https://arxiv.org/abs/2410.04489
  • Ananda Prakash Verma, 10 Aug 2025, EDGE: A Theoretical Framework for Misconception-Aware Adaptive Learning, https://arxiv.org/abs/2508.07224
  • Tuo Zhang, Ning Li, Xin Yuan, Wenchao Xu, Quan Chen, Song Guo, Haijun Zhang, 10 Aug 2025, Efficient Edge LLMs Deployment via HessianAware Quantization and CPU GPU Collaborative, https://arxiv.org/abs/2508.07329
  • Jiawen Qi, Chang Gao, Zhaochun Ren, Qinyu Chen, 25 Jul 2025, DeltaLLM: A Training-Free Framework Exploiting Temporal Sparsity for Efficient Edge LLM Inference, https://arxiv.org/abs/2507.19608
  • Chengzhuo Han, 28 Jul 2025, Enhancing QoS in Edge Computing through Federated Layering Techniques: A Pathway to Resilient AI Lifelong Learning Systems, https://arxiv.org/abs/2507.20444
  • Tianhao Wang, Simon Klancher, Kunal Mukherjee, Josh Wiedemeier, Feng Chen, Murat Kantarcioglu, Kangkook Jee, 28 Jul 2025, PROVCREATOR: Synthesizing Complex Heterogenous Graphs with Node and Edge Attributes, https://arxiv.org/abs/2507.20967
  • Yang Zhao, Shusheng Li, Xueshang Feng, 28 Jul 2025, Lightweight Remote Sensing Scene Classification on Edge Devices via Knowledge Distillation and Early-exit, https://arxiv.org/abs/2507.20623
  • Xingjian Zhang, Siwei Wen, Wenjun Wu, Lei Huang, 29 Jul 2025, EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity, https://arxiv.org/abs/2507.21848
  • Abir Ray, 28 Jul 2025, EdgeAgentX-DT: Integrating Digital Twins and Generative AI for Resilient Edge Intelligence in Tactical Networks, https://arxiv.org/abs/2507.21196
  • Ghazal Sobhani, Md. Monzurul Amin Ifath, Tushar Sharma, Israat Haque, 30 Jul 2025, On the Sustainability of AI Inferences in the Edge, https://arxiv.org/abs/2507.23093
  • Georg Slamanig, Francesco Corti, Olga Saukh, 31 Jul 2025, From LLMs to Edge: Parameter-Efficient Fine-Tuning on Edge Devices, https://arxiv.org/abs/2507.23536
  • Jin Yang, Qiong Wu, Zhiying Feng, Zhi Zhou, Deke Guo and Xu Chen, 1 Aug 2025, Quality-of-Service Aware LLM Routing for Edge Computing with Multiple Experts, https://arxiv.org/abs/2508.00234
  • Jiyu Chen, Poh Seng Lim, Shuang Peng, Daxiong Luo, JungHau Foo, Yap Deep, Timothy Lee Jun Jie, Kelvin Teh Kae Wen, Fan Yang, Danyu Feng, Hao-Yun Chen, Peng-Wen Chen, Fangyuan Li, Xiaoxin Chen, Wong Wai Mun, 1 Aug 2025, EdgeInfinite-Instruct: Bridging SFT-Based Optimization and NPU-Level Efficiency for Edge Devices, https://arxiv.org/abs/2508.00370
  • Hangyu Li, Hongyue Wu, Guodong Fan, Zhen Zhang, Shizhan Chen, Zhiyong Feng, 1 Aug 2025, Efficient Federated Learning with Encrypted Data Sharing for Data-Heterogeneous Edge Devices, https://arxiv.org/abs/2506.20644
  • Fengze Yang, Bo Yu, Yang Zhou, Xuewen Luo, Zhengzhong Tu, Chenxi Liu, 1 Aug 2025, REACT: A Real-Time Edge-AI Based V2X Framework for Accident Avoidance in Autonomous Driving System, https://arxiv.org/abs/2508.01057
  • Jesse He, Akbar Rafiey, Gal Mishne, Yusu Wang, 1 Aug 2025, Explaining GNN Explanations with Edge Gradients, https://arxiv.org/abs/2508.01048
  • Heting Liu, Junzhe Huang, Fang He, Guohong Cao, 3 Aug 2025, Dynamic Clustering for Personalized Federated Learning on Heterogeneous Edge Devices, https://arxiv.org/abs/2508.01580
  • Xiangwang Hou, Jingjing Wang, Fangming Guan, Jun Du, Chunxiao Jiang, Yong Ren, 3 Aug 2025, Energy-Efficient Federated Learning for Edge Real-Time Vision via Joint Data, Computation, and Communication Design, https://arxiv.org/abs/2508.01745
  • Sangjun Park, Tony Q.S. Quek, Hyowoon Seo, 4 Aug 2025, Pigeon-SL: Robust Split Learning Framework for Edge Intelligence under Malicious Clients, https://arxiv.org/abs/2508.02235
  • Boran Zhao, Haiduo Huang, Qiwei Dang, Wenzhe Zhao, Tian Xia, Pengju Ren, 4 Aug 2025, NMS: Efficient Edge DNN Training via Near-Memory Sampling on Manifolds, https://arxiv.org/abs/2508.02313
  • Leyao Wang, Xutao Mao, Xuhui Zhan, Yuying Zhao, Bo Ni, Ryan A. Rossi, Nesreen K. Ahmed, Tyler Derr, 2 Aug 2025, Towards Bridging Review Sparsity in Recommendation with Textual Edge Graph Representation, https://arxiv.org/abs/2508.01128
  • Dulana Rupanetti, Naima Kaabouch, 3 Aug 2025, Leveraging Machine Learning for Botnet Attack Detection in Edge-Computing Assisted IoT Networks, https://arxiv.org/abs/2508.01542
  • Paul Zaha, Lars Böcking, Simeon Allmendinger, Leopold Müller, Niklas Kühl, 4 Aug 2025, Do Edges Matter? Investigating Edge-Enhanced Pre-Training for Medical Image Segmentation, https://arxiv.org/abs/2508.02281
  • Chen Feng, Yicheng Lin, Shaojie Zhuo, Chenzheng Su, Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Xiaopeng Zhang, 1 Aug 2025, Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models, https://arxiv.org/abs/2507.07877
  • Osama Mohammed, Jiaxin Pan, Mojtaba Nayyeri, Daniel Hernández and Steffen Staab, 5 Aug 2025, Full-History Graphs with Edge-Type Decoupled Networks for Temporal Reasoning, https://arxiv.org/abs/2508.03251
  • Xingdan Wang, Jiayi He, Zhiqing Tang, Jianxiong Guo, Jiong Lou, Liping Qian, Tian Wang, Weijia Jia, 5 Aug 2025, Adaptive AI Agent Placement and Migration in Edge Intelligence Systems, https://arxiv.org/abs/2508.03345
  • Jialin Zheng, Haoyu Wang, Yangbin Zeng, Di Mou, Xin Zhang, Hong Li, Sergio Vazquez, Leopoldo G. Franquelo, 4 Aug 2025, Physics-Embedded Neural ODEs for Sim2Real Edge Digital Twins of Hybrid Power Electronics Systems, https://arxiv.org/abs/2508.02887
  • Matteo Caligiuri, Francesco Barbato, Donald Shenaj, Umberto Michieli, Pietro Zanuttigh, 5 Aug 2025, FedPromo: Federated Lightweight Proxy Models at the Edge Bring New Domains to Foundation Models, https://arxiv.org/abs/2508.03356
  • Zexu Huang, Min Xu, Stuart Perry, 6 Aug 2025, DET-GS: Depth- and Edge-Aware Regularization for High-Fidelity 3D Gaussian Splatting, https://arxiv.org/abs/2508.04099
  • Nan Li, Wanting Yang, Marie Siew, Zehui Xiong, Binbin Chen, Shiwen Mao, Kwok-Yan Lam, 6 Aug 2025, Edge-Assisted Collaborative Fine-Tuning for Multi-User Personalized Artificial Intelligence Generated Content (AIGC), https://arxiv.org/abs/2508.04745
  • Yuze Liu, Tiehua Zhang, Zhishu Shen, Libing Wu, Shiping Chen and Jiong Jin, 1 Aug 2025, Towards Heterogeneity-Aware and Energy-Efficient Topology Optimization for Decentralized Federated Learning in Edge Environment, https://arxiv.org/abs/2508.08278
  • Jing Liu, Yao Du, Kun Yang, Jiaqi Wu, Yan Wang, Xiping Hu, Zehua Wang, Yang Liu, Peng Sun, Azzedine Boukerche, Victor C.M. Leung, 12 Aug 2025, Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey, https://arxiv.org/abs/2505.01821
  • Zijun Jiang and Yangdi Lyu, 13 Aug 2025, MiCo: End-to-End Mixed Precision Neural Network Co-Exploration Framework for Edge AI, https://arxiv.org/abs/2508.09500
  • Bokeng Zheng, Jianqiang Zhong, Jiayi Liu, Xiaoxi Zhang, 13 Aug 2025, Decentralized Rank Scheduling for Energy-Constrained Multi-Task Federated Fine-Tuning in Edge-Assisted IoV Networks, https://arxiv.org/abs/2508.09532
  • Changyuan Zhao, Guangyuan Liu, Ruichen Zhang, Yinqiu Liu, Jiacheng Wang, Jiawen Kang, Dusit Niyato, Zan Li, Xuemin (Sherman) Shen, Zhu Han, Sumei Sun, Chau Yuen, Dong In Kim, 13 Aug 2025, Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges, https://arxiv.org/abs/2508.09561
  • Muqing Li, Ning Li, Xin Yuan, Wenchao Xu, Quan Chen, Song Guo, Haijun Zhang, 10 Aug 2025, CoMoE: Collaborative Optimization of Expert Aggregation and Offloading for MoE-based LLMs at Edge, https://arxiv.org/abs/2508.09208
  • Alessandro Pierro, Steven Abreu, Jonathan Timcheck, Philipp Stratmann, Andreas Wild, Sumit Bam Shrestha, 13 Aug 2025, Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity, https://arxiv.org/abs/2502.01330
  • Xuran Liu, Nan Xue, Rui Bao, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Shuguang Cui, 15 Aug 2025, CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems, https://arxiv.org/abs/2508.11287
  • Rui Bao, Nan Xue, Yaping Sun, Zhiyong Chen, 15 Aug 2025, Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks, https://arxiv.org/abs/2508.11291
  • Tiancheng Zhang, Cheng Zhang, Shuren Liu, Xiaofei Wang, Shaoyuan Huang, Wenyu Wang, 18 Aug 2025, HRS: Hybrid Representation Framework with Scheduling Awareness for Time Series Forecasting in Crowdsourced Cloud-Edge Platforms, https://arxiv.org/abs/2508.12839
  • Chen Qian, Xinran Yu, Zewen Huang, Danyang Li, Qiang Ma, Fan Dang, Xuan Ding, Guangyong Shang, Zheng Yang, 18 Aug 2025, SpotVLM: Cloud-edge Collaborative Real-time VLM based on Context Transfer, https://arxiv.org/abs/2508.12638
  • Prabath Abeysekara, Hai Dong, 18 Aug 2025, Data-driven Trust Bootstrapping for Mobile Edge Computing-based Industrial IoT Services, https://arxiv.org/abs/2508.12560
  • Bachtiar Herdianto, Romain Billot, Flavien Lucas, Marc Sevaux, and Daniele Vigo, 12 Aug 2025, Edge-Selector Model Applied for Local Search Neighborhood for Solving Vehicle Routing Problems, https://arxiv.org/abs/2508.14071
  • Zengyi Wo, Wenjun Wang, Minglai Shao, Chang Liu, Yumeng Wang, Yueheng Sun, 20 Aug 2025, Addressing Graph Anomaly Detection via Causal Edge Separation and Spectrum, https://arxiv.org/abs/2508.14684
  • Ahmed Mujtaba, Gleb Radchenko, Radu Prodan, Marc Masana, 20 Aug 2025, Federated Distillation on Edge Devices: Efficient Client-Side Filtering for Non-IID Data, https://arxiv.org/abs/2508.14769
  • Chen-Hao Chang, Hui-Ju Hung, Chia-Hsun Lu, Chih-Ya Shen, 20 Aug 2025, Enhancing Contrastive Link Prediction With Edge Balancing Augmentation, https://arxiv.org/abs/2508.14808
  • Zihao Wang, Junming Zhang, 21 Aug 2025, From Bits to Boardrooms: A Cutting-Edge Multi-Agent LLM Framework for Business Excellence, https://arxiv.org/abs/2508.15447
  • Dingzhu Wen, Sijing Xie, Xiaowen Cao, Yuanhao Cui, Jie Xu, Yuanming Shi, and Shuguang Cui, 21 Aug 2025, Integrated Sensing, Communication, and Computation for Over-the-Air Federated Edge Learning, https://arxiv.org/abs/2508.15185
  • Zewei Xin, Qinya Li, Chaoyue Niu, Fan Wu, Guihai Chen, 21 Aug 2025, Adaptive Routing of Text-to-Image Generation Requests Between Large Cloud Model and Light-Weight Edge Model, https://arxiv.org/abs/2411.13787
  • Benjamin Murphy, Twm Stone, 14 Aug 2025, Uplifted Attackers, Human Defenders: The Cyber Offense-Defense Balance for Trailing-Edge Organizations, https://arxiv.org/abs/2508.15808
  • Aashaka Shah, Abhinav Jangda, Binyang Li, Caio Rocha, Changho Hwang, Jithin Jose, Madan Musuvathi, Olli Saarikivi, Peng Cheng, Qinghua Zhou, Roshan Dathathri, Saeed Maleki, Ziyue Yang, 21 Aug 2025, MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications, https://arxiv.org/abs/2504.09014
  • Boran Zhao, Hetian Liu, Zihang Yuan, Li Zhu, Fan Yang, Lina Xie, Tian Xia, Wenzhe Zhao, Pengju Ren, 19 Aug 2025, AdapSNE: Adaptive Fireworks-Optimized and Entropy-Guided Dataset Sampling for Edge DNN Training, https://arxiv.org/abs/2508.16647
  • Nishant Gavhane, Arush Mehrotra, Rohit Chawla, Peter Proenca, 23 Aug 2025, MoE-Beyond: Learning-Based Expert Activation Prediction on Edge Devices, https://arxiv.org/abs/2508.17137
  • Sam Buchanan, Druv Pai, Yi Ma, Valentin De Bortoli, 25 Aug 2025, On the Edge of Memorization in Diffusion Models, https://arxiv.org/abs/2508.17689
  • Dabbrata Das, Mahshar Yahan, Md Tareq Zaman, and Md Rishadul Bayesh, 25 Aug 2025, Edge-Enhanced Vision Transformer Framework for Accurate AI-Generated Image Detection, https://arxiv.org/abs/2508.17877
  • Xuecheng Bai, Yuxiang Wang, Boyu Hu, Qinyuan Jie, Chuanzhi Xu, Hongru Xiao, Kechen Li, Vera Chung, 24 Jul 2025, DRWKV: Focusing on Object Edges for Low-Light Image Enhancement, https://arxiv.org/abs/2507.18594
  • Haiyuan Li, Hari Madhukumar, Peizheng Li, Yuelin Liu, Yiran Teng, Yulei Wu, Ning Wang, Shuangyi Yan, Dimitra Simeonidou, 18 Jul 2025, Towards Practical Operation of Deep Reinforcement Learning Agents in Real-World Network Management at Open RAN Edges, https://arxiv.org/abs/2410.23086
  • Xu Cheng, Liang Yao, Feng He, Yukuo Cen, Yufei He, Chenhui Zhang, Wenzheng Feng, Hongyun Cai, Jie Tang, 19 Jul 2025, LPS-GNN : Deploying Graph Neural Networks on Graphs with 100-Billion Edges, https://arxiv.org/abs/2507.14570
  • Yanjie Dong, Haijun Zhang, Chengming Li, Song Guo, Victor C. M. Leung, Xiping Hu, 6 Aug 2025, Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches, https://arxiv.org/abs/2408.10691

Hybrid Edge-Cloud Architectures

In a hybrid architecture, some processing is done on edge devices (e.g., PCs or security cameras), and the rest is passed up to the cloud for more powerful processing. The "Apple Intelligence" architecture is a prominent example, with some processing done "on-device" on iPhones and Macs, and some passed up to the cloud.
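
As a rough illustration of the hybrid idea, the Python sketch below routes each request either to a small on-device model or to a larger cloud model. It is only a minimal sketch under stated assumptions: the HybridRouter class, the word-count threshold, the connectivity flag, and the two stand-in model functions are all hypothetical names invented for this example, and it does not describe how Apple Intelligence or any real product decides where to run a request.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HybridRouter:
    """Route each request to the edge device or to the cloud (illustrative only)."""
    run_on_device: Callable[[str], str]  # hypothetical small local model
    run_in_cloud: Callable[[str], str]   # hypothetical large remote model
    max_local_words: int = 50            # assumed capacity limit of the local model
    cloud_available: bool = True         # assumed network connectivity flag

    def choose_target(self, prompt: str) -> str:
        """Return "device" or "cloud" for this prompt."""
        fits_locally = len(prompt.split()) <= self.max_local_words
        # Stay on-device when the request is small, or when there is no
        # connectivity and the local model is the only option.
        if fits_locally or not self.cloud_available:
            return "device"
        return "cloud"

    def handle(self, prompt: str) -> str:
        """Run the prompt on whichever side choose_target() selects."""
        if self.choose_target(prompt) == "device":
            return self.run_on_device(prompt)
        return self.run_in_cloud(prompt)


# Stand-in "models" so the sketch runs without any ML library.
router = HybridRouter(
    run_on_device=lambda p: f"[device] {len(p.split())} words",
    run_in_cloud=lambda p: f"[cloud] {len(p.split())} words",
)
short_reply = router.handle("Summarize this note")  # small request, stays local
long_reply = router.handle("word " * 200)           # large request, goes to cloud
```

A real system would likely replace the word-count rule with richer signals such as task type, battery level, latency budget, or privacy requirements, but the overall shape (a cheap local decision followed by local or remote execution) stays the same.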

Internet of Things (IoT)

IoT is an edge platform comprising low-resource devices connected to the internet. Research papers on LLMs and IoT devices include:

  • L. Cheng, Y. Gu, Q. Liu, L. Yang, C. Liu and Y. Wang, 2024, Advancements in Accelerating Deep Neural Network Inference on AIoT Devices: A Survey, in IEEE Transactions on Sustainable Computing, doi: 10.1109/TSUSC.2024.3353176. https://ieeexplore.ieee.org/abstract/document/10398463
  • Rei Barjami, Antonio Miele, and Luca Mottola. 2024. Intermittent Inference: Trading a 1% Accuracy Loss for a 1.9x Throughput Speedup. In Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems (SenSys '24). Association for Computing Machinery, New York, NY, USA, 647–660. https://doi.org/10.1145/3666025.3699364 https://dl.acm.org/doi/abs/10.1145/3666025.3699364 https://dl.acm.org/doi/pdf/10.1145/3666025.3699364
  • Ye Cheng, Minghui Xu, Yue Zhang, Kun Li, Ruoxi Wang, Lian Yang, 16 Nov 2024, AutoIoT: Automated IoT Platform Using Large Language Models, https://arxiv.org/abs/2411.10665
  • Ibrahim Kok, Orhan Demirci, Suat Ozdemir, 20 Nov 2024, When IoT Meet LLMs: Applications and Challenges, https://arxiv.org/abs/2411.17722
  • A. K. Al-Zihairy and A. E. Abdelkareem, "Optimizing YOLOv8-cls: A Step Towards Smarter Edge Environments," 2024 1st International Conference on Emerging Technologies for Dependable Internet of Things (ICETI), Sana'a, Yemen, 2024, pp. 1-6, doi: 10.1109/ICETI63946.2024.10777236. https://ieeexplore.ieee.org/abstract/document/10777236
  • Shubham Vaishnav, Praveen Kumar Donta, Sindri Magnússon, 13 Aug 2025, Adaptive Budgeted Multi-Armed Bandits for IoT with Dynamic Resource Constraints, https://arxiv.org/abs/2505.02640
  • Amod Kant Agrawal, 23 Jul 2025, Our Cars Can Talk: How IoT Brings AI to Vehicles, https://arxiv.org/abs/2507.17214
  • Harsha Sammangi, Aditya Jagatha, Giridhar Reddy Bojja, Jun Liu, 29 Apr 2025, Decentralized AI-driven IoT Architecture for Privacy-Preserving and Latency-Optimized Healthcare in Pandemic and Critical Care Scenarios, https://arxiv.org/abs/2507.15859
  • Zied Jenhani, Mounir Bensalem, Jasenka Dizdarević, Admela Jukan, 22 Jul 2025, An Experimental Study of Split-Learning TinyML on Ultra-Low-Power Edge/IoT Nodes, https://arxiv.org/abs/2507.16594
  • Ahmad Alhonainy, Praveen Rao, 19 Jul 2025, Caching Techniques for Reducing the Communication Cost of Federated Learning in IoT Environments, https://arxiv.org/abs/2507.17772
  • Aiman Faiz, Anna Maria De Roberto, Claudio Pascarelli, Gianvito Mitrano, Gianluca Fimiani, Marina Garofano, Genoveffa Tortora, Mariangela Lazoi, Claudio Passino, Alessia Bramanti, 24 Jul 2025, Machine Learning Solutions Integrated in an IoT Healthcare Platform for Heart Failure Risk Stratification, https://arxiv.org/abs/2505.09619
  • Mohammad Mehdi Rastikerdar, Jin Huang, Hui Guan, Deepak Ganesan, 11 Aug 2025, In-Situ Fine-Tuning of Wildlife Models in IoT-Enabled Camera Traps for Efficient Adaptation, https://arxiv.org/abs/2409.07796
  • Arianna Stropeni, Francesco Borsatti, Manuel Barusco, Davide Dalle Pezze, Marco Fabris, Gian Antonio Susto, 28 Jul 2025, Towards Scalable IoT Deployment for Visual Anomaly Detection via Efficient Compression, https://arxiv.org/abs/2505.07119
  • Ze Zhang, Qian Dong, Wenhan Wang, 30 Jul 2025, AdapSCA-PSO: An Adaptive Localization Algorithm with AI-Based Hybrid SCA-PSO for IoT WSNs, https://arxiv.org/abs/2507.22317
  • Xinzhe Zheng, Sijie Ji, Yipeng Pan, Kaiwen Zhang, Chenshu Wu, 30 Jul 2025, NeurIT: Pushing the Limit of Neural Inertial Tracking for Indoor Robotic IoT, https://arxiv.org/abs/2404.08939
  • Dulana Rupanetti, Naima Kaabouch, 3 Aug 2025, Leveraging Machine Learning for Botnet Attack Detection in Edge-Computing Assisted IoT Networks, https://arxiv.org/abs/2508.01542
  • Natalia Emelianova, Carlos Kamienski and Ronaldo C. Prati, 7 Aug 2025, Optimizing IoT Threat Detection with Kolmogorov-Arnold Networks (KANs), https://arxiv.org/abs/2508.05591
  • Muhammad Sakib Khan Inan, Kewen Liao, 13 Aug 2025, DeepFeatIoT: Unifying Deep Learned, Randomized, and LLM Features for Enhanced IoT Time Series Sensor Data Classification in Smart Industries, https://arxiv.org/abs/2508.09468
  • Jesus Omaña Iglesias, Carlos Segura Perales, Stefan Geißler, Diego Perino, Andra Lutu, 13 Aug 2025, Anomaly Detection for IoT Global Connectivity, https://arxiv.org/abs/2508.09660
  • Afrah Gueriani, Hamza Kheddar, Ahmed Cherif Mazari and Mohamed Chahine Ghanem, 17 Aug 2025, A Robust Cross-Domain IDS using BiGRU-LSTM-Attention for Medical and Industrial IoT Security, https://arxiv.org/abs/2508.12470
  • Prabath Abeysekara, Hai Dong, 18 Aug 2025, Data-driven Trust Bootstrapping for Mobile Edge Computing-based Industrial IoT Services, https://arxiv.org/abs/2508.12560
  • Hui Wei, Dong Yoon Lee, Shubham Rohal, Zhizhang Hu, Ryan Rossi, Shiwei Fang, Shijia Pan, 21 Aug 2025, A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis, https://arxiv.org/abs/2506.12263
  • Sunwoo Kim, 17 Aug 2025, Deep Learning and Matrix Completion-aided IoT Network Localization in the Outlier Scenarios, https://arxiv.org/abs/2508.18225

More AI Research

Read more about: