Aussie AI

Edge Computing

Last Updated 22 October, 2025

by David Spuler, Ph.D.

Edge computing is the name researchers use for running computations on low-resource devices. Devices on the "edge" are "close" to the user but "far away" from the larger servers in the cloud. In AI, the goal is to run machine learning models on these smaller devices. Examples of such edge devices include:

Smartphones (see AI Smartphones)
Desktops and laptops
Cars (e.g. autonomous self-driving cars)
Video cameras (e.g. security cameras)
Internet of Things (IoT) devices (e.g. industrial devices, refrigerators, network stations)

Running AI models on edge devices usually means inference only, because the small devices usually cannot support the cost of training in terms of processing power and/or storage. However, there is some research into "on-device training."
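As a rough illustration of why training costs more than inference, the sketch below compares the memory needed for model state in each case. It assumes a hypothetical 7-billion-parameter model. Inference counts only the weights, while training counts weights, gradients, and two Adam-style optimizer states per parameter. Activations, caches, and runtime overhead are ignored, so real figures will be higher. The function names are illustrative and do not come from any library.

```python
# Back-of-envelope memory estimate: inference vs. training.
# All names and the 7B model size are assumptions for illustration only.

def inference_bytes(num_params: int, bytes_per_weight: float) -> float:
    """Memory for the weights alone (no activations or caches)."""
    return num_params * bytes_per_weight


def training_bytes(num_params: int,
                   bytes_per_value: int = 4,
                   optimizer_states: int = 2) -> int:
    """Weights + gradients + optimizer states, all at the same precision.

    optimizer_states=2 models an Adam-style optimizer, which keeps two
    running statistics per parameter. Activations are ignored.
    """
    copies = 2 + optimizer_states  # weights, gradients, optimizer states
    return num_params * bytes_per_value * copies


NUM_PARAMS = 7_000_000_000  # hypothetical 7B-parameter model

fp16_inference = inference_bytes(NUM_PARAMS, 2)    # 16-bit weights
int4_inference = inference_bytes(NUM_PARAMS, 0.5)  # 4-bit quantized weights
fp32_training = training_bytes(NUM_PARAMS)         # 32-bit training state

for name, size in [("FP16 inference", fp16_inference),
                   ("INT4 inference", int4_inference),
                   ("FP32 training", fp32_training)]:
    print(f"{name}: {size / 1e9:.1f} GB")
```

Under these assumptions the training state is several times larger than even the unquantized inference weights, which is one reason on-device training remains a research topic rather than common practice.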

Many architectures that use edge computing involve multiple machines, at a minimum the edge device and a main server. Hence, much of the research into ensemble methods such as distributed inference is also relevant.
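One simple form of this two-machine arrangement is a cascade: a small model on the edge device answers when it is confident, and otherwise the request is passed to a larger model on the server. The sketch below is only an illustration of that idea. The two "models" are hypothetical stand-in functions, the 0.8 threshold is an arbitrary assumption, and a real system would replace the server call with a network request.

```python
from typing import Callable, Tuple

# A prediction is (label, confidence in [0, 1]).
Prediction = Tuple[str, float]
Model = Callable[[str], Prediction]


def cascade_infer(x: str,
                  edge_model: Model,
                  server_model: Model,
                  threshold: float = 0.8) -> Tuple[str, str]:
    """Return (label, where_it_ran) using an edge-first cascade."""
    label, confidence = edge_model(x)
    if confidence >= threshold:
        return label, "edge"       # cheap local answer is good enough
    label, _ = server_model(x)     # fall back to the larger remote model
    return label, "server"


# Hypothetical stand-ins for real models (assumptions for this sketch).
def toy_edge_model(text: str) -> Prediction:
    # Pretend the small model is only confident on short inputs.
    return ("short", 0.95) if len(text) < 10 else ("unknown", 0.40)


def toy_server_model(text: str) -> Prediction:
    # Pretend the large model always answers with high confidence.
    return ("long", 0.99)


print(cascade_infer("hi", toy_edge_model, toy_server_model))
print(cascade_infer("a much longer input", toy_edge_model, toy_server_model))
```

The threshold trades latency and bandwidth against accuracy: a higher value sends more requests to the server, and a lower value keeps more of them on the device.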

Survey Papers on Edge Computing

Praveen Joshi, Mohammed Hasanuzzaman, Chandra Thapa, Haithem Afli, Ted Scully, "Enabling All In-Edge Deep Learning: A Literature Review", IEEE Access, vol.11, pp.3431-3460, 2023. https://ieeexplore.ieee.org/document/10007810, https://arxiv.org/abs/2204.03326 (Extensive survey of edge computing, including deployment architectures and optimizations.)
Kah Phooi Seng, Li-Minn Ang, "Embedded Intelligence: State-of-the-Art and Research Challenges", IEEE Access, vol.10, pp.59236-59258, 2022. https://ieeexplore.ieee.org/document/9775683, PDF: https://research.usc.edu.au/esploro/outputs/99640278002621
X Wang, J Li, Z Ning, Q Song, L Guo, S Guo, July 2023, Wireless powered mobile edge computing networks: A survey, ACM Computing Surveys, Volume 55, Issue 13s, Article No. 263, pp 1–37, https://dl.acm.org/doi/abs/10.1145/3579992 PDF: http://101.43.59.126/static/53.Wireless_Powered_Mobile_Edge_Vomputing_Networks_A_Survey.pdf
H Hua, Y Li, T Wang, N Dong, W Li, J Cao, 2023, Edge computing with artificial intelligence: A machine learning perspective, ACM Computing Surveys, https://dl.acm.org/doi/abs/10.1145/3555802 PDF: https://dl.acm.org/doi/pdf/10.1145/3555802
HJ Damsgaard, A Ometov, J Nurmi, 2023, Approximation Opportunities in Edge Computing Hardware: A Systematic Literature Review, ACM Computing Surveys, https://dl.acm.org/doi/abs/10.1145/3572772, PDF: https://dl.acm.org/doi/pdf/10.1145/3572772
Tian Wang, Yuzhu Liang, Xuewei Shen, Xi Zheng, Adnan Mahmood, Quan Z. Sheng, 2023, Edge Computing and Sensor-Cloud: Overview, Solutions, and Directions, ACM Computing Surveys, Volume 55, Issue 13s, Article No.: 281, pp 1–37, https://dl.acm.org/doi/abs/10.1145/3582270, PDF: http://web.science.mq.edu.au/~qsheng/papers/CSUR-edge.pdf
Y Mao, C You, J Zhang, K Huang, 2017, Mobile edge computing: Survey and research outlook, PDF: https://www.researchgate.net/profile/Changsheng-You/publication/312061424_Mobile_Edge_Computing_Survey_and_Research_Outlook/links/5c22f648a6fdccfc70690a30/Mobile-Edge-Computing-Survey-and-Research-Outlook.pdf
Wazir Zada Khan, Ejaz Ahmed, Saqib Hakak, Ibrar Yaqoob, Arif Ahmed, 2019, Edge computing: A survey, Future Generation Computer Systems, Volume 97, August 2019, Pages 219-235, https://www.sciencedirect.com/science/article/abs/pii/S0167739X18319903, PDF: https://www.researchgate.net/profile/Ibrar_Yaqoob/publication/331362529_Edge_computing_A_survey/links/5ca33dcca6fdcc12ee8c3a2a/Edge-computing-A-survey.pdf
Nasir Abbas; Yan Zhang; Amir Taherkordi; Tor Skeie, 2018, Mobile edge computing: A survey, IEEE Internet of Things Journal, Volume 5, Issue 1, February 2018, https://ieeexplore.ieee.org/document/8030322, https://www.duo.uio.no/bitstream/handle/10852/65081/Nasir_Abbas_Thesis.pdf?sequence=1
Yuyi Mao; Changsheng You; Jun Zhang; Kaibin Huang; Khaled B. Letaief, 2017, A survey on mobile edge computing: The communication perspective, IEEE Communications Surveys & Tutorials, Volume 19, Issue 4, Fourthquarter 2017, https://ieeexplore.ieee.org/document/8016573, PDF: https://arxiv.org/pdf/1701.01090
Fang Liu, Guoming Tang, Youhuizi Li, Zhiping Cai, Xingzhou Zhang, Tongqing Zhou, 2019, A survey on edge computing systems and tools, Proceedings of the IEEE, Volume 107, Issue 8, August 2019, https://ieeexplore.ieee.org/abstract/document/8746691/, https://arxiv.org/pdf/1911.02794 (Includes a survey of open source edge computing projects and edge ML frameworks in 2019.)
Xiaofei Wang, Yiwen Han, Victor C.M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen, 2020, Convergence of Edge Computing and Deep Learning: A Comprehensive Survey, IEEE Communications Surveys & Tutorials, Volume: 22, Issue: 2, Secondquarter 2020, https://ieeexplore.ieee.org/abstract/document/8976180/, https://arxiv.org/abs/1907.08349
Keyan Cao; Yefan Liu; Gongjie Meng; Qimeng Sun, 2020, An Overview on Edge Computing Research, IEEE Access, Volume 8, https://ieeexplore.ieee.org/document/9083958, PDF: https://ieeexplore.ieee.org/iel7/6287639/6514899/09083958.pdf (General survey of edge computing, not specific to ML.)
Blesson Varghese, Nan Wang, Sakil Barbhuiya, Peter Kilpatrick, Dimitrios S. Nikolopoulos, 2016, Challenges and opportunities in edge computing, 2016 IEEE International Conference on Smart Cloud (SmartCloud), https://ieeexplore.ieee.org/abstract/document/7796149/, https://arxiv.org/pdf/1609.01967 (General edge computing theory, not specific to ML.)
J Chen, X Ran, 2019, Deep learning with edge computing: A review, Proceedings of the IEEE, Volume 107, Issue 8, August 2019, https://ieeexplore.ieee.org/abstract/document/8763885/, PDF: https://ieeexplore.ieee.org/ielaam/5/8789751/8763885-aam.pdf
A Bourechak, O Zedadra, MN Kouahla, A Guerrieri, 2023, At the Confluence of Artificial Intelligence and Edge Computing in IoT-Based Applications: A Review and New Perspectives, Sensors, https://www.mdpi.com/1424-8220/23/3/1639, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9920982/
Linghe Kong, Jinlin Tan, Junqin Huang, Guihai Chen, Shuaitian Wang, Xi Jin, Peng Zeng, Muhammad Khan, Sajal K. Das, 2022, Edge-computing-driven internet of things: A survey, ACM Computing Surveys, Volume 55, Issue 8, Article No. 174, pp 1–41, https://dl.acm.org/doi/abs/10.1145/3555308, PDF: https://huangjunqin.com/papers/KongCSUR2022Edge.pdf
M Lee, S Lee, T Kim, 2023, Performance Evaluation of Efficient Vision Transformers on Embedded Edge Platforms, IEMEK Journal of Embedded Systems and Applications, https://koreascience.kr/article/JAKO202325643250869.page, PDF: https://koreascience.kr/article/JAKO202325643250869.pdf (Abstract in English, paper in Korean.)
Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang, July 2024, Mobile Edge Intelligence for Large Language Models: A Contemporary Survey, https://www.techrxiv.org/doi/pdf/10.36227/techrxiv.172115025.57884352
Othmane Friha, Mohamed Amine Ferrag, Burak Kantarci, Burak Cakmak, Arda Ozgun, Nassira Ghoualmi-Zine, 2024, LLM-based Edge Intelligence: A Comprehensive Survey on Architectures, Applications, Security and Trustworthiness, PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10669603
Yue Zheng, Yuhao Chen, Bin Qian, Xiufang Shi, Yuanchao Shu, Jiming Chen, 29 Sep 2024, A Review on Edge Large Language Models: Design, Execution, and Applications, https://arxiv.org/abs/2410.11845
Kailai Sun, Xinwei Wang, Xi Miao, Qianchuan Zhao, Oct 2024, A review of AI edge devices and lightweight CNN and LLM deployment, Neurocomputing, Volume 614, 2025, 128791, ISSN 0925-2312, https://doi.org/10.1016/j.neucom.2024.128791 https://www.sciencedirect.com/science/article/abs/pii/S0925231224015625
Mozhgan Navardi, Romina Aalishah, Yuzhe Fu, Yueqian Lin, Hai Li, Yiran Chen, Tinoosh Mohsenin, 19 Feb 2025, GenAI at the Edge: Comprehensive Survey on Empowering Edge Devices, https://arxiv.org/abs/2502.15816
Shaibal Saha, Lanyu Xu, 26 Feb 2025, Vision Transformers on the Edge: A Comprehensive Survey of Model Compression and Acceleration Strategies, https://arxiv.org/abs/2503.02891

Research on Edge Computing

Research papers and articles on edge computing include:

Jonas Geiping, Tom Goldstein, Dec 2022, Cramming: Training a Language Model on a Single GPU in One Day, https://arxiv.org/abs/2212.14034 Code: https://github.com/JonasGeiping/cramming (Note: uses Pytorch nvFuser deep learning compiler, which seems to be deprecated now.)
Benj Edwards, March 14, 2023, You can now run a GPT-3-level AI model on your laptop, phone, and Raspberry Pi, Ars Technica, https://arstechnica.com/information-technology/2023/03/you-can-now-run-a-gpt-3-level-ai-model-on-your-laptop-phone-and-raspberry-pi/
E. Li, L. Zeng, Z. Zhou, and X. Chen, “Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing,” IEEE Transactions on Wireless Communications, vol. 19, no. 1, pp. 447–457, Jan. 2020. doi:10.1109/TWC.2019.2946140, https://arxiv.org/abs/1910.05316
Manuele Rusci, Marco Fariselli, Alessandro Capotondi, and Luca Benini. Leveraging automated mixed-low-precision quantization for tiny edge microcontrollers. In IoT Streams for Data-Driven Predictive Maintenance and IoT, Edge, and Mobile for Embedded Machine Learning, pages 296–308. Springer, 2020, https://arxiv.org/abs/2008.05124
Tao Ge, Si-Qing Chen, and Furu Wei. 2022. EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10786–10798, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics, https://arxiv.org/abs/2202.07959
Chinnadhurai Sankar, Sujith Ravi, and Zornitsa Kozareva. 2021. ProFormer: Towards On-Device LSH Projection Based Transformers. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2823–2828, Online. Association for Computational Linguistics. https://arxiv.org/abs/2004.05801
F Manca, F Ratto, 2023, ONNX-to-Hardware Design Flow for the Generation of Adaptive Neural-Network Accelerators on FPGAs, arXiv preprint arXiv:2309.13321, https://arxiv.org/pdf/2309.13321.pdf (Approximation techniques applied to edge computing.)
Pierre-Emmanuel Novac, March 2023, MicroAI: Embedded Artificial Intelligence for Human Activity Recognition on Smart Glasses, Ph.D. Thesis, Artificial Intelligence. Université Côte d’Azur, https://theses.hal.science/tel-04049008/document (Quantization in smart glasses device.)
R Snytsar, Oct 2023, Accelerating Machine Learning Primitives on Commodity Hardware, arXiv preprint arXiv:2310.05218, https://arxiv.org/pdf/2310.05218.pdf (Uses the "sliding window" technique to optimize general matrix multiplication on edge devices.)
GY Lee, T Dam, MM Ferdaus, DP Poenar, VN Duong, Oct 2023, Unlocking the capabilities of explainable fewshot learning in remote sensing, https://arxiv.org/pdf/2310.08619.pdf
PyTorch Edge Team, October 17, 2023, PyTorch Edge: Enabling On-Device Inference Across Mobile and Edge Devices with ExecuTorch, https://pytorch.org/blog/pytorch-edge/
Junho Wohn, February 2024, Optimizing Deep Learning Model Inference using Efficient Model Partitioning on Edge Devices, Thesis for the Master of Science, Graduate School of Hanyang University, https://repository.hanyang.ac.kr/handle/20.500.11754/188388, PDF: https://hanyang.dcollection.net/public_resource/pdf/200000726139_20240331200233.pdf (Compiles models using the TVM deep learning compiler and then partitions them across multiple edge devices for collaborative edge inference.)
Zao Zhang, 23 May 2024, Design Efficient Deep Neural Networks with System Optimization, Ph.D. Thesis, School of Electrical and Information Engineering, Faculty of Engineering, The University of Sydney, Australia, PDF: https://ses.library.usyd.edu.au/bitstream/handle/2123/32642/zhang_z_thesis.pdf?sequence=1&isAllowed=y https://ses.library.usyd.edu.au/handle/2123/32642 https://hdl.handle.net/2123/32642
Pietro Farina, Subrata Biswas, Eren Yıldız, Khakim Akhunov, Saad Ahmed, Bashima Islam, Kasım Sinan Yıldırım, 16 May 2024, Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems, https://arxiv.org/abs/2405.10426
Md Fahim Faysal Khan, May 2024, Constraint Driven Multimodal Edge Intelligence, Ph.D. Thesis, Electrical Engineering and Computer Science, Pennsylvania State University, https://etda.libraries.psu.edu/files/final_submissions/29680 (Layer-specific quantization levels for mixed-precision quantization.)
Jeffrey Yu, Kartik Prabhu, Yonatan Urman, Robert M. Radway, Eric Han, Priyanka Raina, 27 April 2024, ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, April 2024, Pages 5–21, https://doi.org/10.1145/3620666.3651368 https://dl.acm.org/doi/abs/10.1145/3620666.3651368
Jiwei HUANG, Fangzheng LIU, and Jianbin ZHANG, “Multi-dimensional QoS Evaluation and Optimization of Mobile Edge Computing for IoT: A Survey,” Chinese Journal of Electronics, vol. 33, no. 5, pp. 1–16, 2024, doi: 10.23919/cje.2023.00.264, https://cje.ejournal.org.cn/article/doi/10.23919/cje.2023.00.264 (Theory of benchmarking and evaluation of mobile edge computing.)
Mikail Yayla, 2024, A vision for edge AI: ROBUST BINARIZED NEURAL NETWORKS ON EMERGING RESOURCE-CONSTRAINED HARDWARE, Ph.D. Dissertation, Technischen Universität Dortmund, Fakultät Informatik, Dortmund 2024, http://129.217.131.68:8080/bitstream/2003/42431/1/Dissertation_Yayla.pdf (Binarized networks with consideration of both software and hardware issues.)
Benjamin A Newman, Chris Paxton, Kris Kitani, Henny Admoni, 16 Apr 2024, Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration, https://arxiv.org/abs/2404.10733
Hou-I Liu, Marco Galindo, Hongxia Xie, Lai-Kuan Wong, Hong-Han Shuai, Yung-Yui Li, Wen-Huang Cheng, 8 Apr 2024, Lightweight Deep Learning for Resource-Constrained Environments: A Survey, https://arxiv.org/abs/2404.07236 (A survey of various optimizations, with a lot of focus on image and vision models, including CNNs, RNNs, and Transformers.)
Seungtae Hong, Gunju Park, Jeong-Si Kim, 9 June 2024, Automated deep-learning model optimization framework for microcontrollers, https://doi.org/10.4218/etrij.2023-0522 https://onlinelibrary.wiley.com/doi/full/10.4218/etrij.2023-0522 (Framework for using quantization and pruning on microcontroller devices.)
Shengyuan Ye, Jiangsu Du, Liekang Zeng, Wenzhong Ou, Xiaowen Chu, Yutong Lu, Xu Chen, 27 May 2024, Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference, https://arxiv.org/abs/2405.17245
Qualcomm, May 2023, The future of AI is hybrid, Qualcomm White Paper, https://www.qualcomm.com/content/dam/qcomm-martech/dm-assets/documents/Whitepaper-The-future-of-AI-is-hybrid-Part-1-Unlocking-the-generative-AI-future-with-on-device-and-hybrid-AI.pdf
Guozhi Yan; Kai Liu; Chunhui Liu; Jie Zhang, 2024, Edge Intelligence for Internet of Vehicles: A Survey, IEEE Transactions on Consumer Electronics (Early Access), 18 March 2024, https://ieeexplore.ieee.org/abstract/document/10474509
Daniel Situnayake, 24 January 2023, AI at the Edge: Solving Real-World Problems with Embedded Machine Learning, O'Reilly Media, Inc, USA, https://www.amazon.com/dp/1098120205/
Jaskirat Singh, Bram Adams, Ahmed E. Hassan, 25 Mar 2024, On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance, https://arxiv.org/abs/2403.17154 (MLOps deployment for quantization, partitioning and early-exit across mobile, edge, and cloud platforms, including running early exit on mobile.)
Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey (Broad survey with many optimizations including this topic.)
P Dong, L Lu, C Wu, C Lyu, G Yuan, H Tang, Y Wang, 2023, PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile, https://openreview.net/pdf?id=N56hAiQvot Code: https://github.com/PeiyanFlying/PackQViT
Bingkun Lai, Jinbo Wen, Jiawen Kang, Hongyang Du, Jiangtian Nie, Changyan Yi, Dong In Kim, Shengli Xie, 19 Dec 2023, Resource-efficient Generative Mobile Edge Networks in 6G Era: Fundamentals, Framework and Case Study, https://arxiv.org/abs/2312.12063
Mohammed Ayyat; Tamer Nadeem; Bartosz Krawczyk, Dec 2023, ClassyNet: Class-Aware Early Exit Neural Networks for Edge Devices, IEEE Internet of Things Journal (Early Access), https://ieeexplore.ieee.org/abstract/document/10365527
Yixin Song, Zeyu Mi, Haotong Xie, Haibo Chen, Dec 2023, PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU, https://arxiv.org/abs/2312.12456 Code: https://github.com/SJTU-IPADS/PowerInfer
Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, Carlo C Del Mundo, Mohammad Rastegari, Mehrdad Farajtabar, Dec 2023, LLM in a flash: Efficient Large Language Model Inference with Limited Memory, Apple Research, https://arxiv.org/abs/2312.11514
X Li, S Chen, S Zhang, L Hou, Y Zhu, Z Xiao, 2023, Human Activity Recognition Using IR-UWB Radar: A Lightweight Transformer Approach, IEEE Geoscience and Remote Sensing Letters (Early Access), https://ieeexplore.ieee.org/document/10247554
Ali Rahmanian, Doctoral Thesis, April 2024, Edge Orchestration for Latency-Sensitive Applications, Department of Computing Science, Umea University, Sweden, https://www.diva-portal.org/smash/get/diva2:1849510/FULLTEXT02.pdf
Victor J.B. Jung, Alessio Burrello, Moritz Scherer, Francesco Conti, Luca Benini, 3 Apr 2024, Optimizing the Deployment of Tiny Transformers on Low-Power MCUs, https://arxiv.org/abs/2404.02945 (Uses an approach called "Fused Weight Self-Attention" that fuses some of the QKV matrices and also tiling in multi-head attention, along with 8-bit integer quantization and integerized Softmax.)
Minghao Yan, Hongyi Wang, Shivaram Venkataraman, 9 Jan 2024 (v2), PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices, https://arxiv.org/abs/2310.19991 (Faster inference with a focus on pipelining and scheduling of hardware acceleration.)
Sai Krishna Revanth Vuruma, Ashley Margetts, Jianhai Su, Faez Ahmed, Biplav Srivastava, 26 Feb 2024 (v2), From Cloud to Edge: Rethinking Generative AI for Low-Resource Design Challenges, https://arxiv.org/abs/2402.12702
Nir Shlezinger; Erez Farhan; Hai Morgenstern; Yonina C. Eldar, 2021, Collaborative Inference via Ensembles on the Edge, ICASSP 2021, 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), https://ieeexplore.ieee.org/abstract/document/9414740
Nir Shlezinger; Ivan V. Bajić, 2022, Collaborative Inference for AI-Empowered IoT Devices, IEEE Internet of Things Magazine (Volume: 5, Issue: 4, December 2022), https://ieeexplore.ieee.org/abstract/document/10012474
Rohit Sharma, 9 July 2022, Introduction to TinyML, Independently published, https://www.amazon.com/Introduction-TinyML-Rohit-Sharma/dp/B0B5Q281L9/
Semaphore, Dec 14, 2023, 6 Ways to Run LLMs Locally, https://semaphoreci.medium.com/6-ways-to-run-llms-locally-fa25be0797e5 (The six ways are HF Transformers, LangChain, Llama.cpp, Llamafile, Ollama, and GPT4All.)
Zhepeng Wang, Isaacshubhanand Putla, Weiwen Jiang, Youzuo Lin, Oct 2023, Edge-InversionNet: Enabling Efficient Inference of InversionNet on Edge Devices, https://arxiv.org/abs/2310.09667 (Using structured pruning via layerwise filter pruning to run a model on a Raspberry Pi.)
Jianlei Yang, Jiacheng Liao, Fanding Lei, Meichen Liu, Junyi Chen, Lingkun Long, Han Wan, Bei Yu, Weisheng Zhao, Nov 2023, TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices, https://arxiv.org/abs/2311.01759
Yuyi Mao, Xianghao Yu, Kaibin Huang, Ying-Jun Angela Zhang, Jun Zhang, Dec 2023, Green Edge AI: A Contemporary Survey, https://arxiv.org/abs/2312.00333
Murray Kornelsen, April 2023, Low-Latency BERT Inference for Heterogeneous Multi-Processor Edge Devices, Department of Electrical & Computer Engineering, McGill University, Canada https://escholarship.mcgill.ca/downloads/m326m732p
Yifeng Wu; Xu He; Lingfei Mo; Qing Wang, Jan 2024, A Self-Attention-Assisted TinyML With Effective Representation for UWB NLOS Identification, IEEE Internet of Things Journal (Early Access), https://ieeexplore.ieee.org/abstract/document/10380220
Ning Chen, Zhipeng Cheng, Xuwei Fan, Xiaoyu Xia, Lianfen Huang, 5 Jan 2024, Towards Integrated Fine-tuning and Inference when Generative AI meets Edge Intelligence, https://arxiv.org/abs/2401.02668 (Covers processing on cloud and edge servers in various configurations with communication between nodes for both training/fine-tuning and inference tasks.)
C Gernigon, SI Filip, O Sentieys, C Coggiola, M Bruno, Oct 2023, Low-Precision Floating-Point for Efficient On-Board Deep Neural Network Processing, https://hal.science/hal-04252197/document
Y Liang, Z Wang, X Xu, Y Tang, Z Jie, J Lu, Oct 2023, MCUFormer: Deploying Vision Tranformers on Microcontrollers with Limited Memory, arXiv preprint arXiv:2310.16898, https://arxiv.org/pdf/2310.16898.pdf
MWU Rahman, MM Abrar, HG Copening, S Hariri, Oct 2023, Quantized Transformer Language Model Implementations on Edge Devices, https://arxiv.org/pdf/2310.03971.pdf (Uses a "FlatBuffer" format on TensorFlow-Lite.)
H Woisetschläger, A Isenko, S Wang, R Mayer, 2023, Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly, https://arxiv.org/abs/2310.03150
PE Novac, G Boukli Hacene, A Pegatoquet, 2021, Quantization and deployment of deep neural networks on microcontrollers, Sensors, 2021, https://www.mdpi.com/1424-8220/21/9/2984
P Cruz, N Achir, AC Viana, 2022, On the edge of the deployment: A survey on multi-access edge computing, https://dl.acm.org/doi/abs/10.1145/3529758, https://inria.hal.science/hal-03637105/file/ACM_MEC_Survey___Camera_Ready.pdf
W Yu, F Liang, X He, WG Hatcher, C Lu, J Lin, 2017, A survey on the edge computing for the Internet of Things, IEEE Access (Volume: 6), https://ieeexplore.ieee.org/abstract/document/8123913/ https://ieeexplore.ieee.org/iel7/6287639/8274985/08123913.pdf
R. Sanchez-Iborra and A. F. Skarmeta, Tinyml-enabled frugal smart objects: Challenges and opportunities, IEEE Circuits and Systems Magazine, vol. 20, no. 3, pp. 4–18, 2020. https://ieeexplore.ieee.org/document/9166461 PDF: https://sci-hub.se/10.1109/MCAS.2020.3005467
R. Immonen, T. Hämäläinen et al., Tiny machine learning for resource-constrained microcontrollers, Journal of Sensors, vol. 2022, 2022, https://www.hindawi.com/journals/js/2022/7437023/
S. Prakash, T. Callahan, J. Bushagour, C. Banbury, A. V. Green, P. Warden, T. Ansell, and V. J. Reddi, 2023, CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs, 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). pp. 157–167. https://ui.adsabs.harvard.edu/abs/2022arXiv220101863P/abstract
M. Giordano, L. Piccinelli, and M. Magno, Survey and comparison of milliwatts micro controllers for tiny machine learning at the edge, in 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS). IEEE, 2022, pp. 94–97. https://ieeexplore.ieee.org/document/9870017
Md. Maruf Hossain Shuvo; Syed Kamrul Islam; Jianlin Cheng; Bashir I. Morshed, 2023, Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices: A Review, Proceedings of the IEEE (Volume 111, Issue 1, January 2023), https://ieeexplore.ieee.org/document/9985008 PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9985008 (Extensive 2023 survey of inference optimization in general and specifically on edge platforms.)
T Tambe, 2023, Architecting High Performance Silicon Systems for Accurate and Efficient On-Chip Deep Learning, https://dash.harvard.edu/bitstream/handle/1/37375806/Final_Draft_PhD_Dissertation_Thierry_Tambe.pdf?sequence=1&isAllowed=y
Douglas C. Youvan , June 15, 2024, Developing and Deploying AI Applications on NVIDIA Jetson Orin NX: A Comprehensive Guide, https://www.researchgate.net/profile/Douglas-Youvan/publication/381434888_Developing_and_Deploying_AI_Applications_on_NVIDIA_Jetson_Orin_NX_A_Comprehensive_Guide/links/666d7390de777205a32fceb6/Developing-and-Deploying-AI-Applications-on-NVIDIA-Jetson-Orin-NX-A-Comprehensive-Guide.pdf
Zixu Hao, Huiqiang Jiang, Shiqi Jiang, Ju Ren, Ting Cao, June 2024, Hybrid SLM and LLM for Edge-Cloud Collaborative Inference, EdgeFM ’24, June 3–7, 2024, Minato-ku, Tokyo, Japan, https://dl.acm.org/doi/pdf/10.1145/3662006.3662067 (Small model on edge devices with large model in the cloud, performing collaborative inference.)
Dan Peng, Zhihui Fu, Jun Wang, 1 Jul 2024, PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs, https://arxiv.org/abs/2407.01031 (Running fine-tuning on a smartphone via a low-memory optimization using a "derivative-free" "zeroth-order" technique called MeZo, with advantages such as privacy.)
Ying He, Jingcheng Fang, F. Richard Yu, Victor C. Leung, 2024, Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Computing: An Active Inference Approach, PrePrints pp. 1-12, DOI: 10.1109/TMC.2024.3415661, https://www.computer.org/csdl/journal/tm/5555/01/10591707/1YraFlDdKYo
Adarsh Prasad Behera, Paulius Daubaris, Iñaki Bravo, José Gallego, Roberto Morabito, Joerg Widmer, Jaya Prakash Varma Champati, 10 Jul 2024, Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical, https://arxiv.org/abs/2407.11061
Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie Zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun, 3 Aug 2024, MiniCPM-V: A GPT-4V Level MLLM on Your Phone, https://arxiv.org/abs/2408.01800 Code: https://github.com/OpenBMB/MiniCPM-V
Beom Jin Kang, Hae In Lee, Seok Kyu Yoon, Young Chan Kim, Sang Beom Jeong, Seong Jun O, Hyun Kim, October 2024, A survey of FPGA and ASIC designs for transformer inference acceleration and optimization, Journal of Systems Architecture, Volume 155, 103247, https://www.sciencedirect.com/science/article/abs/pii/S138376212400184X
R. Narmeen, P. Mach, Z. Becvar and I. Ahmad, 16 August 2024, Joint Exit Selection and Offloading Decision for Applications Based on Deep Neural Networks, IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3444898, https://doi.org/10.1109/JIOT.2024.3444898 https://ieeexplore.ieee.org/abstract/document/10638073
Mingjin Zhang, 2024, High-performance scheduling of deep learning tasks in collaborative edge computing, Ph.D. Thesis, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, https://theses.lib.polyu.edu.hk/bitstream/200/13080/3/7528.pdf (Scheduling of inference and training tasks on edge devices with techniques such as model splitting/partitioning.)
Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu, 17 May 2024, Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities, https://arxiv.org/abs/2405.10825
L. Cheng, Y. Gu, Q. Liu, L. Yang, C. Liu and Y. Wang, 2024, Advancements in Accelerating Deep Neural Network Inference on AIoT Devices: A Survey, in IEEE Transactions on Sustainable Computing, doi: 10.1109/TSUSC.2024.3353176. https://ieeexplore.ieee.org/abstract/document/10398463
Eric Samikwa, 2024, Resource-Aware Distributed Machine Learning for Artificial Intelligence of Things, Ph.D. thesis, Faculty of Science, University of Bern, Switzerland, https://boristheses.unibe.ch/5378/1/24samikwa_e_1_.pdf https://doi.org/10.48549/5378 (Multi-edge device with early exit, "micro-split" scheduling, split/federated learning, and distributed inference.)
Lutfi Eren Erdogan, Nicholas Lee, Siddharth Jha, Sehoon Kim, Ryan Tabrizi, Suhong Moon, Coleman Hooper, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami, 1 Sep 2024, TinyAgent: Function Calling at the Edge, https://arxiv.org/abs/2409.00608 https://github.com/SqueezeAILab/TinyAgent
Tyler Mullen, August 22, 2024, Unlocking 7B+ language models in your browser: A deep dive with Google AI Edge's MediaPipe, https://research.google/blog/unlocking-7b-language-models-in-your-browser-a-deep-dive-with-google-ai-edges-mediapipe/
Dimitrios Kafetzis, Iordanis Koutsopoulos, Oct 2024, Demo: An Experimental Platform for AI Model Partitioning on Resource-constrained Devices, https://dl.acm.org/doi/pdf/10.1145/3641512.3690629
M. Sponner, L. Servadei, B. Waschneck, R. Wille and A. Kumar, "Harnessing Temporal Information for Efficient Edge AI," 2024 9th International Conference on Fog and Mobile Edge Computing (FMEC), Malmö, Sweden, 2024, pp. 5-13, doi: 10.1109/FMEC62297.2024.10710223. https://ieeexplore.ieee.org/abstract/document/10710223
Mistral AI, Oct 2024, Un Ministral, des Ministraux: Introducing the world’s best edge models. https://mistral.ai/news/ministraux/
Michael Nuñez, October 16, 2024, Mistral AI’s new language models bring AI power to your phone and laptop, https://venturebeat.com/business/mistral-ai-new-language-models-bring-ai-power-to-your-phone-and-laptop/
Zebin Yang, Renze Chen, Taiqiang Wu, Ngai Wong, Yun Liang, Runsheng Wang, Ru Huang, Meng Li, 23 Oct 2024, MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers, https://arxiv.org/abs/2410.17957
Arun Nanda, Sep 7, 2024, Reducing the Size of AI Models. Running large AI models on edge devices, https://towardsdatascience.com/reducing-the-size-of-ai-models-4ab4cfe5887a
Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, Qi He, Yao Ma, Ming Huang, Suhang Wang, 4 Nov 2024, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350
Justine, Apr 2023, Edge AI Just Got Faster, https://justine.lol/mmap/ (Loading models using mmap.)
Nan Xue, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Liang Qian, Shuguang Cui, Wenjun Zhang, Ping Zhang, 11 Nov 2024, WDMoE: Wireless Distributed Mixture of Experts for Large Language Models, https://arxiv.org/abs/2411.06681
Ibrahim Kok, Orhan Demirci, Suat Ozdemir, 20 Nov 2024, When IoT Meet LLMs: Applications and Challenges, https://arxiv.org/abs/2411.17722
M Xu, D Cai, W Yin, S Wang, X Jin, X Liu, 2024, Resource-efficient Algorithms and Systems of Foundation Models: A Survey, ACM Computing Surveys, https://dl.acm.org/doi/pdf/10.1145/3706418
Sokratis Nikolaidis, Stylianos I. Venieris, Iakovos S. Venieris, 5 Dec 2024, MultiTASC++: A Continuously Adaptive Scheduler for Edge-Based Multi-Device Cascade Inference, https://arxiv.org/abs/2412.04147
A. K. Al-Zihairy and A. E. Abdelkareem, "Optimizing YOLOv8-cls: A Step Towards Smarter Edge Environments," 2024 1st International Conference on Emerging Technologies for Dependable Internet of Things (ICETI), Sana'a, Yemen, 2024, pp. 1-6, doi: 10.1109/ICETI63946.2024.10777236. https://ieeexplore.ieee.org/abstract/document/10777236
Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, Wenhui Zhu, Quan Wan, Haozhao Wang, Yunfeng Fan, Qinliang Su, Xuemin Shen, 18 Dec 2024, Deploying Foundation Model Powered Agent Services: A Survey, https://arxiv.org/abs/2412.13437 (A survey of not just deployment, but many inference optimization techniques.)
Liam Seymour, Basar Kutukcu, Sabur Baidya, 19 Dec 2024, Large Language Models on Small Resource-Constrained Systems: Performance Characterization, Analysis and Trade-offs, https://arxiv.org/abs/2412.15352 https://github.com/LiamS57/orin-llm-testing
D. Xu et al., "EdgeLLM: Fast On-device LLM Inference with Speculative Decoding" in IEEE Transactions on Mobile Computing, vol. , no. 01, pp. 1-18, PrePrints 5555, doi: 10.1109/TMC.2024.3513457. https://www.computer.org/csdl/journal/tm/5555/01/10812936/22UpTlf6X2U
S. Pareek, A. Saleh Al-Samalek, A. Alkhayyat, S. Singh, A. Singh and S. Dasi, "Efficient Vision Transformers for Edge Devices: Pruning and Quantization Approaches," 2024 4th International Conference on Technological Advancements in Computational Sciences (ICTACS), Tashkent, Uzbekistan, 2024, pp. 1465-1471, doi: 10.1109/ICTACS62700.2024.10840584. https://ieeexplore.ieee.org/abstract/document/10840584
Jindong Li, Tenglong Li, Guobin Shen, Dongcheng Zhao, Qian Zhang, Yi Zeng, 15 Feb 2025, Pushing up to the Limit of Memory Bandwidth and Capacity Utilization for Efficient LLM Decoding on Embedded FPGA, https://arxiv.org/abs/2502.10659
Xian Peng, Xin Wu, Lianming Xu, Li Wang, Aiguo Fei, 6 Feb 2025, DistrEE: Distributed Early Exit of Deep Neural Network Inference on Edge Devices, https://arxiv.org/abs/2502.15735
Shaibal Saha, Lanyu Xu, 26 Feb 2025, Vision Transformers on the Edge: A Comprehensive Survey of Model Compression and Acceleration Strategies, https://arxiv.org/abs/2503.02891
Kangbo Bai, Le Ye, Ru Huang, Tianyu Jia, 16 May 2025, EdgeMM: Multi-Core CPU with Heterogeneous AI-Extension and Activation-aware Weight Pruning for Multimodal LLMs at Edge, https://arxiv.org/abs/2505.10782
Jiyong Kim, Jaeho Lee, Jiahao Lin, Alish Kanani, Miao Sun, Umit Y. Ogras, and Jaehyun Park, 14 Aug 2025, eMamba: Efficient Acceleration Framework for Mamba Models in Edge Computing, https://arxiv.org/abs/2508.10370
Wei Fan, JinYi Yoon, Xiaochang Li, Huajie Shao, and Bo Ji, 23 Jul 2025, P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices, https://arxiv.org/abs/2507.17228
Radowanul Haque, Aftab Ali, Sally McClean and Naveed Khan, 22 Jul 2025, Explainable Vulnerability Detection in C/C++ Using Edge-Aware Graph Attention Networks, https://arxiv.org/abs/2507.16540
Seunghyeon Kim, Kyeongryeol Go, 22 Jul 2025, Edge-case Synthesis for Fisheye Object Detection: A Data-centric Perspective, https://arxiv.org/abs/2507.16254
Zied Jenhani and Mounir Bensalem and Jasenka Dizdarević and Admela Jukan, 22 Jul 2025, An Experimental Study of Split-Learning TinyML on Ultra-Low-Power Edge/IoT Nodes, https://arxiv.org/abs/2507.16594
Arseniy Andreyev and Pierfrancesco Beneventano, 22 Jul 2025, Edge of Stochastic Stability: Revisiting the Edge of Stability for SGD, https://arxiv.org/abs/2412.20553
Linshen Liu, Boyan Su, Junyue Jiang, Guanlin Wu, Cong Guo, Ceyu Xu, Hao Frank Yang, 22 Jul 2025, Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge, https://arxiv.org/abs/2507.04123
Yujia Tong, Jingling Yuan, Chuang Hu, 17 Jul 2025, Enhancing Quantization-Aware Training on Edge Devices via Relative Entropy Coreset Selection and Cascaded Layer Correction, https://arxiv.org/abs/2507.17768
Casper Bröcheler, Thomas Vroom, Derrick Timmermans, Alan van den Akker, Guangzhi Tang, Charalampos S. Kouzinopoulos, Rico Möckel, 18 Jul 2025, A segmented robot grasping perception neural network for edge AI, https://arxiv.org/abs/2507.13970
Shuiguang Deng, Di Yu, Changze Lv, Xin Du, Linshan Jiang, Xiaofan Zhao, Wentao Tong, Xiaoqing Zheng, Weijia Fang, Peng Zhao, Gang Pan, Schahram Dustdar, Albert Y. Zomaya, 18 Jul 2025, Edge Intelligence with Spiking Neural Networks, https://arxiv.org/abs/2507.14069
Sebastian A. Cruz Romero, Misael J. Mercado Hernandez, Samir Y. Ali Rivera, Jorge A. Santiago Fernandez, Wilfredo E. Lugo Beauchamp, 20 Jul 2025, Design of an Edge-based Portable EHR System for Anemia Screening in Remote Health Applications, https://arxiv.org/abs/2507.15146
Eugene Armah, Linda Amoako Bannning, 19 Jul 2025, Towards a Proactive Autoscaling Framework for Data Stream Processing at the Edge using GRU and Transfer Learning, https://arxiv.org/abs/2507.14597
Thai T. Vu and John Le, 20 Jul 2025, Quantum Machine Learning for Secure Cooperative Multi-Layer Edge AI with Proportional Fairness, https://arxiv.org/abs/2507.15145
Alon Beck, Noam Levi, Yohai Bar-Sinai, 19 Jul 2025, Grokking at the Edge of Linear Separability, https://arxiv.org/abs/2410.04489
Ananda Prakash Verma, 10 Aug 2025, EDGE: A Theoretical Framework for Misconception-Aware Adaptive Learning, https://arxiv.org/abs/2508.07224
Tuo Zhang, Ning Li, Xin Yuan, Wenchao Xu, Quan Chen, Song Guo, Haijun Zhang, 10 Aug 2025, Efficient Edge LLMs Deployment via HessianAware Quantization and CPU GPU Collaborative, https://arxiv.org/abs/2508.07329
Jiawen Qi, Chang Gao, Zhaochun Ren, Qinyu Chen, 25 Jul 2025, DeltaLLM: A Training-Free Framework Exploiting Temporal Sparsity for Efficient Edge LLM Inference, https://arxiv.org/abs/2507.19608
Chengzhuo Han, 28 Jul 2025, Enhancing QoS in Edge Computing through Federated Layering Techniques: A Pathway to Resilient AI Lifelong Learning Systems, https://arxiv.org/abs/2507.20444
Tianhao Wang, Simon Klancher, Kunal Mukherjee, Josh Wiedemeier, Feng Chen, Murat Kantarcioglu, Kangkook Jee, 28 Jul 2025, PROVCREATOR: Synthesizing Complex Heterogenous Graphs with Node and Edge Attributes, https://arxiv.org/abs/2507.20967
Yang Zhao, Shusheng Li, Xueshang Feng, 28 Jul 2025, Lightweight Remote Sensing Scene Classification on Edge Devices via Knowledge Distillation and Early-exit, https://arxiv.org/abs/2507.20623
Xingjian Zhang, Siwei Wen, Wenjun Wu, Lei Huang, 29 Jul 2025, EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity, https://arxiv.org/abs/2507.21848
Abir Ray, 28 Jul 2025, EdgeAgentX-DT: Integrating Digital Twins and Generative AI for Resilient Edge Intelligence in Tactical Networks, https://arxiv.org/abs/2507.21196
Ghazal Sobhani, Md. Monzurul Amin Ifath, Tushar Sharma, Israat Haque, 30 Jul 2025, On the Sustainability of AI Inferences in the Edge, https://arxiv.org/abs/2507.23093
Georg Slamanig, Francesco Corti, Olga Saukh, 31 Jul 2025, From LLMs to Edge: Parameter-Efficient Fine-Tuning on Edge Devices, https://arxiv.org/abs/2507.23536
Jin Yang, Qiong Wu, Zhiying Feng, Zhi Zhou, Deke Guo and Xu Chen, 1 Aug 2025, Quality-of-Service Aware LLM Routing for Edge Computing with Multiple Experts, https://arxiv.org/abs/2508.00234
Jiyu Chen, Poh Seng Lim, Shuang Peng, Daxiong Luo, JungHau Foo, Yap Deep, Timothy Lee Jun Jie, Kelvin Teh Kae Wen, Fan Yang, Danyu Feng, Hao-Yun Chen, Peng-Wen Chen, Fangyuan Li, Xiaoxin Chen, Wong Wai Mun, 1 Aug 2025, EdgeInfinite-Instruct: Bridging SFT-Based Optimization and NPU-Level Efficiency for Edge Devices, https://arxiv.org/abs/2508.00370
Hangyu Li and Hongyue Wu and Guodong Fan and Zhen Zhang and Shizhan Chen and Zhiyong Feng, 1 Aug 2025, Efficient Federated Learning with Encrypted Data Sharing for Data-Heterogeneous Edge Devices, https://arxiv.org/abs/2506.20644
Fengze Yang, Bo Yu, Yang Zhou, Xuewen Luo, Zhengzhong Tu, Chenxi Liu, 1 Aug 2025, REACT: A Real-Time Edge-AI Based V2X Framework for Accident Avoidance in Autonomous Driving System, https://arxiv.org/abs/2508.01057
Jesse He, Akbar Rafiey, Gal Mishne, Yusu Wang, 1 Aug 2025, Explaining GNN Explanations with Edge Gradients, https://arxiv.org/abs/2508.01048
Heting Liu, Junzhe Huang, Fang He, Guohong Cao, 3 Aug 2025, Dynamic Clustering for Personalized Federated Learning on Heterogeneous Edge Devices, https://arxiv.org/abs/2508.01580
Xiangwang Hou, Jingjing Wang, Fangming Guan, Jun Du, Chunxiao Jiang, Yong Ren, 3 Aug 2025, Energy-Efficient Federated Learning for Edge Real-Time Vision via Joint Data, Computation, and Communication Design, https://arxiv.org/abs/2508.01745
Sangjun Park, Tony Q.S. Quek, Hyowoon Seo, 4 Aug 2025, Pigeon-SL: Robust Split Learning Framework for Edge Intelligence under Malicious Clients, https://arxiv.org/abs/2508.02235
Boran Zhao, Haiduo Huang, Qiwei Dang, Wenzhe Zhao, Tian Xia, Pengju Ren, 4 Aug 2025, NMS: Efficient Edge DNN Training via Near-Memory Sampling on Manifolds, https://arxiv.org/abs/2508.02313
Leyao Wang, Xutao Mao, Xuhui Zhan, Yuying Zhao, Bo Ni, Ryan A. Rossi, Nesreen K. Ahmed, Tyler Derr, 2 Aug 2025, Towards Bridging Review Sparsity in Recommendation with Textual Edge Graph Representation, https://arxiv.org/abs/2508.01128
Dulana Rupanetti, Naima Kaabouch, 3 Aug 2025, Leveraging Machine Learning for Botnet Attack Detection in Edge-Computing Assisted IoT Networks, https://arxiv.org/abs/2508.01542
Paul Zaha, Lars Böcking, Simeon Allmendinger, Leopold Müller, Niklas Kühl, 4 Aug 2025, Do Edges Matter? Investigating Edge-Enhanced Pre-Training for Medical Image Segmentation, https://arxiv.org/abs/2508.02281
Chen Feng and Yicheng Lin and Shaojie Zhuo and Chenzheng Su and Ramchalam Kinattinkara Ramakrishnan and Zhaocong Yuan and Xiaopeng Zhang, 1 Aug 2025, Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models, https://arxiv.org/abs/2507.07877
Osama Mohammed, Jiaxin Pan, Mojtaba Nayyeri, Daniel Hernández and Steffen Staab, 5 Aug 2025, Full-History Graphs with Edge-Type Decoupled Networks for Temporal Reasoning, https://arxiv.org/abs/2508.03251
Xingdan Wang, Jiayi He, Zhiqing Tang, Jianxiong Guo, Jiong Lou, Liping Qian, Tian Wang, Weijia Jia, 5 Aug 2025, Adaptive AI Agent Placement and Migration in Edge Intelligence Systems, https://arxiv.org/abs/2508.03345
Jialin Zheng and Haoyu Wang and Yangbin Zeng and Di Mou and Xin Zhang and Hong Li and Sergio Vazquez and Leopoldo G. Franquelo, 4 Aug 2025, Physics-Embedded Neural ODEs for Sim2Real Edge Digital Twins of Hybrid Power Electronics Systems, https://arxiv.org/abs/2508.02887
Matteo Caligiuri, Francesco Barbato, Donald Shenaj, Umberto Michieli, Pietro Zanuttigh, 5 Aug 2025, FedPromo: Federated Lightweight Proxy Models at the Edge Bring New Domains to Foundation Models, https://arxiv.org/abs/2508.03356
Zexu Huang, Min Xu, Stuart Perry, 6 Aug 2025, DET-GS: Depth- and Edge-Aware Regularization for High-Fidelity 3D Gaussian Splatting, https://arxiv.org/abs/2508.04099
Nan Li, Wanting Yang, Marie Siew, Zehui Xiong, Binbin Chen, Shiwen Mao, Kwok-Yan Lam, 6 Aug 2025, Edge-Assisted Collaborative Fine-Tuning for Multi-User Personalized Artificial Intelligence Generated Content (AIGC), https://arxiv.org/abs/2508.04745
Yuze Liu, Tiehua Zhang, Zhishu Shen, Libing Wu, Shiping Chen and Jiong Jin, 1 Aug 2025, Towards Heterogeneity-Aware and Energy-Efficient Topology Optimization for Decentralized Federated Learning in Edge Environment, https://arxiv.org/abs/2508.08278
Jing Liu, Yao Du, Kun Yang, Jiaqi Wu, Yan Wang, Xiping Hu, Zehua Wang, Yang Liu, Peng Sun, Azzedine Boukerche, Victor C.M. Leung, 12 Aug 2025, Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey, https://arxiv.org/abs/2505.01821
Zijun Jiang and Yangdi Lyu, 13 Aug 2025, MiCo: End-to-End Mixed Precision Neural Network Co-Exploration Framework for Edge AI, https://arxiv.org/abs/2508.09500
Bokeng Zheng, Jianqiang Zhong, Jiayi Liu, Xiaoxi Zhang, 13 Aug 2025, Decentralized Rank Scheduling for Energy-Constrained Multi-Task Federated Fine-Tuning in Edge-Assisted IoV Networks, https://arxiv.org/abs/2508.09532
Changyuan Zhao, Guangyuan Liu, Ruichen Zhang, Yinqiu Liu, Jiacheng Wang, Jiawen Kang, Dusit Niyato, Zan Li, Xuemin (Sherman) Shen, Zhu Han, Sumei Sun, Chau Yuen, Dong In Kim, 13 Aug 2025, Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges, https://arxiv.org/abs/2508.09561
Muqing Li, Ning Li, Xin Yuan, Wenchao Xu, Quan Chen, Song Guo, Haijun Zhang, 10 Aug 2025, CoMoE: Collaborative Optimization of Expert Aggregation and Offloading for MoE-based LLMs at Edge, https://arxiv.org/abs/2508.09208
Alessandro Pierro, Steven Abreu, Jonathan Timcheck, Philipp Stratmann, Andreas Wild, Sumit Bam Shrestha, 13 Aug 2025, Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity, https://arxiv.org/abs/2502.01330
Xuran Liu, Nan Xue, Rui Bao, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Shuguang Cui, 15 Aug 2025, CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems, https://arxiv.org/abs/2508.11287
Rui Bao, Nan Xue, Yaping Sun, Zhiyong Chen, 15 Aug 2025, Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks, https://arxiv.org/abs/2508.11291
Tiancheng Zhang, Cheng Zhang, Shuren Liu, Xiaofei Wang, Shaoyuan Huang, Wenyu Wang, 18 Aug 2025, HRS: Hybrid Representation Framework with Scheduling Awareness for Time Series Forecasting in Crowdsourced Cloud-Edge Platforms, https://arxiv.org/abs/2508.12839
Chen Qian, Xinran Yu, Zewen Huang, Danyang Li, Qiang Ma, Fan Dang, Xuan Ding, Guangyong Shang, Zheng Yang, 18 Aug 2025, SpotVLM: Cloud-edge Collaborative Real-time VLM based on Context Transfer, https://arxiv.org/abs/2508.12638
Prabath Abeysekara, Hai Dong, 18 Aug 2025, Data-driven Trust Bootstrapping for Mobile Edge Computing-based Industrial IoT Services, https://arxiv.org/abs/2508.12560
Bachtiar Herdianto, Romain Billot, Flavien Lucas, Marc Sevaux, and Daniele Vigo, 12 Aug 2025, Edge-Selector Model Applied for Local Search Neighborhood for Solving Vehicle Routing Problems, https://arxiv.org/abs/2508.14071
Zengyi Wo, Wenjun Wang, Minglai Shao, Chang Liu, Yumeng Wang, Yueheng Sun, 20 Aug 2025, Addressing Graph Anomaly Detection via Causal Edge Separation and Spectrum, https://arxiv.org/abs/2508.14684
Ahmed Mujtaba, Gleb Radchenko, Radu Prodan, Marc Masana, 20 Aug 2025, Federated Distillation on Edge Devices: Efficient Client-Side Filtering for Non-IID Data, https://arxiv.org/abs/2508.14769
Chen-Hao Chang, Hui-Ju Hung, Chia-Hsun Lu, Chih-Ya Shen, 20 Aug 2025, Enhancing Contrastive Link Prediction With Edge Balancing Augmentation, https://arxiv.org/abs/2508.14808
Zihao Wang, Junming Zhang, 21 Aug 2025, From Bits to Boardrooms: A Cutting-Edge Multi-Agent LLM Framework for Business Excellence, https://arxiv.org/abs/2508.15447
Dingzhu Wen, Sijing Xie, Xiaowen Cao, Yuanhao Cui, Jie Xu, Yuanming Shi, and Shuguang Cui, 21 Aug 2025, Integrated Sensing, Communication, and Computation for Over-the-Air Federated Edge Learning, https://arxiv.org/abs/2508.15185
Zewei Xin, Qinya Li, Chaoyue Niu, Fan Wu, Guihai Chen, 21 Aug 2025, Adaptive Routing of Text-to-Image Generation Requests Between Large Cloud Model and Light-Weight Edge Model, https://arxiv.org/abs/2411.13787
Benjamin Murphy, Twm Stone, 14 Aug 2025, Uplifted Attackers, Human Defenders: The Cyber Offense-Defense Balance for Trailing-Edge Organizations, https://arxiv.org/abs/2508.15808
Aashaka Shah, Abhinav Jangda, Binyang Li, Caio Rocha, Changho Hwang, Jithin Jose, Madan Musuvathi, Olli Saarikivi, Peng Cheng, Qinghua Zhou, Roshan Dathathri, Saeed Maleki, Ziyue Yang, 21 Aug 2025, MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications, https://arxiv.org/abs/2504.09014
Boran Zhao, Hetian Liu, Zihang Yuan, Li Zhu, Fan Yang, Lina Xie, Tian Xia, Wenzhe Zhao, Pengju Ren, 19 Aug 2025, AdapSNE: Adaptive Fireworks-Optimized and Entropy-Guided Dataset Sampling for Edge DNN Training, https://arxiv.org/abs/2508.16647
Nishant Gavhane, Arush Mehrotra, Rohit Chawla, Peter Proenca, 23 Aug 2025, MoE-Beyond: Learning-Based Expert Activation Prediction on Edge Devices, https://arxiv.org/abs/2508.17137
Sam Buchanan, Druv Pai, Yi Ma, Valentin De Bortoli, 25 Aug 2025, On the Edge of Memorization in Diffusion Models, https://arxiv.org/abs/2508.17689
Dabbrata Das, Mahshar Yahan, Md Tareq Zaman, and Md Rishadul Bayesh, 25 Aug 2025, Edge-Enhanced Vision Transformer Framework for Accurate AI-Generated Image Detection, https://arxiv.org/abs/2508.17877
Xuecheng Bai, Yuxiang Wang, Boyu Hu, Qinyuan Jie, Chuanzhi Xu, Hongru Xiao, Kechen Li, Vera Chung, 24 Jul 2025, DRWKV: Focusing on Object Edges for Low-Light Image Enhancement, https://arxiv.org/abs/2507.18594
Haiyuan Li, Hari Madhukumar, Peizheng Li, Yuelin Liu, Yiran Teng, Yulei Wu, Ning Wang, Shuangyi Yan, Dimitra Simeonidou, 18 Jul 2025, Towards Practical Operation of Deep Reinforcement Learning Agents in Real-World Network Management at Open RAN Edges, https://arxiv.org/abs/2410.23086
Xu Cheng, Liang Yao, Feng He, Yukuo Cen, Yufei He, Chenhui Zhang, Wenzheng Feng, Hongyun Cai, Jie Tang, 19 Jul 2025, LPS-GNN: Deploying Graph Neural Networks on Graphs with 100-Billion Edges, https://arxiv.org/abs/2507.14570
Yanjie Dong, Haijun Zhang, Chengming Li, Song Guo, Victor C. M. Leung, Xiping Hu, 6 Aug 2025, Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches, https://arxiv.org/abs/2408.10691
Payam Abdisarabshali, Fardis Nadimi, Kasra Borazjani, Naji Khosravan, Minghui Liwang, Wei Ni, Dusit Niyato, Michael Langberg, Seyyedali Hosseinalipour, 3 Sep 2025, Hierarchical Federated Foundation Models over Wireless Networks for Multi-Modal Multi-Task Intelligence: Integration of Edge Learning with D2D/P2P-Enabled Fog Learning Architectures, https://arxiv.org/abs/2509.03695
Aryan Gupta, Anupam Purwar, 3 Sep 2025, E-ARMOR: Edge case Assessment and Review of Multilingual Optical Character Recognition, https://arxiv.org/abs/2509.03615
Pavle Vasiljevic, Milica Matic, Miroslav Popovic, 4 Sep 2025, Federated Isolation Forest for Efficient Anomaly Detection on Edge IoT Systems, https://arxiv.org/abs/2506.05138
Sri Krishna Vadlamani, Kfir Sulimany, Zhihui Gao, Tingjun Chen, Dirk Englund, 4 Sep 2025, Machine Intelligence on Wireless Edge Networks, https://arxiv.org/abs/2506.12210
Zheyan Qu, Wenbo Wang, Zitong Yu, Boquan Sun, Yang Li, and Xing Zhang, 5 Sep 2025, LLM Enabled Multi-Agent System for 6G Networks: Framework and Method of Dual-Loop Edge-Terminal Collaboration, https://arxiv.org/abs/2509.04993
Guoying Zhu, Meng Li, Haipeng Dai, Xuechen Liu, Weijun Wang, Keran Li, Jun Xiao, Ligeng Chen, Wei Wang, 26 Aug 2025, Enabling MoE on the Edge via Importance-Driven Expert Scheduling, https://arxiv.org/abs/2508.18983
Gang Hu, Yinglei Teng, Pengfei Wu, and Nan Wang, 26 Aug 2025, FFT-MoE: Efficient Federated Fine-Tuning for Foundation Models via Large-scale Sparse MoE under Heterogeneous Edge, https://arxiv.org/abs/2508.18663
Jiaqi Wu, Jing Liu, Yang Liu, Lixu Wang, Zehua Wang, Wei Chen, Zijian Tian, Richard Yu, Victor C.M. Leung, 26 Aug 2025, A Survey on Cloud-Edge-Terminal Collaborative Intelligence in AIoT Networks, https://arxiv.org/abs/2508.18803
Siyuan You, Guozheng Xu, Pengwei Zhou, Qiwen Jin, Jian Yao, Li Li, 26 Aug 2025, RoofSeg: An edge-aware transformer-based network for end-to-end roof plane segmentation, https://arxiv.org/abs/2508.19003
Maha Shatta, Konstantinos Balaskas, Paula Carolina Lozano Duarte, Georgios Panagopoulos, Mehdi B. Tahoori, Georgios Zervakis, 27 Aug 2025, Invited Paper: Feature-to-Classifier Co-Design for Mixed-Signal Smart Flexible Wearables for Healthcare at the Extreme Edge, https://arxiv.org/abs/2508.19637
Kan Chen, Zhen Meng, Xiangmin Xu, Jiaming Yang, Emma Li and Philip G. Zhao, 28 Aug 2025, Task-Oriented Edge-Assisted Cross-System Design for Real-Time Human-Robot Interaction in Industrial Metaverse, https://arxiv.org/abs/2508.20664
Guanyu Xu, Zhiwei Hao, Li Shen, Yong Luo, Fuhui Sun, Xiaoyan Wang, Han Hu, Yonggang Wen, 28 Aug 2025, CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference, https://arxiv.org/abs/2508.20375
Haozhe Tian, Qiyu Rao, Nina Moutonnet, Pietro Ferraro, Danilo Mandic, 29 Aug 2025, Machine Intelligence on the Edge: Interpretable Cardiac Pattern Localisation Using Reinforcement Learning, https://arxiv.org/abs/2508.21652
Kishor Datta Gupta, Md Manjurul Ahsan, Mohd Ariful Haque, Roy George, and Azmine Toushik Wasi, 31 Aug 2025, UrbanInsight: A Distributed Edge Computing Framework with LLM-Powered Data Filtering for Smart City Digital Twins, https://arxiv.org/abs/2509.00936
Hao Mark Chen, Zhiwen Mo, Guanxi Lu, Shuang Liang, Lingxiao Ma, Wayne Luk, Hongxiang Fan, 29 Aug 2025, Democratizing Agentic AI with Fast Test-Time Scaling on the Edge, https://arxiv.org/abs/2509.00195
Can Cui, Zilong Fu, Penghe Huang, Yuanyuan Li, Wu Deng, Dongyan Li, 30 Aug 2025, An Efficient GNNs-to-KANs Distillation via Self-Attention Dynamic Sampling with Potential for Consumer Electronics Edge Deployment, https://arxiv.org/abs/2509.00560
Andrea Fox, Francesco De Pellegrini, Eitan Altman, 1 Sep 2025, Multi-Agent Reinforcement Learning for Task Offloading in Wireless Edge Networks, https://arxiv.org/abs/2509.01257
Guilherme H. Apostolo, Pablo Bauszat, Vinod Nigade, Henri E. Bal, Lin Wang, 1 Sep 2025, Uirapuru: Timely Video Analytics for High-Resolution Steerable Cameras on Edge Devices, https://arxiv.org/abs/2509.01371
Einstein Rivas Pizarro, Wajiha Zaheer, Li Yang, Khalil El-Khatib, Glenn Harvel, 1 Sep 2025, Securing Radiation Detection Systems with an Efficient TinyML-Based IDS for Edge Devices, https://arxiv.org/abs/2509.01592
Evan King, Adam Sabra, Manjunath Kudlur, James Wang, Pete Warden, 2 Sep 2025, Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices, https://arxiv.org/abs/2509.02523
Mithun Goutham, Riccardo DalferroNucci, Stephanie Stockar, Meghna Menon, Sneha Nayak, Harshad Zade, Chetan Patel, Mario Santillo, 30 Aug 2025, Epsilon-Neighborhood Decision-Boundary Governed Estimation (EDGE) of 2D Black Box Classifier Functions, https://arxiv.org/abs/2504.09733
Usman Haider, Lukasz Szemet, Daniel Kelly, Vasileios Sergis, Andrew C. Daly, and Karl Mason, 8 Sep 2025, BioLite U-Net: Edge-Deployable Semantic Segmentation for In Situ Bioprinting Monitoring, https://arxiv.org/abs/2509.06690
Kasra Borazjani, Payam Abdisarabshali, Fardis Nadimi, Naji Khosravan, Minghui Liwang, Xianbin Wang, Yiguang Hong, Seyyedali Hosseinalipour, 5 Sep 2025, Multi-Modal Multi-Task (M3T) Federated Foundation Models for Embodied AI: Potentials and Challenges for Edge Integration, https://arxiv.org/abs/2505.11191
Yuxuan Bai, Yuxuan Sun, Tan Chen, Wei Chen, Sheng Zhou, Zhisheng Niu, 9 Sep 2025, FedTeddi: Temporal Drift and Divergence Aware Scheduling for Timely Federated Edge Learning, https://arxiv.org/abs/2509.07342
Mujie Liu, Chenze Wang, Liping Chen, Nguyen Linh Dan Le, Niharika Tewari, Ting Dang, Jiangang Ma, and Feng Xia, 11 Sep 2025, Structure Matters: Brain Graph Augmentation via Learnable Edge Masking for Data-efficient Psychiatric Diagnosis, https://arxiv.org/abs/2509.09744
Omar Erak, Omar Alhussein, Hatem Abou-Zeid, Mehdi Bennis, Sami Muhaidat, 12 Sep 2025, Adaptive Token Merging for Efficient Transformer Semantic Communication at the Edge, https://arxiv.org/abs/2509.09955
Francisco Javier Esono Nkulu Andong and Qi Min, 12 Sep 2025, Federated Multi-Agent Reinforcement Learning for Privacy-Preserving and Energy-Aware Resource Management in 6G Edge Networks, https://arxiv.org/abs/2509.10163
Omar Erak, Omar Alhussein, Hatem Abou-Zeid, Mehdi Bennis, 11 Sep 2025, Adaptive Pareto-Optimal Token Merging for Edge Transformer Models in Semantic Communication, https://arxiv.org/abs/2509.09168
Vishnu Narayanan Moothedath, Umang Agarwal, Umeshraja N, James Richard Gross, Jaya Prakash Champati, Sharayu Moharir, 19 Sep 2025, Inference Offloading for Cost-Sensitive Binary Classification at the Edge, https://arxiv.org/abs/2509.15674
Yiyi Liu, Chunyang Liu, Weiqin Jiao, Bojian Wu, Fashuai Li, Biao Xiong, 18 Sep 2025, CAGE: Continuity-Aware edGE Network Unlocks Robust Floorplan Reconstruction, https://arxiv.org/abs/2509.15459
Runjie Shao, Boyu Diao, Zijia An, Ruiqi Liu, Yongjun Xu, 19 Sep 2025, CBPNet: A Continual Backpropagation Prompt Network for Alleviating Plasticity Loss on Edge Devices, https://arxiv.org/abs/2509.15785
Rasil Baidar, Sasa Maric, Robert Abbas, 19 Sep 2025, Hybrid Deep Learning-Federated Learning Powered Intrusion Detection System for IoT/5G Advanced Edge Computing Network, https://arxiv.org/abs/2509.15555
Kushal Bose and Swagatam Das, 16 Sep 2025, Learning from Heterophilic Graphs: A Spectral Theory Perspective on the Impact of Self-Loops and Parallel Edges, https://arxiv.org/abs/2509.13139
Vijay Kumar Butte, Sujata Butte, 15 Sep 2025, An End to End Edge to Cloud Data and Analytics Strategy, https://arxiv.org/abs/2509.12296
Amir Taherin, Juyi Lin, Arash Akbari, Arman Akbari, Pu Zhao, Weiwei Chen, David Kaeli, Yanzhi Wang, 15 Sep 2025, Cross-Platform Scaling of Vision-Language-Action Models from Edge to Cloud GPUs, https://arxiv.org/abs/2509.11480
Ocheme Anthony Ekle and William Eberle, 15 Sep 2025, Adaptive-GraphSketch: Real-Time Edge Anomaly Detection via Multi-Layer Tensor Sketching and Temporal Decay, https://arxiv.org/abs/2509.11633
Lennart Bamberg, Filippo Minnella, Roberto Bosio, Fabrizio Ottati, Yuebin Wang, Jongmin Lee, Luciano Lavagno, Adam Fuks, 17 Sep 2025, eIQ Neutron: Redefining Edge-AI Inference with Integrated NPU and Compiler Innovations, https://arxiv.org/abs/2509.14388
Tao Yang, Xuefeng Jiang, Wei Li, Peiyu Liu, Jinming Wang, Weijie Hao, Qiang Yang, 18 Sep 2025, Cloud-Edge Collaborative Data Anomaly Detection in Industrial Sensor Networks, https://arxiv.org/abs/2204.09942
Yuanchun Guo and Bingyan Liu and Yulong Sha and Zhensheng Xian, 4 Sep 2025, PracMHBench: Re-evaluating Model-Heterogeneous Federated Learning Based on Practical Edge Device Constraints, https://arxiv.org/abs/2509.08750
Chiara De Luca and Elisa Donati, 17 Sep 2025, Queen Detection in Beehives via Environmental Sensor Fusion for Low-Power Edge Computing, https://arxiv.org/abs/2509.14061
Alyssa Pinnock, Shakya Jayakody, Kawsher A Roxy, Md Rubel Ahmed, 17 Sep 2025, EdgeProfiler: A Fast Profiling Framework for Lightweight LLMs on Edge Using Analytical Model, https://arxiv.org/abs/2506.09061

Hybrid Edge-Cloud Architectures

In a hybrid architecture, some processing is done on edge devices (e.g., PCs or security cameras), and the rest is passed up to the cloud for more powerful processing. The "Apple Intelligence" architecture is a prominent example, with some processing done "on-device" on iPhones and Macs, and some passed up to the cloud.
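As a rough illustration of the hybrid idea, the following minimal Python sketch routes each request to a small on-device model first and only falls back to a cloud model when the prompt is too long or the local answer has low confidence. Everything here is an assumption made for illustration: the `HybridRouter` class, the `min_confidence` and `max_edge_prompt_chars` thresholds, and the two toy model functions are hypothetical stand-ins, not the method of any paper listed below nor of Apple's system.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical interface: a model maps a prompt to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class HybridRouter:
    """Try the on-device model first; escalate to the cloud when needed."""
    edge_model: Model
    cloud_model: Model
    min_confidence: float = 0.8       # assumed threshold for accepting the edge answer
    max_edge_prompt_chars: int = 200  # assumed input-size limit for the small model

    def run(self, prompt: str) -> Tuple[str, str]:
        """Return (answer, where), with where being "edge" or "cloud"."""
        if len(prompt) <= self.max_edge_prompt_chars:
            answer, confidence = self.edge_model(prompt)
            if confidence >= self.min_confidence:
                return answer, "edge"
        # Prompt too long, or the edge model was not confident enough.
        answer, _ = self.cloud_model(prompt)
        return answer, "cloud"


def toy_edge_model(prompt: str) -> Tuple[str, float]:
    # Stand-in for a small local model: confident only on short requests.
    confidence = 0.9 if len(prompt.split()) <= 5 else 0.4
    return f"edge:{prompt}", confidence


def toy_cloud_model(prompt: str) -> Tuple[str, float]:
    # Stand-in for a large server-side model.
    return f"cloud:{prompt}", 0.99


router = HybridRouter(toy_edge_model, toy_cloud_model)
print(router.run("turn on the lights"))
print(router.run("summarize this long quarterly report for the board meeting"))
```

In this sketch the routing decision is made entirely on the device, so simple requests never leave it. Real systems in the papers below use more elaborate policies, such as early exit, model partitioning, or learned routers.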

Hasanul Mahmud, Peng Kang, Kevin Desai, Palden Lama, Sushil Prasad, 11 Mar 2024, A Converting Autoencoder Toward Low-latency and Energy-efficient DNN Inference at the Edge, https://arxiv.org/abs/2403.07036 (Hybrid cloud and on-device inference for image analysis.)
Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, Lingjia Tang, 2017, “Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,” ACM SIGARCH Comput. Archit. News, vol. 45, no. 1, pp. 615–629, https://dl.acm.org/doi/10.1145/3037697.3037698
Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey
Zixu Hao, Huiqiang Jiang, Shiqi Jiang, Ju Ren, Ting Cao, June 2024, Hybrid SLM and LLM for Edge-Cloud Collaborative Inference, EdgeFM ’24, June 3–7, 2024, Minato-ku, Tokyo, Japan, https://dl.acm.org/doi/pdf/10.1145/3662006.3662067 (Small model on edge devices with large model in the cloud, performing collaborative inference.)
Adarsh Prasad Behera, Paulius Daubaris, Iñaki Bravo, José Gallego, Roberto Morabito, Joerg Widmer, Jaya Prakash Varma Champati, 10 Jul 2024, Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical, https://arxiv.org/abs/2407.11061
Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang, July 2024, Mobile Edge Intelligence for Large Language Models: A Contemporary Survey, https://www.techrxiv.org/doi/pdf/10.36227/techrxiv.172115025.57884352
Mingjin Zhang, 2024, High-performance scheduling of deep learning tasks in collaborative edge computing, Ph.D. Thesis, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, https://theses.lib.polyu.edu.hk/bitstream/200/13080/3/7528.pdf (Scheduling of inference and training tasks on edge devices with techniques such as model splitting/partitioning.)
Zhi Yao, Zhiqing Tang, Jiong Lou, Ping Shen, Weijia Jia, 19 Jun 2024, VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework, https://arxiv.org/abs/2406.13399
Teresa Peng, Kabir Mehta, Liam Liu, Aditya Nair, Priya Singh, 2024, Enhanced Hybrid Inference Techniques for Scalable On-Device LLM Personalization and Cloud Integration, PDF: https://www.researchgate.net/profile/Priya-Singh-103/publication/384311522_Enhanced_Hybrid_Inference_Techniques_for_Scalable_On-Device_LLM_Personalization_and_Cloud_Integration/links/66f3cfb09e6e82486fef9f1c/Enhanced-Hybrid-Inference-Techniques-for-Scalable-On-Device-LLM-Personalization-and-Cloud-Integration.pdf
Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras, Haitham Bou Ammar, Jun Wang, 4 Oct 2024, Mixture of Attentions For Speculative Decoding, https://arxiv.org/abs/2410.03804
Divya Jyoti Bajpai, Manjesh Kumar Hanawal, 6 Oct 2024, Distributed Inference on Mobile Edge and Cloud: An Early Exit based Clustering Approach, https://arxiv.org/abs/2410.05338
Jiaming Qiu, Ruiqi Wang, Brooks Hu, Roch Guerin, Chenyang Lu, 24 Oct 2024, Optimizing Edge Offloading Decisions for Object Detection, https://arxiv.org/abs/2410.18919
Fan Yang, Zehao Wang, Haoyu Zhang, Zhenhua Zhu, Xinhao Yang, Guohao Dai, Yu Wang, Oct 2024, Efficient Deployment of Large Language Model across Cloud-Device Systems, https://nicsefc.ee.tsinghua.edu.cn/nics_file/pdf/f06a14c1-4d6d-441d-b4e4-82545ac5781b.pdf
Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, Wenhui Zhu, Quan Wan, Haozhao Wang, Yunfeng Fan, Qinliang Su, Xuemin Shen, 18 Dec 2024, Deploying Foundation Model Powered Agent Services: A Survey, https://arxiv.org/abs/2412.13437 (A survey of not just deployment, but many inference optimization techniques.)
Divya Jyoti Bajpai, Manjesh Kumar Hanawal, 21 Dec 2024, Distributed Inference on Mobile Edge and Cloud: A Data-Cartography based Clustering Approach, https://arxiv.org/abs/2412.16616 https://anonymous.4open.science/r/DIMEC-1B04
You Zhou, Changsheng You, Kaibin Huang, 1 Jan 2025, Communication Efficient Cooperative Edge AI via Event-Triggered Computation Offloading, https://arxiv.org/abs/2501.02001
Huiyou Zhan, Xuan Zhang, Haisheng Tan, Han Tian, Dongping Yong, Junyang Zhang, Xiang-Yang Li, 16 Jan 2025, PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks, https://arxiv.org/abs/2501.09367 (Generate an outline in the cloud that is filled in by edge models, which is similar to Skeleton-of-Thought.)
X. Zheng, W. Zhang, C. Hu, L. Zhu and C. Zhang, "Cloud-Edge-End Collaborative Inference in Mobile Networks: Challenges and Solutions," in IEEE Network, doi: 10.1109/MNET.2025.3533581. https://ieeexplore.ieee.org/abstract/document/10852347
Sabri Eyuboglu, Dan Biderman, Avanika Narayan, Feb 24, 2025, Minions: the rise of small, on-device LMs: Embracing small LMs, shifting compute on-device, and cutting cloud costs in the process, https://hazyresearch.stanford.edu/blog/2025-02-24-minions
Avanika Narayan, Dan Biderman, Sabri Eyuboglu, Avner May, Scott Linderman, James Zou, Christopher Re, 21 Feb 2025, Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models, https://arxiv.org/abs/2502.15964 (Reading long documents using on-device small models, by breaking the document into small chunks processed by local LLMs, and only using the cloud LLMs for finalization tasks.)
M. Kim, P. Pinyoanuntapong, B. Kim, W. Saad and D. Calin, "Edge vs Cloud: How Do We Balance Cost, Latency, and Quality for Large Language Models Over 5G Networks?," 2025 IEEE Wireless Communications and Networking Conference (WCNC), Milan, Italy, 2025, pp. 1-6, doi: 10.1109/WCNC61545.2025.10978177, https://ieeexplore.ieee.org/abstract/document/10978177/
Apple, June 2025, Updates to Apple's On-Device and Server Foundation Language Models, https://machinelearning.apple.com/research/apple-foundation-models-2025-updates (Apple's 3B on-device model with cloud server alternative. The on-device architecture includes 2-bit quantization, 4-bit embeddings quantization, 8-bit KV quantization, a unique KV cache compression, interleaved local-global attention and multi-LoRA.)
Pengyan Zhu, Tingting Yang, 20 May 2025, CE-LSLM: Efficient Large-Small Language Model Inference and Communication via Cloud-Edge Collaboration, https://arxiv.org/abs/2505.14085

Internet of Things (IoT)

IoT is an edge platform involving any low-resource devices on the internet. Research papers on LLMs and IoT devices include:

L. Cheng, Y. Gu, Q. Liu, L. Yang, C. Liu and Y. Wang, 2024, Advancements in Accelerating Deep Neural Network Inference on AIoT Devices: A Survey, in IEEE Transactions on Sustainable Computing, doi: 10.1109/TSUSC.2024.3353176. https://ieeexplore.ieee.org/abstract/document/10398463
Rei Barjami, Antonio Miele, and Luca Mottola. 2024. Intermittent Inference: Trading a 1% Accuracy Loss for a 1.9x Throughput Speedup. In Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems (SenSys '24). Association for Computing Machinery, New York, NY, USA, 647–660. https://doi.org/10.1145/3666025.3699364 https://dl.acm.org/doi/abs/10.1145/3666025.3699364 https://dl.acm.org/doi/pdf/10.1145/3666025.3699364
Ye Cheng, Minghui Xu, Yue Zhang, Kun Li, Ruoxi Wang, Lian Yang, 16 Nov 2024, AutoIoT: Automated IoT Platform Using Large Language Models, https://arxiv.org/abs/2411.10665
Ibrahim Kok, Orhan Demirci, Suat Ozdemir, 20 Nov 2024, When IoT Meet LLMs: Applications and Challenges, https://arxiv.org/abs/2411.17722
A. K. Al-Zihairy and A. E. Abdelkareem, "Optimizing YOLOv8-cls: A Step Towards Smarter Edge Environments," 2024 1st International Conference on Emerging Technologies for Dependable Internet of Things (ICETI), Sana'a, Yemen, 2024, pp. 1-6, doi: 10.1109/ICETI63946.2024.10777236. https://ieeexplore.ieee.org/abstract/document/10777236
Shubham Vaishnav, Praveen Kumar Donta, Sindri Magnússon, 13 Aug 2025, Adaptive Budgeted Multi-Armed Bandits for IoT with Dynamic Resource Constraints, https://arxiv.org/abs/2505.02640
Amod Kant Agrawal, 23 Jul 2025, Our Cars Can Talk: How IoT Brings AI to Vehicles, https://arxiv.org/abs/2507.17214
Harsha Sammangi (Dakota State University), Aditya Jagatha (College of Business and Information Systems, Dakota State University), Giridhar Reddy Bojja (College of Business, Michigan Technological University), Jun Liu (College of Business and I.S, Dakota State University), 29 Apr 2025, Decentralized AI-driven IoT Architecture for Privacy-Preserving and Latency-Optimized Healthcare in Pandemic and Critical Care Scenarios, https://arxiv.org/abs/2507.15859
Zied Jenhani and Mounir Bensalem and Jasenka Dizdarević and Admela Jukan, 22 Jul 2025, An Experimental Study of Split-Learning TinyML on Ultra-Low-Power Edge/IoT Nodes, https://arxiv.org/abs/2507.16594
Ahmad Alhonainy (1), Praveen Rao (1) ((1) University of Missouri, USA), 19 Jul 2025, Caching Techniques for Reducing the Communication Cost of Federated Learning in IoT Environments, https://arxiv.org/abs/2507.17772
Aiman Faiz, Anna Maria De Roberto, Claudio Pascarelli, Gianvito Mitrano, Gianluca Fimiani, Marina Garofano, Genoveffa Tortora, Mariangela Lazoi, Claudio Passino, Alessia Bramanti, 24 Jul 2025, Machine Learning Solutions Integrated in an IoT Healthcare Platform for Heart Failure Risk Stratification, https://arxiv.org/abs/2505.09619
Mohammad Mehdi Rastikerdar, Jin Huang, Hui Guan, Deepak Ganesan, 11 Aug 2025, In-Situ Fine-Tuning of Wildlife Models in IoT-Enabled Camera Traps for Efficient Adaptation, https://arxiv.org/abs/2409.07796
Arianna Stropeni, Francesco Borsatti, Manuel Barusco, Davide Dalle Pezze, Marco Fabris, Gian Antonio Susto, 28 Jul 2025, Towards Scalable IoT Deployment for Visual Anomaly Detection via Efficient Compression, https://arxiv.org/abs/2505.07119
Ze Zhang and Qian Dong and Wenhan Wang, 30 Jul 2025, AdapSCA-PSO: An Adaptive Localization Algorithm with AI-Based Hybrid SCA-PSO for IoT WSNs, https://arxiv.org/abs/2507.22317
Xinzhe Zheng, Sijie Ji, Yipeng Pan, Kaiwen Zhang, Chenshu Wu, 30 Jul 2025, NeurIT: Pushing the Limit of Neural Inertial Tracking for Indoor Robotic IoT, https://arxiv.org/abs/2404.08939
Dulana Rupanetti, Naima Kaabouch, 3 Aug 2025, Leveraging Machine Learning for Botnet Attack Detection in Edge-Computing Assisted IoT Networks, https://arxiv.org/abs/2508.01542
Natalia Emelianova, Carlos Kamienski and Ronaldo C. Prati, 7 Aug 2025, Optimizing IoT Threat Detection with Kolmogorov-Arnold Networks (KANs), https://arxiv.org/abs/2508.05591
Muhammad Sakib Khan Inan, Kewen Liao, 13 Aug 2025, DeepFeatIoT: Unifying Deep Learned, Randomized, and LLM Features for Enhanced IoT Time Series Sensor Data Classification in Smart Industries, https://arxiv.org/abs/2508.09468
Jesus Omaña Iglesias, Carlos Segura Perales, Stefan Geißler, Diego Perino, Andra Lutu, 13 Aug 2025, Anomaly Detection for IoT Global Connectivity, https://arxiv.org/abs/2508.09660
Afrah Gueriani, Hamza Kheddar, Ahmed Cherif Mazari and Mohamed Chahine Ghanem, 17 Aug 2025, A Robust Cross-Domain IDS using BiGRU-LSTM-Attention for Medical and Industrial IoT Security, https://arxiv.org/abs/2508.12470
Prabath Abeysekara, Hai Dong, 18 Aug 2025, Data-driven Trust Bootstrapping for Mobile Edge Computing-based Industrial IoT Services, https://arxiv.org/abs/2508.12560
Hui Wei, Dong Yoon Lee, Shubham Rohal, Zhizhang Hu, Ryan Rossi, Shiwei Fang, Shijia Pan, 21 Aug 2025, A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis, https://arxiv.org/abs/2506.12263
Sunwoo Kim, 17 Aug 2025, Deep Learning and Matrix Completion-aided IoT Network Localization in the Outlier Scenarios, https://arxiv.org/abs/2508.18225
Pavle Vasiljevic, Milica Matic, Miroslav Popovic, 4 Sep 2025, Federated Isolation Forest for Efficient Anomaly Detection on Edge IoT Systems, https://arxiv.org/abs/2506.05138
Aohan Li and Miyu Tsuzuki, 26 Aug 2025, (DEMO) Deep Reinforcement Learning Based Resource Allocation in Distributed IoT Systems, https://arxiv.org/abs/2508.19318
Tongxi Wu, Chenwei Xu, Jin Yang, 22 Aug 2025, MixGAN: A Hybrid Semi-Supervised and Generative Approach for DDoS Detection in Cloud-Integrated IoT Networks, https://arxiv.org/abs/2508.19273
Imran S. A. Khan, Emmanuel G. Blanchard, Sébastien George, 29 Aug 2025, Harnessing IoT and Generative AI for Weather-Adaptive Learning in Climate Resilience Education, https://arxiv.org/abs/2508.21666
Bhima Sankar Manthina (1), Shreyash Gujar (1), Sachin Chaudhari (1), Kavita Vemuri (1) and Shivam Chhirolya (2) ((1) International Institute of Information Technology-Hyderabad (IIIT-H), India, (2) Prezent.AI, India), 31 Aug 2025, IoT-based Noise Monitoring using Mobile Nodes for Smart Cities, https://arxiv.org/abs/2509.00979
Guanjie Cheng, Boyi Li, Peihan Wu, Feiyi Chen, Xinkui Zhao, Mengying Zhu, Shuiguang Deng, 8 Sep 2025, DyC-STG: Dynamic Causal Spatio-Temporal Graph Network for Real-time Data Credibility Analysis in IoT, https://arxiv.org/abs/2509.06483
Rasil Baidar, Sasa Maric, Robert Abbas, 19 Sep 2025, Hybrid Deep Learning-Federated Learning Powered Intrusion Detection System for IoT/5G Advanced Edge Computing Network, https://arxiv.org/abs/2509.15555
Sergio Benlloch-Lopez, Miquel Viel-Vazquez, Javier Naranjo-Alcazar, Jordi Grau-Haro and Pedro Zuccarello, 19 Sep 2025, Threat Modeling for Enhancing Security of IoT Audio Classification Devices under a Secure Protocols Framework, https://arxiv.org/abs/2509.14657
Mohammadreza Narimani, Ali Hajiahmad, Ali Moghimi, Reza Alimardani, Shahin Rafiee, and Amir Hossein Mirzabe, 14 Sep 2025, Developing an aeroponic smart experimental greenhouse for controlling irrigation and plant disease detection using deep learning and IoT, https://arxiv.org/abs/2509.12274
Wilfrid Sougrinoma Compaoré, Yaya Etiabi, El Mehdi Amhoud, Mohamad Assaad, 16 Sep 2025, Energy-Efficient Quantized Federated Learning for Resource-constrained IoT devices, https://arxiv.org/abs/2509.12814