Aussie AI
Open Source Models
Last Updated 29 August, 2025
by David Spuler, Ph.D.
There are many AI models that have been open-sourced. In many cases, both the code for the inference algorithm and the model's weights are available. Some licenses have only minimal restrictions (e.g., the MIT License or Apache License 2.0), whereas other model licenses restrict usage to research or non-commercial purposes.
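When evaluating a list of open models like the one below, a first practical step is to bucket each model's license by how permissive it is. The sketch below illustrates this triage; the license identifiers are illustrative examples only, not a complete or authoritative list, and the actual license text always governs.

```python
# Minimal sketch: rough triage of model license identifiers by permissiveness.
# The identifier sets here are illustrative examples, not a complete list.

PERMISSIVE = {"mit", "apache-2.0", "bsd-3-clause"}
RESTRICTED = {"cc-by-nc-4.0", "llama2", "research-only"}  # non-commercial or custom terms

def license_category(license_id: str) -> str:
    """Bucket a license identifier into a coarse usage category."""
    key = license_id.strip().lower()
    if key in PERMISSIVE:
        return "permissive"       # commercial use generally allowed
    if key in RESTRICTED:
        return "restricted"       # research-only, non-commercial, or custom terms
    return "review required"      # unrecognized: read the actual license text

print(license_category("Apache-2.0"))   # permissive
print(license_category("llama2"))       # restricted
```

Custom model-specific licenses (such as Meta's Llama licenses) deliberately fall outside both buckets in practice: they permit commercial use but with bespoke conditions, so they always warrant a manual read.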
Research Papers on Open Source Models
- Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample, Meta AI, Feb 2023, LLaMA: Open and Efficient Foundation Language Models, https://arxiv.org/abs/2302.13971 (Meta's Llama version 1, research-licensed, not fully open-sourced.)
- Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom, Meta AI, July 2023, Llama 2: Open Foundation and Fine-Tuned Chat Models, https://arxiv.org/abs/2307.09288 (Llama version 2, open-sourced including commercial use, with a non-standard model-specific license.)
- MosaicML NLP Team, "Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs", May 2023, Mosaic ML Blog, https://www.mosaicml.com/blog/mpt-7b
- Georgi Gerganov, Jun 2023, Llama.cpp project, https://github.com/ggerganov/llama.cpp/
- Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra Cojocaru, Merouane Debbah, Etienne Goffinet, Daniel Hesslow, Julien Launay, Quentin Malartic, Badreddine Noune, Baptiste Pannier, Guilherme Penedo, "Falcon-40B: an open large language model with state-of-the-art performance", 2023, Hugging Face repository, https://huggingface.co/tiiuae/falcon-40b
- Guilherme Penedo, Quentin Malartic, Daniel Hesslow, Ruxandra Cojocaru, Alessandro Cappelli, Hamza Alobeidli, Baptiste Pannier, Ebtesam Almazrouei, Julien Launay, "The RefinedWeb dataset for Falcon LLM: outperforming curated corpora with web data, and web data only", June 2023, arXiv article, https://arxiv.org/abs/2306.01116
- Tasmia Ansari, UC Berkeley Releases Open LLaMA, an Open-Source Alternative to Meta’s LLaMA, May 2023, Analytics India Magazine https://analyticsindiamag.com/uc-berkeley-release-an-open-source-alternative-to-metas-llama/
- Together Computer, "OpenChatKit: An Open Toolkit and Base Model for Dialogue-style Applications", March 2023, GitHub repository https://github.com/togethercomputer/OpenChatKit
- BigScience, "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model", June 2023, arXiv paper 2211.05100 https://arxiv.org/pdf/2211.05100.pdf
- Nolan Dey, Gurpreet Gosal, Zhiming (Charles) Chen, Hemant Khachane, William Marshall, Ribhu Pathria, Marvin Tom, Joel Hestness, "Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster", April 2023, arXiv 2304.03208 https://arxiv.org/abs/2304.03208
- Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica, "Judging LLM-as-a-judge with MT-Bench and Chatbot Arena", 2023, arXiv paper 2306.05685, https://arxiv.org/abs/2306.05685
- Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Yu, Joey Gonzalez, Hao Zhang, Ion Stoica, June 2023, vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention, https://arxiv.org/pdf/2309.06180.pdf
- Jeon, Byungsoo, May 2024, Automated and Portable Machine Learning Systems, Ph.D. Thesis, Carnegie Mellon University, https://doi.org/10.1184/R1/25746708.v1 https://kilthub.cmu.edu/articles/thesis/Automated_and_Portable_Machine_Learning_Systems/25746708/1 PDF: https://kilthub.cmu.edu/ndownloader/files/46074087 Code: https://github.com/cmu-catalyst/collage (Portability layer to integrate the various kernels and low-level backends more easily. Also covers pipeline parallelism in graph models, and KV cache parallelism similar to FlashDecode.)
- Maria Korolov, 15 May 2024, 10 things to watch out for with open source gen AI, CIO, https://www.cio.com/article/2104280/10-things-to-watch-out-for-with-open-source-gen-ai.html
- JH Jones, May 2024, A Quantitative Comparison of Pre-Trained Model Registries to Traditional Software Package Registries, Masters Thesis, Electrical and Computer Engineering, Purdue University, https://hammer.purdue.edu/articles/thesis/A_Quantitative_Comparison_of_Pre-Trained_Model_Registries_to_Traditional_Software_Package_Registries/25686447/1 PDF: https://hammer.purdue.edu/ndownloader/files/46096152
- Tomasz Tunguz, Apr 24, 2024, A Shift in LLM Marketing : The Rise of the B2B Model, https://tomtunguz.com/snowflake-arctic-model/
- Nathan Lambert, APR 18, 2024, Llama 3: Scaling open LLMs to AGI, https://www.interconnects.ai/p/llama-3-and-scaling-open-llms
- John Loeffler, April 19, 2024, Meta rolls out new Meta AI website, and it might just bury Microsoft and Google's AI dreams, Tech Radar, https://www.techradar.com/computing/meta-rolls-out-new-meta-ai-website-and-it-might-just-bury-microsoft-and-googles-ai-dreams
- Robert Wolfe, Isaac Slaughter, Bin Han, Bingbing Wen, Yiwei Yang, Lucas Rosenblatt, Bernease Herman, Eva Brown, Zening Qu, Nic Weber, and Bill Howe. 2024. Laboratory-Scale AI: Open-Weight Models are Competitive with ChatGPT Even in Low-Resource Settings. In ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT '24), June 3-6, 2024, Rio de Janeiro, Brazil. ACM, New York, NY, USA, 18 pages. https://doi.org/10.1145/3630106.3658966 https://arxiv.org/pdf/2405.16820
- Michael Nuñez, February 6, 2024, Meet ‘Smaug-72B’: The new king of open-source AI, Venture Beat, https://venturebeat.com/ai/meet-smaug-72b-the-new-king-of-open-source-ai/
- Sharon Machlis, March 28, 2024, 5 easy ways to run an LLM locally, InfoWorld, https://www.infoworld.com/article/3705035/5-easy-ways-to-run-an-llm-locally.html
- Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra Cojocaru, Mérouane Debbah, Étienne Goffinet, Daniel Hesslow, Julien Launay, Quentin Malartic, Daniele Mazzotta, Badreddine Noune, Baptiste Pannier, Guilherme Penedo, 29 Nov 2023, The Falcon Series of Open Language Models, https://arxiv.org/abs/2311.16867
- Ankit Patel, June 14, 2024, NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models, https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm-training/
- David Spuler, March 2024, Chapter 5. Design Choices & Architectures, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Intel, Apr 25, 2024, Deployment of Llama3 on Your AI PC with OpenVINO™, https://medium.com/openvino-toolkit/deployment-of-llama3-on-your-ai-pc-with-openvino-b58e961501d6
- Bin Xiao, Burak Kantarci, Jiawen Kang, Dusit Niyato, Mohsen Guizani, 18 Jun 2024 (v2), Efficient Prompting for LLM-based Generative Internet of Things, https://arxiv.org/abs/2406.10382
- Elizabeth Gibney, 19 June 2024, Not all ‘open source’ AI models are actually open: here’s a ranking, Nature, https://www.nature.com/articles/d41586-024-02012-5
- Liesenfeld, A., Dingemanse, M., 2024, Rethinking open source generative AI: open washing and the EU AI Act, In FAccT '24: Proc. 2024 ACM Conf. on Fairness, Accountability, and Transparency 1774–1787 (ACM, 2024). https://dl.acm.org/doi/10.1145/3630106.3659005
- William Gallagher, Jun 19, 2024, Apple researchers add 20 more open-source models to improve text and image AI, https://appleinsider.com/articles/24/06/19/apple-researchers-add-20-more-open-source-models-to-improve-text-and-image-ai
- Piotr Skalski, June 20, 2024, Florence-2: Open Source Vision Foundation Model by Microsoft, https://blog.roboflow.com/florence-2/
- Waleed Kadous, August 23, 2023, Llama 2 is about as factually accurate as GPT-4 for summaries and is 30X cheaper, https://www.anyscale.com/blog/llama-2-is-about-as-factually-accurate-as-gpt-4-for-summaries-and-is-30x-cheaper Code: https://github.com/anyscale/factuality-eval
- Ben Wodecki, November 16, 2023, Generative AI Projects More Than Triple on GitHub in 2023, https://aibusiness.com/nlp/gen-ai-projects-soar-more-than-triple-on-github
- Valentina Alto, 2024, Chapter 3: Choosing an LLM for Your Application, Building LLM-Powered Applications: Create intelligence apps and agents with large language models, Packt Publishing, https://www.amazon.com/Building-LLM-Apps-Intelligent-Language/dp/1835462316/
- Clement Farabet, Tris Warkentin, Jun 27, 2024, Gemma 2 is now available to researchers and developers, https://blog.google/technology/developers/google-gemma-2/
- Meta, July 23, 2024, Introducing Llama 3.1: Our most capable models to date, https://ai.meta.com/blog/meta-llama-3-1/
- Mark Zuckerberg, July 23, 2024, Open Source AI Is the Path Forward, https://about.fb.com/news/2024/07/open-source-ai-is-the-path-forward/
- Vince Lam, Mar 12, 2024, 50+ Open-Source Options for Running LLMs Locally, https://medium.com/thedeephub/50-open-source-options-for-running-llms-locally-db1ec6f5a54f
- Michael Nuñez, July 18, 2024, Groq’s open-source Llama AI model tops leaderboard, outperforming GPT-4o and Claude in function calling, https://venturebeat.com/ai/groq-open-source-llama-ai-model-tops-leaderboard-outperforming-gpt-4o-and-claude-in-function-calling/
- Washington Post, 2024, Meta releases open-source AI model it says rivals OpenAI, Google tech, https://www.washingtonpost.com/technology/2024/07/23/meta-new-ai-llama-open/
- AIM, 2024, Mistral AI Unveils Mistral Large 2, Beats Llama 3.1 on Code and Math, https://analyticsindiamag.com/ai-news-updates/mistral-ai-unveils-mistral-large-2-beats-llama-3-1-on-code-and-math/
- David Linthicum, Aug 02, 2024, Small language models and open source are transforming AI, https://www.infoworld.com/article/3480593/small-language-models-and-open-source-are-transforming-ai.html
- Level Up Coding, Aug 2024, Google open-sources the most powerful small model on the edge: 2B parameters surpass GPT-3.5-Turbo, and the Apple iPhone 15 Pro runs it fast, https://levelup.gitconnected.com/google-open-sources-the-most-powerful-small-model-on-the-edge-2b-parameters-surpass-gpt-3-5-turbo-c0b13f96997c
- Michael Nuñez, August 26, 2024, Aleph Alpha unveils EU-compliant AI: A new era for transparent machine learning, https://venturebeat.com/ai/aleph-alpha-unveils-eu-compliant-ai-a-new-era-for-transparent-machine-learning/
- Shubham Sharma, August 29, 2024, Meta leads open-source AI boom, Llama downloads surge 10x year-over-year, https://venturebeat.com/ai/meta-leads-open-source-ai-boom-llama-downloads-surge-10x-year-over-year/
- Chandra Irugalbandara, Ashish Mahendra, Roland Daynauth, Tharuka Kasthuri Arachchige, Jayanaka Dantanarayana, Krisztian Flautner, Lingjia Tang, Yiping Kang, Jason Mars, 16 Apr 2024 (v3), Scaling Down to Scale Up: A Cost-Benefit Analysis of Replacing OpenAI's LLM with Open Source SLMs in Production, https://arxiv.org/abs/2312.14972
- Shrestha, Y.R., von Krogh, G. & Feuerriegel, S., 2023, Building open-source AI. Nat Comput Sci 3, 908–911 (2023). https://doi.org/10.1038/s43588-023-00540-0 https://www.nature.com/articles/s43588-023-00540-0
- Abhinand, Aug 20, 2024, Self-Hosting LLaMA 3.1 70B (or any ~70B LLM) Affordably, https://abhinand05.medium.com/self-hosting-llama-3-1-70b-or-any-70b-llm-affordably-2bd323d72f8d
- David Spuler, March 2024, Open Source Models, in Generative AI in C++, https://www.aussieai.com/book/ch5-open-source-models
- Carl Franzen, September 5, 2024, Meet the new, most powerful open source AI model in the world: HyperWrite’s Reflection 70B, https://venturebeat.com/ai/meet-the-new-most-powerful-open-source-ai-model-in-the-world-hyperwrites-reflection-70b/
- Asif Razzaq, September 5, 2024, Yi-Coder Released by 01.AI: A Powerful Small-Scale Code LLM Series, Delivering Exceptional Performance in Code Generation, Editing, and Long-Context Comprehension, https://www.marktechpost.com/2024/09/05/yi-coder-released-by-01-ai-a-powerful-small-scale-code-llm-series-delivering-exceptional-performance-in-code-generation-editing-and-long-context-comprehension/
- Michael Nuñez, September 16, 2024, SambaNova challenges OpenAI’s o1 model with Llama 3.1-powered demo on HuggingFace, https://venturebeat.com/ai/sambanova-challenges-openais-o1-model-with-llama-3-1-powered-demo-on-huggingface/
- Meta, August 29, 2024, With 10x growth since 2023, Llama is the leading engine of AI innovation https://ai.meta.com/blog/llama-usage-doubled-may-through-july-2024/
- Michael Nuñez, October 1, 2024, Nvidia just dropped a bombshell: Its new AI model is open, massive, and ready to rival GPT-4, https://venturebeat.com/ai/nvidia-just-dropped-a-bombshell-its-new-ai-model-is-open-massive-and-ready-to-rival-gpt-4/
- Wenliang Dai, Nayeon Lee, Boxin Wang, Zhuoling Yang, Zihan Liu, Jon Barker, Tuomas Rintamaki, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping, 17 Sep 2024, NVLM: Open Frontier-Class Multimodal LLMs, NVIDIA, https://arxiv.org/abs/2409.11402 https://huggingface.co/nvidia/NVLM-D-72B https://nvlm-project.github.io/
- Sean Michael Kerner, October 20, 2024, IBM debuts open source Granite 3.0 LLMs for enterprise AI, https://venturebeat.com/ai/ibm-debuts-open-source-granite-3-0-llms-for-enterprise-ai/
- Meta, October 18, 2024, Sharing new research, models, and datasets from Meta FAIR, https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-lingua/
- Matt Marshall, October 24, 2024, The enterprise verdict on AI models: Why open source will win, https://venturebeat.com/ai/the-enterprise-verdict-on-ai-models-why-open-source-will-win/
- Meta, October 24, 2024, Introducing quantized Llama models with increased speed and a reduced memory footprint, https://ai.meta.com/blog/meta-llama-quantized-lightweight-models/
- Xingwu Sun, Yanfeng Chen, Yiqing Huang, Ruobing Xie, Jiaqi Zhu, Kai Zhang, Shuaipeng Li, Zhen Yang, Jonny Han, Xiaobo Shu, Jiahao Bu, (and many more authors), 4 Nov 2024, Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent, https://arxiv.org/abs/2411.02265 https://github.com/Tencent/Hunyuan-Large https://huggingface.co/tencent/Tencent-Hunyuan-Large
- Robert Corwin Nov 2024, Running Large Language Models Privately: A comparison of frameworks, models, and costs, https://towardsdatascience.com/running-large-language-models-privately-a-comparison-of-frameworks-models-and-costs-ac33cfe3a462
- Carl Franzen, October 31, 2024, Meta makes its MobileLLM open for researchers, posting full weights, https://venturebeat.com/ai/meta-makes-its-mobilellm-open-for-researchers-posting-full-weights/
- Jason Perlow, Nov. 6, 2024, The best open-source AI models: All your free-to-use options explained: Here are the best open-source and free-to-use AI models for text, images, and audio, organized by type, application, and licensing considerations. https://www.zdnet.com/article/the-best-open-source-ai-models-all-your-free-to-use-options-explained/
- Chris Wellons, November 10, 2024, Everything I've learned so far about running local LLMs, https://nullprogram.com/blog/2024/11/10/
- Tegan Jones, 6 November, 2024, Open source AI: What it is and why it matters for business. We now have a definition for ‘open source AI’ and that’s important for business owners, especially when big tech doesn’t adhere to it. https://www.smartcompany.com.au/artificial-intelligence/open-source-ai-what-it-is-and-why-it-matters-for-business/
- Qwen Team, November 28, 2024, QwQ: Reflect Deeply on the Boundaries of the Unknown, https://qwenlm.github.io/blog/qwq-32b-preview/
- Ai2, November 26, 2024, OLMo 2: The best fully open language model to date, https://allenai.org/blog/olmo2
- Kyle Wiggers, December 6, 2024, Meta unveils a new, more efficient Llama model, https://techcrunch.com/2024/12/06/meta-unveils-a-new-more-efficient-llama-model/
- Tiernan Ray, Dec. 10, 2024, How Cerebras boosted Meta's Llama to 'frontier model' performance The company also demonstrates initial training of a one-trillion-parameter AI model on a single machine using conventional DDR5 memory chips. https://www.zdnet.com/article/how-cerebras-boosted-metas-llama-to-frontier-model-performance/
- Ben Dickson, December 10, 2024, OpenAI’s o1 model doesn’t show its thinking, giving open source an advantage, https://venturebeat.com/ai/heres-how-openai-o1-might-lose-ground-to-open-source-models/
- Inkit Padhi, Manish Nagireddy, Giandomenico Cornacchia, Subhajit Chaudhury, Tejaswini Pedapati, Pierre Dognin, Keerthiram Murugesan, Erik Miehling, Martín Santillán Cooper, Kieran Fraser, Giulio Zizzo, Muhammad Zaid Hameed, Mark Purcell, Michael Desmond, Qian Pan, Inge Vejsbjerg, Elizabeth M. Daly, Michael Hind, Werner Geyer, Ambrish Rawat, Kush R. Varshney, Prasanna Sattigeri, 10 Dec 2024, Granite Guardian, https://arxiv.org/abs/2412.07724 https://github.com/ibm-granite/granite-guardian (Open-sourcing of safety models with many capabilities.)
- Team OLMo, Pete Walsh, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Shane Arora, Akshita Bhagia, Yuling Gu, Shengyi Huang, Matt Jordan, Nathan Lambert, Dustin Schwenk, Oyvind Tafjord, Taira Anderson, David Atkinson, Faeze Brahman, Christopher Clark, Pradeep Dasigi, Nouha Dziri, Michal Guerquin, Hamish Ivison, Pang Wei Koh, Jiacheng Liu, Saumya Malik, William Merrill, Lester James V. Miranda, Jacob Morrison, Tyler Murray, Crystal Nam, Valentina Pyatkin, Aman Rangapur, Michael Schmitz, Sam Skjonsberg, David Wadden, Christopher Wilhelm, Michael Wilson, Luke Zettlemoyer, Ali Farhadi, Noah A. Smith, Hannaneh Hajishirzi, 31 Dec 2024, 2 OLMo 2 Furious, https://arxiv.org/abs/2501.00656
- NovaSky, Jan 2025, Sky-T1: Train your own O1 preview model within $450, https://novasky-ai.github.io/posts/sky-t1/
- Edward Beeching, Lewis Tunstall, Sasha Rush Dec 16, 2024, Scaling Test Time Compute with Open Source Models, https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute
- Charles Rollet, January 29, 2025, Zuck shrugs off DeepSeek, vows to spend hundreds of billions on AI, https://techcrunch.com/2025/01/29/zuck-shrugs-off-deepseek-vows-to-spend-hundreds-of-billions-on-ai/
- Ryan Browne, Feb 4 2025, DeepSeek’s breakthrough emboldens open-source AI models like Meta’s Llama, https://www.cnbc.com/2025/02/04/deepseek-breakthrough-emboldens-open-source-ai-models-like-meta-llama.html
- Maxwell Zeff, February 5, 2025, Researchers created an open rival to OpenAI’s o1 ‘reasoning’ model for under $50, https://techcrunch.com/2025/02/05/researchers-created-an-open-rival-to-openais-o1-reasoning-model-for-under-50/
- Kyle Wiggers, January 11, 2025, Researchers open source Sky-T1, a 'reasoning' AI model that can be trained for less than $450, https://techcrunch.com/2025/01/11/researchers-open-source-sky-t1-a-reasoning-ai-model-that-can-be-trained-for-less-than-450/
- R Szilágyi, 2024, OpenSource alternatives of Generative Artifical Intelligence for SME's, Journal of Agricultural Informatics, Vol. 15 No. 2 (2024), https://doi.org/10.17700/jai.2024.15.2.733 https://journal.magisz.org/index.php/jai/article/view/733 https://journal.magisz.org/index.php/jai/article/view/733/412
- Kyle Wiggers, February 21, 2025, DeepSeek to open source parts of online services code, https://techcrunch.com/2025/02/21/deepseek-to-open-source-parts-of-online-services-code/
- kinfey, Feb 27, 2025, Welcome to the new Phi-4 models - Microsoft Phi-4-mini & Phi-4-multimodal, https://techcommunity.microsoft.com/blog/educatordeveloperblog/welcome-to-the-new-phi-4-models---microsoft-phi-4-mini--phi-4-multimodal/4386037
- Asif Razzaq, March 5, 2025, Qwen Releases QwQ-32B: A 32B Reasoning Model that Achieves Significantly Enhanced Performance in Downstream Task, https://www.marktechpost.com/2025/03/05/qwen-releases-qwq-32b-a-32b-reasoning-model-that-achieves-significantly-enhanced-performance-in-downstream-task/ (Features 32B parameters, 32K context length, 64 layers, RoPE, SwiGLU, RMSNorm, and attention enhancements.)
- Nathan Lambert, Mar 14, 2025, Gemma 3, OLMo 2 32B, and the growing potential of open-source AI: Leading open-weight models and the first open-source model to clearly surpass GPT 3.5 (the very last version), https://www.interconnects.ai/p/gemma-3-olmo-2-32b-and-the-growing
- Annika Kim Constantino, Apr 5 2025, Meta debuts new Llama 4 models, but most powerful AI model is still to come https://www.cnbc.com/2025/04/05/meta-debuts-new-llama-4-models-but-most-powerful-ai-model-is-still-to-come.html
- Devansh, Jun 1, 2025, The Costly Open-Source LLM Lie: Open Source LLMs are not Free, https://machine-learning-made-simple.medium.com/the-costly-open-source-llm-lie-f83fdc5d5701
- Nathan Lambert, Jul 04, 2025, The American DeepSeek Project: What I think the next goal for the open-source AI community is, https://www.interconnects.ai/p/the-american-deepseek-project
- Jim Clyde Monge, Mar 18, 2024, xAI Releases Grok-1 — The Biggest Open-Source LLM, https://generativeai.pub/xai-releases-grok-1-the-biggest-open-source-llm-28fe8ab84575
- Shubham Sharma, December 17, 2024, UAE’s Falcon 3 challenges open-source leaders amid surging demand for small AI models, https://venturebeat.com/ai/uaes-falcon-3-challenges-open-source-leaders-amid-surging-demand-for-small-ai-models/
- Gemma Team, Google DeepMind, 12 March 2025, Gemma 3 Technical Report, https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf
- Mistral AI, Mar 17, 2025, Mistral Small 3.1: SOTA. Multimodal. Multilingual. Apache 2.0, https://mistral.ai/news/mistral-small-3-1
- Michael Nuñez, March 17, 2025, Mistral AI drops new open-source model that outperforms GPT-4o Mini with fraction of parameters, https://venturebeat.com/ai/mistral-ai-drops-new-open-source-model-that-outperforms-gpt-4o-mini-with-fraction-of-parameters/
- Carl Franzen, March 17, 2025, Baidu delivers new LLMs ERNIE 4.5 and ERNIE X1 undercutting DeepSeek, OpenAI on cost — but they’re not open source (yet), https://venturebeat.com/ai/baidu-delivers-new-llms-ernie-4-5-and-ernie-x1-undercutting-deepseek-openai-on-cost-but-theyre-not-open-source-yet/
- MiniMax: Aili Chen, Aonian Li, Bangwei Gong, Binyang Jiang, Bo Fei, Bo Yang, Boji Shan, Changqing Yu, Chao Wang, Cheng Zhu, Chengjun Xiao, Chengyu Du, Chi Zhang, Chu Qiao, Chunhao Zhang, Chunhui Du, Congchao Guo, Da Chen, Deming Ding, Dianjun Sun, Dong Li, Enwei Jiao, (and many more authors), 16 Jun 2025, MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention, https://arxiv.org/abs/2506.13585 https://github.com/MiniMax-AI/MiniMax-M1 (A 456B MoE reasoning model trained with RL and has various optimizations in training efficiency and attention kernel.)
- Michael Nuñez, July 11, 2025, Moonshot AI’s Kimi K2 outperforms GPT-4 in key benchmarks — and it’s free, https://venturebeat.com/ai/moonshot-ais-kimi-k2-outperforms-gpt-4-in-key-benchmarks-and-its-free/ (One trillion parameters with 32B experts activated each time. Examines new training optimizer MuonClip as more efficient and more stable than variants of AdamW for training.)
- Michael Nuñez, August 14, 2025, That ‘cheap’ open-source AI model is actually burning through your compute budget, https://venturebeat.com/ai/that-cheap-open-source-ai-model-is-actually-burning-through-your-compute-budget/ (Open-source models use more tokens.)
- Tim, Nous Research, Aug 14, 2025, Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark, https://nousresearch.com/measuring-thinking-efficiency-in-reasoning-models-the-missing-benchmark/
- Kaitao Chen, Mianxin Liu, Daoming Zong, Chaoyue Ding, Shaohao Rui, Yankai Jiang, Mu Zhou, Xiaosong Wang, 8 Aug 2025, Mediator-Guided Multi-Agent Collaboration among Open-Source Models for Medical Decision-Making, https://arxiv.org/abs/2508.05996
- Mithun Saha, Maxwell A. Xu, Wanting Mao, Sameer Neupane, James M. Rehg, Santosh Kumar, 23 Jul 2025, Pulse-PPG: An Open-Source Field-Trained PPG Foundation Model for Wearable Applications Across Lab and Field Settings, https://arxiv.org/abs/2502.01108
- Eleftherios Tzanis and Michail E. Klontzas, 11 Aug 2025, mAIstro: an open-source multi-agentic system for automated end-to-end development of radiomics and deep learning models for medical imaging, https://arxiv.org/abs/2505.03785
- Zihao Chen, Ji Zhuang, Jinyi Shen, Xiaoyue Ke, Xinyi Yang, Mingjie Zhou, Zhuoyao Du, Xu Yan, Zhouyang Wu, Zhenyu Xu, Jiangli Huang, Li Shang, Xuan Zeng, Fan Yang, 14 Aug 2025, AnalogSeeker: An Open-source Foundation Language Model for Analog Circuit Design, https://arxiv.org/abs/2508.10409
- Vamsi Krishna Mulukutla, Sai Supriya Pavarala, Srinivasa Raju Rudraraju, Sridevi Bonthu, 19 Aug 2025, Evaluating Open-Source Vision Language Models for Facial Emotion Recognition against Traditional Deep Learning Models, https://arxiv.org/abs/2508.13524
- Anindya Bijoy Das, Shibbir Ahmed and Shahnewaz Karim Sakib, 19 Aug 2025, Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models, https://arxiv.org/abs/2504.19061
- Ephraiem Sarabamoun, 12 Aug 2025, Special-Character Adversarial Attacks on Open-Source Language Model, https://arxiv.org/abs/2508.14070
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI — new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications — new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging