Aussie AI
Chapter 19. AI Phones and PCs
Book Excerpt from "Generative AI Applications: Planning, Design and Implementation"
by David Spuler
AI Phones
Having an AI on your phone seems a sure bet as the next big thing. The main idea is to have a voice interface whereby you can have a conversation with the LLM. We’ve had this type of voice interaction on our phones for a while, but it’s always been hit-and-miss in terms of its usefulness.
Imagine if it was actually smart!
Certainly, the big phone vendors are betting that this can be done, and that consumers will buy new phones in droves. Samsung got there first, with some AI features on its phones in 2023, and then "Samsung Galaxy AI" in early 2024. Google followed a little later with its "on-device" SDK for building AI apps on Android phones. Apple is lagging, but seems to have come on strong with its "Apple Intelligence" platform, although that's only just been released as I write this. Apple CEO Tim Cook says, "Not first, but best." It remains to be seen!
AI PCs
I'm quite sure you've seen plenty of ads for these new types of PCs. They add a dedicated AI processor, called a Neural Processing Unit (NPU), alongside the CPU; NPUs are good at AI workloads, but not as power-hungry as a full GPU. And, of course, if you have a high-end PC with a good GPU, then you can run AI as well.
Microsoft has been pushing its Copilot+ PCs, with built-in AI capabilities. There are also numerous software "copilot" capabilities in many of the Microsoft tools. In fact, the announcements are almost weekly, and surely they've added AI to all of these PC apps by now!
There's a lot of overlap between running AI on a phone or a PC, since both are lower-powered devices than servers in the cloud. However, they have a different focus, and the special features of AI PCs that make them separate from AI phones include:
- Bigger screens
- Keyboard and mouse interface
- Don’t fit in your pocket
- Better for work tasks
- Faster CPUs
- High-end PCs can have GPUs
Overall, although home PCs are common, the AI PCs are much more business-focused for all the above reasons. It’s hard to write a business report using a voice interface on a tiny screen, so I don’t see phones replacing work PCs any time soon.
Use Cases
What use cases are best for AI phones and PCs? The simplest type of AI engine is "read-only" in the sense that it doesn't change anything or perform any "actions." These are use cases where an LLM processes input data, but only gives you an answer or a summary report.
What data can an LLM analyze on your phone or PC? Think about what information is available for analysis. An AI engine on the phone could search or access:
- Text messages
- Emails
- Socials
- Photos
- Voice memos (audio recordings)
- Videos
- Contacts
- Time and date
- Web page (currently on your browser)
- Browser web page history
- Calendar
- Timers
- Torch/light
- To-do lists
- Notes
- Orientation/altimeter/motion
- Health data
- Files (less commonly used)
- Settings (e.g., DND, airplane mode, etc.)
Some other miscellaneous ideas:
- Remember my passwords
- Activity history
- Track my steps
- Detect a car accident or a fall
- Remind me to take my pills
We can also generate things using AI, which is still like a "read" action, because it doesn't change the external world (or even your phone, to start with) until you store or send the result. Examples include:
- Write text (outline, brainstorm, lists, drafts, poems, song lyrics, etc.)
- Revise, edit, proofread text
- Create emails, texts, social posts, etc.
- Create image
- Create video
- Create audio (e.g., a song)
- Create animation
- Create emoji
And if we generalize that to “read” things further afield, we get viewing actions such as:
- Weather
- News headlines
- Surf the web
- Stock price quotes
- Sales and discounts
- Foreign exchange rates
- Watch an online video
- Remind you of something
- Search the internet
- Maps
- Exercise routes
- Locate my car
- Locate my child, parent, spouse, dog, cat, phone, glasses, keys, etc.
Agents on Devices
AI applications that perform “actions” are called “agents.” There’s a good case for having a smart agent to assist you, awaiting your command inside your pocket or purse, or on the desk at work. Agents can also run automatically without you needing to start them off, such as having an LLM wake you up in the morning with a nice poem.
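To make this concrete, here's a minimal sketch (in Python) of how a device-side agent might map an LLM's structured output to a local action. The action names and the JSON shape are hypothetical illustrations, not any vendor's actual API:

```python
import json

# Hypothetical device actions that the agent is allowed to perform.
def set_timer(minutes: int) -> str:
    return f"Timer set for {minutes} minutes."

def send_text(to: str, body: str) -> str:
    return f"Text sent to {to}: {body!r}"

# Registry of permitted actions ("tools") the LLM may request.
ACTIONS = {"set_timer": set_timer, "send_text": send_text}

def dispatch(llm_output: str) -> str:
    """Parse the LLM's structured reply and run the named action."""
    request = json.loads(llm_output)
    action = ACTIONS[request["action"]]
    return action(**request["args"])

# Example: the model has decided the user wants a 10-minute timer.
print(dispatch('{"action": "set_timer", "args": {"minutes": 10}}'))
```

The key design point is the registry: the agent can only invoke actions you've explicitly permitted, which matters when the model is allowed to change things rather than just read them.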
What can an AI change or do on your phone or PC? There’s a long list, and here are some:
- Send a text or email
- Post a social
- Call someone (wouldn’t you rather text?)
- Call someone without needing you, and the AI does the talking!
- Take a photo or video (via the camera)
- Record an audio (via the microphone)
- Set a timer (or update/cancel)
- Play a sound
- Speak (e.g., answer a question to you)
- Store a file (locally, or in the cloud)
- Edit words (texts, emails, documents)
- Edit audio
- Edit photos
- Edit videos
And if we allow the AI agent to look beyond the local storage of your phone or PC, via other apps that seek out further information or perform remote actions, we get more:
- Book a ... meeting, appointment, hair salon, restaurant, flight, holiday, doctor, vaccination, rental car, test drive, cruise, babysitter, tutor, movie, and on and on.
- Update, modify, or cancel a booking, or set reminders, for any of the above.
- Order a ... ride-share, taxi, meal, book, product, etc.
- Track, update, modify, or cancel an order for any of the above.
Advanced Multi-Step Use Cases
There are some very complex use cases that can involve lengthy tasks that may require a back-and-forth between the LLM and humans, or sourcing data from many different places. The advanced ideas include:
- Book a vacation
- Do my tax return
- Sort out my travel expenses
- Tutor me in physics
- What does my day look like?
- Pay my bills
- What do my kids need today for school?
- Prioritize my to-do list
- Does my car need a service?
- Teach me a foreign language
- Help me manage my child’s diabetes
- Invoice my clients
- Update my bookkeeping
- Monitor the news headlines for me
And going beyond that, imagine these advancements in combination with talking to your phone or laptop...
- 5G download speed (6G?)
- Robots (to mop the floor)
- Autonomous cars
- AI gadgets and new form factors
- Industrial automation
- Quantum computing
AI Phone Apps
There's an obvious opportunity to add AI functionality to phone apps. We've already seen Microsoft quick out of the gate in adding AI functionality to numerous software products in its portfolio, some of which relate to accessing AI engines from your PC or phone. As I write this, Apple is hurrying to get all of its third-party developers to add "intents" to their iPhone apps, so that Siri and other Apple Intelligence models can do more things via apps.
The first steps have been AI functionality in the core apps from the vendors, including Samsung, Google, and Apple, listed in the order that they released them. Samsung was the first to offer Galaxy AI phones, then Google announced the Pixel 8 Pro, powered by the Gemini Nano model, and last to arrive was Apple Intelligence, with multiple small models based on a multi-LoRA architecture (sketched below).
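Apple hasn't published full implementation details, but the basic multi-LoRA idea is easy to sketch: one shared set of base weights, plus tiny low-rank "adapter" matrices that are swapped in per task. A minimal NumPy illustration, where the sizes are made-up numbers:

```python
import numpy as np

d, rank = 1024, 8  # hidden size and LoRA rank (illustrative numbers only)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d)).astype(np.float32)  # shared base weights, stored once

# Per-task adapters: each (A, B) pair adds only 2*d*rank parameters,
# a tiny fraction of the d*d base matrix, so many tasks fit on-device.
adapters = {
    "summarize": (rng.standard_normal((d, rank)).astype(np.float32),
                  rng.standard_normal((rank, d)).astype(np.float32)),
    "proofread": (rng.standard_normal((d, rank)).astype(np.float32),
                  rng.standard_normal((rank, d)).astype(np.float32)),
}

def forward(x: np.ndarray, task: str) -> np.ndarray:
    """One layer: shared base weights plus the task's low-rank update."""
    A, B = adapters[task]
    return x @ W + x @ A @ B  # only the tiny A and B change per task

x = rng.standard_normal((1, d)).astype(np.float32)
print(forward(x, "summarize").shape)  # switch tasks by swapping adapters
```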
So far, consumer reactions have been...patchy.
We're early in the cycle, and the vendors' hope that users would upgrade their phones and PCs en masse to get AI features has not yet been realized. This is understandable, because the native AI features are somewhat limited, and the smarter features are too slow with a round-trip into the cloud.
What’s the killer app?
So far, there hasn’t really been one for consumers, other than using ChatGPT in the cloud. The possible killer AI app for smartphone usage could be a smart companion in your pocket that you talk to and ask questions. This requires:
- Conversational voice interface
- Super-smart LLM (i.e., big)
- Fast enough
That last point is the reason that we're not there yet. There are demos of conversational voice interfaces online, but they're not running on phones. A trillion-parameter model won't fit on your phone, and trying to do a voice conversation with your phone sending requests into the cloud is just not going to be responsive enough.
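Here's a rough back-of-envelope calculation of why; every figure below is an illustrative guess, not a measurement:

```python
# Back-of-envelope voice-reply latency (every figure is an illustrative guess).
network_rtt = 0.10          # seconds for the round trip to the cloud
speech_to_text = 0.30       # transcribing the user's spoken question
reply_tokens = 60           # a couple of spoken sentences
server_speed = 50           # cloud LLM generation rate, tokens per second
text_to_speech = 0.20       # synthesizing the reply audio

total = network_rtt + speech_to_text + reply_tokens / server_speed + text_to_speech
print(f"~{total:.1f} seconds before the phone starts talking back")  # ~1.8 s
```

Nearly two seconds of silence before every reply is a long pause in a natural conversation, and real-world networks are often slower than the optimistic numbers assumed here.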
Obstacles to Smartphone AI
Can an AI model run fast enough on your phone? There’s no shortage of research papers on getting AI engines to run their inference calculations directly on low-resource devices. The general class of research is about “edge” devices, and it isn’t just phones, but also even less powerful devices like IoT-capable network devices and security cameras processing images.
There are quite a few articles showing that you can run AI models on a smartphone. These started as enthusiast and experimentation articles, but it's now possible to use the frameworks of major vendors like Microsoft, Google, and Apple. However, these use local models of around 1B or 2B parameters, whereas ChatGPT 4 is reportedly almost two trillion parameters (that's about 1,000 times larger, if you like math). When you consider that phone models use 4-bit quantized parameters, whereas cloud models use 32-bit floating-point, that's another eightfold difference in bits per parameter.
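That arithmetic is easy to check (the cloud parameter count is a widely reported estimate, not a confirmed figure):

```python
phone_params = 2e9        # a ~2B-parameter on-device model
cloud_params = 1.8e12     # widely reported estimate for GPT-4 (not confirmed)
print(cloud_params / phone_params)  # 900.0 -- roughly 1,000x more parameters

phone_bits, cloud_bits = 4, 32      # 4-bit quantized vs 32-bit floating-point
print(cloud_bits / phone_bits)      # 8.0 -- eightfold more bits per parameter
```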
But what about running a big LLM that’s actually smart? I’m not talking about having your phone talk to some anonymous server in the cloud to do its AI. Although there are already plenty of “AI apps” available to install on your phone, these are still mostly sending the requests over the network to an AI engine in the cloud.
Much of the early research that is relevant to fast phone execution of models relates to another type of computer, which you might know as a “car.” The need for computer vision models for automated or assisted driving has similar requirements to running on a phone, such as low latency and small storage. The general term is an “embedded” system or “real-time” system.
There are already small LLMs running on phones and PCs. However, there are several problems with running a big LLM on your phone. These limit us to smaller models for now, although everything is gradually getting more powerful. Running an AI model directly on your phone is problematic for several reasons:
- Too slow to run — response times are too long.
- Hardware acceleration — phones lack a GPU and have less CPU acceleration.
- Storage size — e.g., a “small” 3B model with 32-bit weights will need 12 Gigabytes of storage. With modern phones often over 512GB, storing even a 13B model in 52GB seems reasonable.
- Memory usage — an entire model is loaded into RAM for inference. The obstacle is more the time cost of accessing this memory than the storage size (see the bandwidth sketch after this list).
- Transmission size — install a huge model over your phone’s 4G or WiFi connection.
- Battery depletion — computations max out the phone’s CPU and chew cycles.
- Heat generation — water-cooled phones are not a thing.
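The memory bullet above deserves a number. For each generated word, inference must touch every weight, so memory bandwidth alone caps the token rate. A rough sketch, where the bandwidth figure is an assumed value for a typical phone:

```python
# Rough ceiling on token rate from memory bandwidth alone.
params = 13e9                 # a 13B-parameter model
bytes_per_param = 4           # 32-bit weights
model_bytes = params * bytes_per_param       # 52 GB

phone_bandwidth = 50e9        # assumed ~50 GB/s phone RAM bandwidth (a guess)

# Every weight must be read once per generated word (ignoring caching tricks).
seconds_per_word = model_bytes / phone_bandwidth
print(f"{seconds_per_word:.2f} seconds per word")  # ~1 second: far too slow
```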
For these reasons, phone AI is still somewhat limited in its capabilities, and it’s still faster to send complex AI requests off to a bigger server with lots of GPUs that’s running in the cloud, even though it’s a roundtrip network message. Over time some of the obstacles to natively-executing inference on phones will diminish:
- Better phone CPUs with hardware acceleration are already here (e.g., Apple Neural Engine since iPhone X, Qualcomm Snapdragon), with more on the way. Future phones will be much more AI-capable.
- Small model optimizations (e.g., multi-LoRA as used by Apple Intelligence).
- GPU phones will surely be coming to a store near you very soon.
- Phone storage sizes are also increasing and terabyte storage sizes will be the norm.
- 5G network connectivity will reduce concerns about transmission sizes.
- Data compression algorithms can lower transmission sizes, and also possibly storage sizes.
- Quantized models and other inference optimizations can improve speed and reduce storage size, giving reduced CPU usage, faster response times, and smaller downloads (but with some accuracy loss); see the quantization sketch after this list.
- Training and fine-tuning of models doesn’t need to happen on a phone (phew!).
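To illustrate the quantization point, here's a minimal symmetric 8-bit quantization of a weight array in NumPy, showing the four-fold size reduction and the small round-trip error (i.e., accuracy loss) that it introduces:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(1_000_000).astype(np.float32)  # stand-in fp32 weights

# Symmetric 8-bit quantization: map [-max|w|, +max|w|] onto [-127, 127].
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

dequantized = q.astype(np.float32) * scale      # approximate reconstruction
print(weights.nbytes / q.nbytes)                # 4.0 -- four-fold smaller
print(np.abs(weights - dequantized).mean())     # small mean error: the accuracy cost
```

Production quantizers use per-channel or per-group scales to reduce that error further, but the size arithmetic is the same.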
But... you really need a “big” model, not a “small” model, if you want the app to be great with lots of happy users. And getting a big model running efficiently on a phone may take a while to come to fruition. In the meantime, your phone will be sending those types of queries up into the cloud.
Speeding Up Smartphone AI
Okay, so let’s say you want to run a “big” model on a “small” phone. Why? Lots of reasons, which we won’t explore here. So, you want what you want, which is to run the latest open source AI model on a phone.
The first question is: do you even need to? Why not just use the AI engines in the cloud, and send requests back-and-forth between the phone and the cloud? Response times on modern networks are fast, message sizes are small, and users may not notice or even care. But there are reasons beyond speed: privacy and security come to mind.
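For comparison, the cloud alternative is only a few lines of code. This sketch posts a prompt to a hypothetical HTTPS endpoint; the URL, key, and JSON field names are placeholders, not any real provider's API:

```python
import json
import urllib.request

def ask_cloud_llm(prompt: str) -> str:
    """Send one prompt to a hypothetical cloud LLM endpoint."""
    req = urllib.request.Request(
        "https://api.example.com/v1/generate",   # placeholder URL, not a real service
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]            # assumed response field name

# Example (requires a real endpoint to actually run):
# print(ask_cloud_llm("Summarize my unread emails."))
```

Note what this implies for privacy: the prompt, which may contain your texts or emails, leaves the device on every request.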
Another piece of good news: you don’t need to “build” the model on your phone. Those GPU-expensive tasks of training or fine-tuning can be done in the cloud. For native execution, the user only needs to run “inference” of the model on their phone.
Assuming you have your reasons to want to do this, let’s examine each of the obstacles for native phone execution of LLM model inference.
- Speed and response time. The AI engine on the phone needs fast "inference" (running the model quickly). And it probably cannot rely on a GPU, since there are already billions of phones out there without one. Hardware acceleration in phone CPUs is limited. The main way that models run without a GPU on a phone or PC is to use inference optimizations, of which the most popular at the moment is definitely quantization (see the integer-arithmetic sketch after this list). Other supplemental techniques that might be needed include integer-only arithmetic and pruning (model compression). And there's a whole host of lesser-known inference optimization techniques that might need to be combined together. For example, maybe the bottleneck of "auto-regression" will need to be bypassed so the AI engine can crank out multiple words at a time, without running the whole glob of a model for every single word.
- Network transmission size. Users need to download your 13B Llama-2 model to their phone? Uncompressed, it's about 52GB. There's already a lot known about compression algorithms (e.g., for video), and model files are just multi-gigabyte data files, so perhaps they can be compressed to a size that's adequately small. But before we even use those network compression algorithms, the first thing to try is model compression, such as quantization. For example, quantization to 8-bit would reduce the original 32-bit model size four-fold, down to 13GB, for a slight loss in accuracy (probably acceptable). Binary quantization would reduce it by a factor of 32, but then the inference accuracy goes south. 5G bandwidth will help a lot, but remember there are a lot of users (billions) out there with non-5G phones. Model compression techniques such as quantization and pruning can also reduce the total size. But the whole model is required: there's no such thing as half an AI model. And you can't stream an AI model so it starts running before it's all loaded (although it's actually an interesting research question as to whether that might be possible).
- Storage size. The whole model needs to be permanently stored on the device, perhaps in some compressed form. The same comments about model compression techniques apply. It can either be stored uncompressed if the phone has a bigger storage space, or perhaps it can be stored in compressed form and only decompressed when it's needed. But it'll be needed all the time, because, well, it's AI you know, so everybody needs it for everything.
- Memory size. The inference algorithm needs the whole model, uncompressed, available to use in RAM. Not all of it at the same time, but it will definitely need to swap the entire model (uncompressed) in and out of memory to process all those model weights, for each word it generates. That's a fair chunk of RAM (e.g., 52GB), but the bottleneck is also the processing cost of swapping data in and out, which occurs for every generated word. Again, model compression seems key to cutting down the original 52GB size of the model (e.g., 8-bit quantization cuts it to 13GB).
- Battery depletion and heat generation. A model with 13B weights needs to do 13 billion multiplications for every word it outputs. That’s a lot of power usage and reducing resource utilization means using the above-mentioned optimizations of the inference algorithm (e.g., quantization, pruning, non-auto-regression, etc.).
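To ground the speed discussion, here's a toy quantized matrix-vector product for one layer: the heavy inner computation runs in integer arithmetic, with a single floating-point rescale at the end. This is a sketch of the idea, not a production kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024                                          # one layer's dimensions (toy size)
W = rng.standard_normal((d, d)).astype(np.float32)
x = rng.standard_normal(d).astype(np.float32)

# Quantize weights and activations to int8 with per-tensor scales.
w_scale = np.abs(W).max() / 127.0
x_scale = np.abs(x).max() / 127.0
Wq = np.round(W / w_scale).astype(np.int8)
xq = np.round(x / x_scale).astype(np.int8)

# The heavy work is integer-only: int8 values accumulated in int32.
acc = Wq.astype(np.int32) @ xq.astype(np.int32)

# One floating-point rescale recovers the approximate real-valued result.
y = acc.astype(np.float32) * (w_scale * x_scale)

exact = W @ x
print(np.abs(y - exact).mean() / np.abs(exact).mean())  # small relative error
```

Integer multiply-accumulate is much cheaper in silicon than floating-point, which is why it helps with both battery depletion and heat.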
It might not even be possible to realistically run large LLMs natively on today’s phones. But solving any of the above-mentioned problems is certainly valuable standalone, in that it will reduce the cost of running AI models on GPUs in server farms that are growing in the cloud, and maybe even make it possible to run large LLMs natively on desktop PCs.
References
- Mark Gurman, June 11, 2024, Apple’s Push to Infuse Devices With AI Will Take Years to Pay Off, https://www.bloomberg.com/news/newsletters/2024-06-11/will-apple-intelligence-features-boost-iphone-sales-it-may-take-years
- Ben Evans, June 20, 2024, Apple intelligence and AI maximalism, https://www.ben-evans.com/benedictevans/2024/06/20/apple-intelligence
- Lucas Mearian, October 24, 2024, 2025: The year of the AI PC, Computerworld, https://www.computerworld.com/article/3583355/2025-the-year-of-the-ai-pc.html
- Amos Gyamfi, August 28, 2024, The 6 Best LLM Tools To Run Models Locally, https://medium.com/@amosgyamfi/the-6-best-llm-tools-to-run-models-locally-eedd0f7c2bbd
- Michael Nuñez, September 13, 2024, Microsoft’s Windows Agent Arena: Teaching AI assistants to navigate your PC, https://venturebeat.com/ai/microsofts-windows-agent-arena-teaching-ai-assistants-to-navigate-your-pc/
- Vince Lam, March 12, 2024, 50+ Open-Source Options for Running LLMs Locally, https://medium.com/thedeephub/50-open-source-options-for-running-llms-locally-db1ec6f5a54f
- Jason Perlow, August 6, 2024, How to run dozens of AI models on your Mac or PC - no third-party cloud needed, https://www.zdnet.com/article/how-to-run-dozens-of-ai-models-on-your-mac-or-pc-no-third-party-cloud-needed/
- Kif Leswing, October 4, 2024, As Apple enters AI race, iPhone maker turns to its army of developers for an edge, https://www.cnbc.com/2024/10/04/apple-is-turning-to-its-army-of-developers-for-an-edge-in-the-ai-race.html
- Clare Duffy, September 30, 2024, The iPhone 16 isn’t selling as well as Apple may have hoped, https://edition.cnn.com/2024/09/30/tech/iphone-16-presales-apple-intelligence/index.html
- Steve Kovach, September 5, 2024, AI gadgets have been a bust so far. Apple aims to change that, https://www.cnbc.com/2024/09/05/ai-gadgets-have-been-a-bust-so-far-apple-aims-to-change-that.html
- Apple, September 2024, Apple Intelligence comes to iPhone, iPad, and Mac starting next month, https://www.apple.com/newsroom/2024/09/apple-intelligence-comes-to-iphone-ipad-and-mac-starting-next-month/
- Google, March 30, 2024 (accessed), Get started with Gemini Nano on Android (on-device), https://ai.google.dev/tutorials/android_aicore
- Chris Velazco, February 21, 2024, Phones are getting packed with AI features. But how helpful are they? https://www.washingtonpost.com/technology/2024/02/21/ai-phones-google-samsung-iphone/
- Sandeep Budki, March 20, 2024, Samsung Galaxy S24 Ultra Review: Committed and Spicing up Relationship with Customers, https://www.themobileindian.com/reviews/samsung-galaxy-s24-ultra-review-committed-and-spicing-up-relationship-with-customers
- Arjun Kharpal, July 25, 2024, Samsung hints at new products as it bets on AI to drive upgrades to its latest foldable phones, https://www.cnbc.com/2024/07/26/samsung-tm-roh-interview-galaxy-ai-mixed-reality-and-foldables.html
- Allison Johnson, August 1, 2024, A first look at Apple Intelligence and its (slightly) smarter Siri, The Verge, https://www.theverge.com/2024/7/31/24209910/apple-intelligence-ios-18-preview-siri
- Kif Leswing, August 14, 2024, Google's live demo of Gemini ramps up pressure on Apple as AI reaches smartphone users, https://www.cnbc.com/2024/08/14/google-live-gemini-demo-lifts-pressure-on-apple-as-ai-hits-smartphones.html