When you think about tech companies and their role in advancing machine learning (ML) and artificial intelligence, is Apple on your list? Let’s be honest, the biggest and splashiest research of the last eighteen months has come from companies like OpenAI (the GPT family of models, DALL-E 2), the Google Brain team (Bard, Imagen), and Meta (SAM, LLaMA). But after watching Apple’s WWDC 2023 keynote and thinking about how Apple applies ML in their software, I’ve decided they deserve a spot on the innovators list as well. Let’s go over some of the announcements from the keynote and I’ll explain my reasoning. FYI, there’s a lot more to it than the reveal of the Vision Pro.
New Mac Hardware
Apple has always had strong ties to the liberal arts, and their new lineup of hardware reflects that continued relationship. New models of the Mac Studio and Mac Pro were announced that can be configured with Apple’s latest M2 Max and M2 Ultra chips. Clearly these machines are being marketed primarily to companies in the media industry. Apple said as much while going over performance enhancements in video editing software like Adobe After Effects and dropping the names of high-profile customers like NBC’s Saturday Night Live.
But there was also a brief moment where Apple boasted about the new M2 Ultra chip and its applications in ML.
For those not entrenched in the latest GPU specs, NVIDIA’s top-of-the-line consumer GPU, the RTX 4090, has 24GB of VRAM. Put alongside a maxed-out M2 Ultra with 192GB of unified memory (shared across all compute tasks), it’s clear you can train larger models on these new machines. But was Apple trying to dangle a carrot in front of ML practitioners?
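To put those memory numbers in perspective, here’s a rough back-of-the-envelope sketch (my own, not from the keynote) of how much memory model weights alone consume at half precision. The 7B/13B/70B sizes are just common reference points, and real training needs several times more memory for gradients, optimizer state, and activations.

```python
# Rough, weights-only memory estimate: training needs several times more
# (gradients, optimizer state, activations), so treat these as lower bounds.

def weights_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights alone, in GB (fp16 = 2 bytes per parameter)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

for size in (7, 13, 70):  # illustrative LLM sizes, in billions of parameters
    print(f"{size}B params @ fp16: ~{weights_gb(size):.0f} GB of weights")

# ~14 GB, ~26 GB, ~140 GB -- the first barely fits on a 24 GB RTX 4090,
# while all three fit (weights-wise) in 192 GB of unified memory.
```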
There is a lot to unpack here. Just three years ago Apple was still relying on Intel for their CPUs and AMD for their GPUs. I had contemplated using my iMac and an external GPU for ML at the time, but with little software support I ended up building a custom Ubuntu rig instead. The technological landscape has changed dramatically since then. Metal, Apple’s graphics framework, has been incorporated into PyTorch and TensorFlow, the two most popular deep learning frameworks, bringing GPU-accelerated training to the Mac. Apple has also completely transitioned to their own silicon, and the latest M2 Ultra chip can be configured with a 24-core CPU, 76-core GPU, and 32-core Neural Engine. So the real questions are: how does the M2 Ultra stack up against NVIDIA’s GPUs, and can I get real ML work done on a Mac?
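If you’re curious what that looks like in practice, here’s a minimal sketch of using PyTorch’s MPS (Metal) backend. It assumes a recent PyTorch build on Apple silicon and is just the generic device-selection pattern, nothing Apple-specific beyond that.

```python
import torch

# Pick the Metal (MPS) backend when it's available, otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Any ordinary PyTorch model and tensors can be moved to the MPS device.
model = torch.nn.Linear(512, 10).to(device)
x = torch.randn(32, 512, device=device)
print(model(x).shape)  # torch.Size([32, 10]), computed on the Apple GPU
```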
To answer these questions what we really need are benchmarks, and it turns out we have some! The folks over at Weights & Biases have been tracking the performance of Apple silicon for a while now. Here is a chart comparing training speeds for a ResNet-50 model (23.5M parameters) on the Oxford-IIIT Pet dataset.
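I don’t have the Weights & Biases harness itself, but a sketch of the kind of measurement looks roughly like this: fine-tune a ResNet-50 on the Oxford-IIIT Pet dataset for a few steps and report images per second. The batch size, optimizer, and step count here are arbitrary choices of mine, not the benchmark’s settings.

```python
import time
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Oxford-IIIT Pet, resized to the usual 224x224 ImageNet input.
tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
data = datasets.OxfordIIITPet("data", download=True, transform=tfm)
loader = DataLoader(data, batch_size=32, shuffle=True, num_workers=2)

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = torch.nn.Linear(model.fc.in_features, 37)  # 37 pet breeds
model = model.to(device)

opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()

# Time a handful of training steps and report throughput in images/sec.
model.train()
seen, start = 0, time.time()
for step, (x, y) in enumerate(loader):
    x, y = x.to(device), y.to(device)
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
    seen += x.size(0)
    if step == 20:
        break
print(f"{seen / (time.time() - start):.1f} images/sec")
```

Run the same script with "cuda" in place of "mps" on an NVIDIA machine and you get a crude but informative throughput comparison.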
The M1 Ultra with 64 GPU cores, Apple’s fastest chip in this benchmark, is about 3x slower than NVIDIA’s RTX 4090 and approaches the speed of an RTX 3050 Ti laptop GPU. Before you get too disappointed, let me point out a few things. The Ultra chips are actually two Max dies fused together: the M1 Ultra is two M1 Max chips, and the new M2 Ultra is two M2 Max chips. Using the data from the benchmark I calculated a 1.7x speed-up going from the M1 Max to the M1 Ultra. If we see the same improvement going from the M2 Max to the new M2 Ultra, then I think we can expect the M2 Ultra to sit nicely between the Tesla T4 and the GTX 1080 Ti. Now we are getting somewhere. In addition, Apple’s silicon is designed to be extremely efficient and draws only a fraction of the wattage of a typical NVIDIA GPU.
To finish the hardware part of the discussion, let’s circle back to the quote from the keynote. Unified memory will indeed allow you to train large models on Apple silicon, as the keynote suggests, but the benchmark data shows we should be realistic about how long that might take on a Mac. I think most folks are going to stick with NVIDIA for ML workloads, but fine-tuning and tinkering with smaller models are definitely things you can do on this new hardware. I believe Apple’s true interest in ML is less about fast hardware and more about applications in software, which we will go over next.
New Software
The keynote this year was chock-full of technology that uses machine learning. I had trouble keeping track, so I made a list:
realtime transcription of incoming voicemail in the Phone App
transcription of audio messages in the Messages App
pet recognition added to Photos App
keyboard autocorrection is now powered by a transformer model…what the duck (a sketch of the idea follows this list)
dictation also uses a new transformer model
ML powered inspiration for writing entries in the new Journal App
automatic text field identification in PDFs
presenter overlay in video conferencing
adaptive noise canceling and conversation awareness on AirPods
AirPlay learns your preferences and makes suggestions about device pairing
the Smart Stack on watchOS uses ML to surface relevant info when you need it
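About that transformer-powered autocorrection: Apple hasn’t published its model, so here is an illustration of the underlying idea using an open-source stand-in (distilgpt2 via the Hugging Face transformers library). A transformer language model scores which words are likely to come next, which is exactly the signal autocorrect and predictive text need.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# distilgpt2 is an open-source stand-in; Apple's on-device model is not public.
tok = AutoTokenizer.from_pretrained("distilgpt2")
lm = AutoModelForCausalLM.from_pretrained("distilgpt2")

text = "I'll meet you at the"
inputs = tok(text, return_tensors="pt")

with torch.no_grad():
    logits = lm(**inputs).logits[0, -1]  # scores for every possible next token

# The five most likely next tokens -- the raw signal behind word suggestions.
top = torch.topk(logits, k=5).indices
print([tok.decode(int(t)).strip() for t in top])
```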
As you can see, the list is fairly long, and it only covers features announced this year. ML has become deeply integrated across Apple’s software; it’s not just sprinkled into a few applications anymore. What’s more, although Apple is much less prolific in terms of academic contributions, they have been busy innovating in their own way. Apple’s ML models run locally on-device, where energy efficiency, size, and speed are paramount. And of course we, as end users, also want accuracy. It’s really hard to get all of these things at once, and there are always trade-offs.
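To make the on-device constraint concrete, here’s a minimal sketch of the kind of packaging step involved: converting a small PyTorch model to Core ML with coremltools and fp16 compute, trading a little precision for a smaller, faster model. This is the public developer tooling, used here purely as an illustration, not a claim about how Apple builds its own features.

```python
import torch
import coremltools as ct
from torchvision import models

# Trace a small off-the-shelf model so it can be converted to Core ML.
weights = models.MobileNet_V3_Small_Weights.DEFAULT
model = models.mobilenet_v3_small(weights=weights).eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# Convert to a Core ML "ML program" with fp16 compute: a smaller, faster model
# that can be scheduled on the CPU, GPU, or Neural Engine.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=(1, 3, 224, 224))],
    convert_to="mlprogram",
    compute_precision=ct.precision.FLOAT16,
)
mlmodel.save("MobileNetV3Small.mlpackage")
```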
Direct evidence of innovation, aside from a phone that can do cool things, is hard to come by. Apple tends to keep a lot of their tech and code private, but for those who are interested there is an area on their website highlighting publications and advancements in ML. One of their posts, titled On-device Panoptic Segmentation for Camera Using Transformers, is a perfect example.
Here, they developed a model that can separate elements of a scene (people, sky, etc.) along with subcomponents such as skin and hair. To paraphrase the article, their technique is fast enough to run in realtime, compact enough to run on mobile, and has minimal impact on battery life. Work like this isn’t drawing headlines, but it’s running on millions of devices.
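Apple’s segmentation network isn’t something you can download, but if you want to see what panoptic segmentation output looks like, here’s a sketch using an open-source Mask2Former model from Hugging Face as a stand-in; photo.jpg is a placeholder for any image on disk.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# An open-source panoptic model as a stand-in; Apple's network is not public.
name = "facebook/mask2former-swin-tiny-coco-panoptic"
processor = AutoImageProcessor.from_pretrained(name)
model = Mask2FormerForUniversalSegmentation.from_pretrained(name)

image = Image.open("photo.jpg")  # placeholder: any photo on disk
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Post-process into one label per pixel plus a list of detected segments.
result = processor.post_process_panoptic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
for seg in result["segments_info"]:
    print(model.config.id2label[seg["label_id"]], f"{seg['score']:.2f}")
```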
Apple Vision Pro
And now we get to the elephant wearing VR goggles in the room. If you watched the keynote, I think you will agree that the Vision Pro looks like something out of a sci-fi movie. The interface uses your voice, eyes, and hands to control applications that appear to hover in front of your furniture. It looks amazing. And ML is clearly being used everywhere: gestures, pose, and voice in the UI are powered by ML; detecting people entering your field of vision requires ML; and reconstructing your face so you don’t look like one of the guys from Daft Punk during FaceTime calls requires ML.
Are the underlying ML models innovative? In terms of the basic building blocks, probably not. But as I mentioned earlier, getting models that run in realtime on a device with resource and power constraints is.
Summary
It wasn’t the notes of ‘machine learning’ and ‘neural networks’ sprinkled throughout the announcements, or the reveal of the Vision Pro, that put Apple on the ML innovators list for me today. It was the realization that, after years of quietly investing, developing, and experimenting, Apple has become a leader in deploying ML across their software and hardware ecosystem.